docs: turn_stats sink + event-driven agent badges + dashboard event vocabulary

This commit is contained in:
müde 2026-05-17 23:28:34 +02:00
parent e772182724
commit d890509be3
3 changed files with 171 additions and 30 deletions

View file

@ -45,9 +45,19 @@ hive-c0re/ host daemon + CLI (one binary, subcommand-dispatched)
src/crash_watch.rs poll every 10s; fire HelperEvent::ContainerCrash
when a previously-running container disappears
without an operator-initiated transient
src/container_view.rs ContainerView struct + build_all helper;
shared between dashboard.rs (cold-load via
/api/state) and coordinator.rs's
rescan_containers_and_emit
src/coordinator.rs shared state (broker/approvals/operator_questions/
transient/sockets) + tombstone enumeration +
kick_agent + notify_agent (helper-event push)
kick_agent + notify_agent (helper-event push) +
last_containers cache + rescan_and_emit diff helper
src/open_threads.rs loose-ends aggregator (pending approvals +
unanswered questions) — for_agent (filtered) and
hive_wide (manager surface). Backs
AgentRequest::GetOpenThreads + ManagerRequest::
GetOpenThreads (the get_open_threads MCP tool).
src/actions.rs approve/deny/destroy (transient-aware)
src/auto_update.rs startup rebuild scan + ensure_manager +
meta::lock_update_hyperhive bump
@ -85,6 +95,9 @@ hive-ag3nt/ in-container harness crate; produces TWO binaries
src/client.rs generic JSON-line request/response over unix socket
src/web_ui.rs per-container axum HTTP page (incl /api/cancel,
/api/compact, /api/model, /events/history)
src/turn_stats.rs per-turn analytics sink (one sqlite row per
turn at /state/hyperhive-turn-stats.sqlite);
schema + best-effort writer
src/events.rs LiveEvent + broadcast Bus + sqlite-backed history
(/state/hyperhive-events.sqlite) + TurnState +
model selection (persisted at /state/hyperhive-model)
@ -193,6 +206,34 @@ Prune freely.
domain tooling — the agent flake's `inputs` block pulls
the external flake, `agent.nix` references it via
`flakeInputs.<name>.packages.${pkgs.system}.default`.
- **Just landed:** per-turn analytics sink. New
`hive-ag3nt::turn_stats` writes one row per claude turn to
`/state/hyperhive-turn-stats.sqlite`: identity (model,
wake_from, result_kind), timing (started/ended_at,
duration_ms), cost (full token-usage breakdown), behaviour
(tool_call_count + per-tool JSON map), and post-turn snapshot
metrics (open_threads_count, open_reminders_count fetched via
the existing GetOpenThreads + new CountPendingReminders RPC).
Both ag3nt + m1nd bin loops capture, both Bus accumulates
tool_use blocks via observe_stream during the stdout pump.
Writes are best-effort. No host-side vacuum yet — TODO under
Telemetry; same shape as events_vacuum, target 90d retention.
- **Just landed:** agent web UI event-driven badges. New
`LiveEvent::StatusChanged / ModelChanged / TokenUsageChanged
/ TurnStateChanged` variants replace the per-agent page's
/api/state polling for the state row. Status/model/token/state
badges all update from SSE; /api/state only fetched on cold
load + during the login flow (session output isn't event-
shaped). Per-agent endpoints (`/api/cancel|compact|model|
new-session`, `/login/*`) all flip 303→200. New `alive-badge`
chip carries the harness reachability signal (replaces the
"● harness alive" paragraph); new `ctx-badge` mirrors Claude
Code's bottom-right "N tokens" indicator. Every chip carries
a `title=...` tooltip for hover detail.
- **Just landed:** events_vacuum simplified to age-only —
`KEEP_SECS = 7d`, no row cap. Chatty turn no longer evicts
a quiet day's history sooner than expected. Hourly sweep
unchanged.
- **Just landed:** Phase 6 container events. New
`DashboardEvent::ContainerStateChanged { container }` +
`ContainerRemoved { name }` close the last refetch loop on the

View file

@ -41,17 +41,36 @@ One table:
harness emits during turn loop execution.
The harness writes; the host vacuums. `hive-c0re::events_vacuum`
runs hourly and sweeps every existing agent state dir, applying the
same two-stage delete to each file: drop rows older than 7 days,
then trim to the 2000 most-recent. Centralising retention on the
host means a misbehaving harness can't disable its own vacuum and
agents don't need any cleanup wiring of their own.
runs hourly and sweeps every existing agent state dir, deleting
rows older than 7 days. Age-only — no row cap — so a chatty turn
doesn't lose history sooner than a quiet one; disk pressure on a
sustained burst is the cheaper problem to have. Centralising
retention on the host means a misbehaving harness can't disable
its own vacuum and agents don't need any cleanup wiring of their
own.
Path overridable via `HYPERHIVE_EVENTS_DB` (for dev / no-`/state`
setups). On open failure the `Bus` falls back to no-store mode
rather than crashing the harness — events still broadcast over SSE,
just nothing persisted.
### `/state/hyperhive-turn-stats.sqlite` (per agent)
Per-turn analytics sink. One row per claude turn captures
identity (`model`, `wake_from`, `result_kind`), timing
(`started_at`, `ended_at`, `duration_ms`), cost (input / output /
cache_read / cache_creation token counts), behaviour
(`tool_call_count` + `tool_call_breakdown_json`), and post-turn
snapshot metrics (`open_threads_count`,
`open_reminders_count` — fetched via the same socket the harness
already uses for `GetOpenThreads` + `CountPendingReminders`).
Bin-loop helpers `build_row` + `record` land each row at
`turn_end`; writes are best-effort, a sqlite hiccup logs + lets
the turn loop continue.
No host-side vacuum yet — tracked in `TODO.md` under Telemetry
(target retention ~90 days, age-only sweep like events_vacuum).
### `/state/hyperhive-model` (per agent)
Single-line text file holding the claude model name currently
@ -68,8 +87,10 @@ Under `/var/lib/hyperhive/agents/<name>/`:
- `config/` — the proposed nix repo (manager-editable).
- `claude/` — claude OAuth credentials, bind-mounted RW to
`/root/.claude` inside the container.
- `state/` — durable notes + the events.sqlite db, bind-mounted
to `/state` inside the container.
- `state/` — durable notes, the events.sqlite db, and the
turn-stats sqlite db. Bind-mounted to `/agents/<name>/state`
inside the container (the manager still uses the legacy
`/state` mount point — same host path either way).
Under `/var/lib/hyperhive/applied/<name>/` — the hive-c0re-only
applied repo. Tracks `flake.nix` (module-only boilerplate; never

View file

@ -201,6 +201,22 @@ not ours.
a managed container.
- `GET /api/agent-config/{name}` — read-only view of the applied
`agent.nix`.
- `GET /api/state-file?path=<host-or-container-path>` — bounded
text read of a file under the per-agent `state/` subtree or
the shared `/var/lib/hyperhive/shared/`. Accepts the
container-view forms (`/agents/<n>/state/...`, `/shared/...`)
and the host form. Canonicalises + verifies the path stays
inside the allow-list, refuses anything but a regular file,
refuses `/agents/<n>/claude` / `config` subtrees, truncates
bodies at 1 MiB. Backs the dashboard's inline path-link
preview (PATH_RE detects pointer strings in message bodies,
question/answer text, and the operator inbox; clicking
expands a `<details>` that lazy-fetches via this endpoint).
Trailing-slash matches (i.e. directory paths) are skipped on
the client side — only files linkify.
- `GET /api/reminders` — list pending reminders for the
dashboard's queued-reminders panel.
- `POST /cancel-reminder/{id}` — hard-delete a pending reminder.
- `GET /dashboard/stream` — unified live event channel:
broker `sent` / `delivered`, plus the mutation events listed
below. Each frame carries `seq`.
@ -223,21 +239,37 @@ payload):
queue + history mutations. Client mutates a derived store and
re-renders only the approvals section.
- `question_added` (id, asker, question, options, multi,
asked_at, deadline_at) / `question_resolved` (id, answer,
answerer, answered_at, cancelled) — operator-targeted
questions only (peer-to-peer questions never fire these). The
ttl watchdog fires `question_resolved` with
`answerer = "ttl-watchdog"` on expiry.
asked_at, deadline_at, target) / `question_resolved` (id,
answer, answerer, answered_at, cancelled, target) — both
operator-targeted and peer (agent-to-agent) threads fire
these. The dashboard's questions pane surfaces both, with
filter chips (all / @operator / @peer / per-participant) and
an `0V3RR1D3` button on peer rows so the operator can
answer when an agent is stuck. The ttl watchdog fires
`question_resolved` with `answerer = "ttl-watchdog"` on
expiry.
- `transient_set` (name, transient_kind, since_unix) /
`transient_cleared` (name) — lifecycle action spinners. The
client ticks the elapsed-seconds badge off `since_unix`
client-side, no polling.
- `container_state_changed` (container: ContainerView) /
`container_removed` (name) — per-row container mutations,
emitted by `Coordinator::rescan_containers_and_emit` from
every mutation site (`actions::approve` post-spawn,
`actions::destroy`, the lifecycle_action wrapper,
`auto_update::rebuild_agent`) and from the 10s
`crash_watch` poll. Client upserts/removes by name; the
pending overlay is read from `transientsState` since the
payload doesn't carry it.
`/api/state` still serves `approvals` / `approval_history` /
`questions` / `question_history` / `transients` for cold-start
on first page load and as a safety-net resync from the 5s poll;
the client maintains the same arrays in derived stores and
applies the events on top.
`/api/state` is **only fetched on cold-load and on the few
forms that mutate non-event-derived state** (PURG3 +
meta-update, since tombstones + meta_inputs aren't event-
shaped yet). Every other section — approvals, questions,
transients, containers, operator inbox, message flow —
derives from `/dashboard/stream` after the initial snapshot,
maintaining its own client-side store and applying events on
top. The 5s periodic poll is gone.
Generalised form helpers: `form[data-confirm="…"]` pops
`confirm()` before submit; `form[data-prompt="…"]` pops
@ -250,16 +282,34 @@ Layout, top to bottom:
- Banner (gradient shimmer while state=thinking).
- Title with `↑ DASHB04RD` back-link (new tab) + `↻ R3BU1LD`.
- Status section (online / needs login / login-in-progress).
- **State row**: state badge + model chip + last-turn timing +
cancel-turn button + new-session button.
- Status section: empty when online (alive-badge in the state
row carries the signal), populated with the login form /
OAuth URL when `status` is `needs_login_*`.
- **State row**: alive badge + state badge + model chip + ctx
badge + last-turn timing + cancel-turn button + new-session
button. Every chip carries a `title=...` tooltip with the
detailed breakdown.
- Alive badge: `● alive` (green) / `◌ needs login` (amber) /
`◌ logging in` / `○ offline` / `… connecting`. Driven by
`LiveEvent::StatusChanged`; replaces the old "harness alive
— turn loop running" paragraph so the state row carries
every reachability signal.
- State badge: `💤 idle` / `🧠 thinking` / `📦 compacting` /
`○ offline` / `… booting`, with an age suffix (`12s`,
`2m 14s`). Driven from `/api/state.turn_state` +
`turn_state_since`; SSE turn_start/turn_end still flip it
instantly between polls. Authoritative source is the
harness's `Bus::state_snapshot()`.
- Model chip: `model · <name>` (e.g. `model · haiku`).
`2m 14s`). Driven by `LiveEvent::TurnStateChanged`
(`{state, since_unix}`) — the bus emits on every
`Bus::set_state` so the badge updates without a /api/state
refetch. Cold-load via `/api/state.turn_state` +
`turn_state_since`.
- Model chip: `model · <name>` (e.g. `model · haiku`). Driven
by `LiveEvent::ModelChanged`; emitted from `Bus::set_model`.
- Ctx badge: `ctx · 142k` — total prompt tokens in the
current context window (input + cache_read + cache_write),
mirroring claude code's bottom-right indicator. Hover for
the breakdown including output. Driven by
`LiveEvent::TokenUsageChanged`; emitted from
`Bus::record_usage` whenever the terminal `result` event
delivers a fresh usage block.
- Last-turn chip: `last turn 12.3s` appears after the first
turn ends, computed from the state-since deltas.
- `■ cancel turn` button: visible only while state=thinking,
@ -269,6 +319,11 @@ Layout, top to bottom:
arm a one-shot Bus flag — the next turn drops
`--continue`, starting a fresh claude session. Subsequent
turns resume normal `--continue`.
Polling: `/api/state` is fetched **once** on cold load, and
again while `status === 'needs_login_in_progress'` (login
session output isn't event-shaped yet). Every other badge
updates from SSE; no periodic refresh timer runs.
- Inbox `<details>` block (collapsed): `inbox · N` — last 30
messages addressed to this agent, fetched via
`AgentRequest::Recent { limit: 30 }`. (Separate from
@ -345,14 +400,38 @@ Unknown `/foo` shows an error row instead of being silently sent.
### Per-agent endpoints
All POSTs return 200 (no 303 redirects). The matching mutations
fire `LiveEvent` variants on the per-agent bus, so the client
doesn't refetch `/api/state` on submit — the SSE stream
delivers the new state faster anyway. Only the login flow still
polls (session output streams in updates that aren't event-
shaped).
- `POST /send` — operator-injected message into this agent's inbox.
- `POST /login/{start,code,cancel}` — claude OAuth login flow.
- `POST /api/cancel` — SIGINT the in-flight claude turn.
Start/cancel emit `LiveEvent::StatusChanged` to flip the
badge to/from `needs_login_in_progress`.
- `POST /api/cancel` — SIGINT the in-flight claude turn. Emits a
`LiveEvent::Note`.
- `POST /api/compact` — run `/compact` on the persistent session
(same MCP config + system prompt + allowed tools as a normal
turn — only the stdin payload differs).
turn — only the stdin payload differs). Flips state to
`Compacting` via `Bus::set_state`, which emits
`TurnStateChanged`.
- `POST /api/model` (`model=<name>`) — switch the model for
future turns.
future turns. `Bus::set_model` emits `ModelChanged`.
- `POST /api/new-session` — arm a one-shot for the next turn to
drop `--continue`.
drop `--continue`. Emits a `LiveEvent::Note`.
- `GET /events/history` — replay buffer for the terminal.
Bus events (new vocabulary on `/events/stream`):
- `status_changed { status }``online` /
`needs_login_idle` / `needs_login_in_progress`. Drives the
alive-badge.
- `model_changed { model }` — drives the model chip.
- `token_usage_changed { usage: TokenUsage }` — drives the
ctx-badge. Emitted from `Bus::record_usage` whenever the
stream-json `result` event delivers a fresh usage block.
- `turn_state_changed { state, since_unix }` — drives the
state badge (`idle`/`thinking`/`compacting`).