docs: turn_stats sink + event-driven agent badges + dashboard event vocabulary

This commit is contained in:
müde 2026-05-17 23:28:34 +02:00
parent e772182724
commit d890509be3
3 changed files with 171 additions and 30 deletions

View file

@ -41,17 +41,36 @@ One table:
harness emits during turn loop execution.
The harness writes; the host vacuums. `hive-c0re::events_vacuum`
runs hourly and sweeps every existing agent state dir, applying the
same two-stage delete to each file: drop rows older than 7 days,
then trim to the 2000 most-recent. Centralising retention on the
host means a misbehaving harness can't disable its own vacuum and
agents don't need any cleanup wiring of their own.
runs hourly and sweeps every existing agent state dir, deleting
rows older than 7 days. Age-only — no row cap — so a chatty turn
doesn't lose history sooner than a quiet one; disk pressure on a
sustained burst is the cheaper problem to have. Centralising
retention on the host means a misbehaving harness can't disable
its own vacuum and agents don't need any cleanup wiring of their
own.
Path overridable via `HYPERHIVE_EVENTS_DB` (for dev / no-`/state`
setups). On open failure the `Bus` falls back to no-store mode
rather than crashing the harness — events still broadcast over SSE,
just nothing persisted.
### `/state/hyperhive-turn-stats.sqlite` (per agent)
Per-turn analytics sink. One row per claude turn captures
identity (`model`, `wake_from`, `result_kind`), timing
(`started_at`, `ended_at`, `duration_ms`), cost (input / output /
cache_read / cache_creation token counts), behaviour
(`tool_call_count` + `tool_call_breakdown_json`), and post-turn
snapshot metrics (`open_threads_count`,
`open_reminders_count` — fetched via the same socket the harness
already uses for `GetOpenThreads` + `CountPendingReminders`).
Bin-loop helpers `build_row` + `record` land each row at
`turn_end`; writes are best-effort, a sqlite hiccup logs + lets
the turn loop continue.
No host-side vacuum yet — tracked in `TODO.md` under Telemetry
(target retention ~90 days, age-only sweep like events_vacuum).
### `/state/hyperhive-model` (per agent)
Single-line text file holding the claude model name currently
@ -68,8 +87,10 @@ Under `/var/lib/hyperhive/agents/<name>/`:
- `config/` — the proposed nix repo (manager-editable).
- `claude/` — claude OAuth credentials, bind-mounted RW to
`/root/.claude` inside the container.
- `state/` — durable notes + the events.sqlite db, bind-mounted
to `/state` inside the container.
- `state/` — durable notes, the events.sqlite db, and the
turn-stats sqlite db. Bind-mounted to `/agents/<name>/state`
inside the container (the manager still uses the legacy
`/state` mount point — same host path either way).
Under `/var/lib/hyperhive/applied/<name>/` — the hive-c0re-only
applied repo. Tracks `flake.nix` (module-only boilerplate; never

View file

@ -201,6 +201,22 @@ not ours.
a managed container.
- `GET /api/agent-config/{name}` — read-only view of the applied
`agent.nix`.
- `GET /api/state-file?path=<host-or-container-path>` — bounded
text read of a file under the per-agent `state/` subtree or
the shared `/var/lib/hyperhive/shared/`. Accepts the
container-view forms (`/agents/<n>/state/...`, `/shared/...`)
and the host form. Canonicalises + verifies the path stays
inside the allow-list, refuses anything but a regular file,
refuses `/agents/<n>/claude` / `config` subtrees, truncates
bodies at 1 MiB. Backs the dashboard's inline path-link
preview (PATH_RE detects pointer strings in message bodies,
question/answer text, and the operator inbox; clicking
expands a `<details>` that lazy-fetches via this endpoint).
Trailing-slash matches (i.e. directory paths) are skipped on
the client side — only files linkify.
- `GET /api/reminders` — list pending reminders for the
dashboard's queued-reminders panel.
- `POST /cancel-reminder/{id}` — hard-delete a pending reminder.
- `GET /dashboard/stream` — unified live event channel:
broker `sent` / `delivered`, plus the mutation events listed
below. Each frame carries `seq`.
@ -223,21 +239,37 @@ payload):
queue + history mutations. Client mutates a derived store and
re-renders only the approvals section.
- `question_added` (id, asker, question, options, multi,
asked_at, deadline_at) / `question_resolved` (id, answer,
answerer, answered_at, cancelled) — operator-targeted
questions only (peer-to-peer questions never fire these). The
ttl watchdog fires `question_resolved` with
`answerer = "ttl-watchdog"` on expiry.
asked_at, deadline_at, target) / `question_resolved` (id,
answer, answerer, answered_at, cancelled, target) — both
operator-targeted and peer (agent-to-agent) threads fire
these. The dashboard's questions pane surfaces both, with
filter chips (all / @operator / @peer / per-participant) and
an `0V3RR1D3` button on peer rows so the operator can
answer when an agent is stuck. The ttl watchdog fires
`question_resolved` with `answerer = "ttl-watchdog"` on
expiry.
- `transient_set` (name, transient_kind, since_unix) /
`transient_cleared` (name) — lifecycle action spinners. The
client ticks the elapsed-seconds badge off `since_unix`
client-side, no polling.
- `container_state_changed` (container: ContainerView) /
`container_removed` (name) — per-row container mutations,
emitted by `Coordinator::rescan_containers_and_emit` from
every mutation site (`actions::approve` post-spawn,
`actions::destroy`, the lifecycle_action wrapper,
`auto_update::rebuild_agent`) and from the 10s
`crash_watch` poll. Client upserts/removes by name; the
pending overlay is read from `transientsState` since the
payload doesn't carry it.
`/api/state` still serves `approvals` / `approval_history` /
`questions` / `question_history` / `transients` for cold-start
on first page load and as a safety-net resync from the 5s poll;
the client maintains the same arrays in derived stores and
applies the events on top.
`/api/state` is **only fetched on cold-load and on the few
forms that mutate non-event-derived state** (PURG3 +
meta-update, since tombstones + meta_inputs aren't event-
shaped yet). Every other section — approvals, questions,
transients, containers, operator inbox, message flow —
derives from `/dashboard/stream` after the initial snapshot,
maintaining its own client-side store and applying events on
top. The 5s periodic poll is gone.
Generalised form helpers: `form[data-confirm="…"]` pops
`confirm()` before submit; `form[data-prompt="…"]` pops
@ -250,16 +282,34 @@ Layout, top to bottom:
- Banner (gradient shimmer while state=thinking).
- Title with `↑ DASHB04RD` back-link (new tab) + `↻ R3BU1LD`.
- Status section (online / needs login / login-in-progress).
- **State row**: state badge + model chip + last-turn timing +
cancel-turn button + new-session button.
- Status section: empty when online (alive-badge in the state
row carries the signal), populated with the login form /
OAuth URL when `status` is `needs_login_*`.
- **State row**: alive badge + state badge + model chip + ctx
badge + last-turn timing + cancel-turn button + new-session
button. Every chip carries a `title=...` tooltip with the
detailed breakdown.
- Alive badge: `● alive` (green) / `◌ needs login` (amber) /
`◌ logging in` / `○ offline` / `… connecting`. Driven by
`LiveEvent::StatusChanged`; replaces the old "harness alive
— turn loop running" paragraph so the state row carries
every reachability signal.
- State badge: `💤 idle` / `🧠 thinking` / `📦 compacting` /
`○ offline` / `… booting`, with an age suffix (`12s`,
`2m 14s`). Driven from `/api/state.turn_state` +
`turn_state_since`; SSE turn_start/turn_end still flip it
instantly between polls. Authoritative source is the
harness's `Bus::state_snapshot()`.
- Model chip: `model · <name>` (e.g. `model · haiku`).
`2m 14s`). Driven by `LiveEvent::TurnStateChanged`
(`{state, since_unix}`) — the bus emits on every
`Bus::set_state` so the badge updates without a /api/state
refetch. Cold-load via `/api/state.turn_state` +
`turn_state_since`.
- Model chip: `model · <name>` (e.g. `model · haiku`). Driven
by `LiveEvent::ModelChanged`; emitted from `Bus::set_model`.
- Ctx badge: `ctx · 142k` — total prompt tokens in the
current context window (input + cache_read + cache_write),
mirroring claude code's bottom-right indicator. Hover for
the breakdown including output. Driven by
`LiveEvent::TokenUsageChanged`; emitted from
`Bus::record_usage` whenever the terminal `result` event
delivers a fresh usage block.
- Last-turn chip: `last turn 12.3s` appears after the first
turn ends, computed from the state-since deltas.
- `■ cancel turn` button: visible only while state=thinking,
@ -269,6 +319,11 @@ Layout, top to bottom:
arm a one-shot Bus flag — the next turn drops
`--continue`, starting a fresh claude session. Subsequent
turns resume normal `--continue`.
Polling: `/api/state` is fetched **once** on cold load, and
again while `status === 'needs_login_in_progress'` (login
session output isn't event-shaped yet). Every other badge
updates from SSE; no periodic refresh timer runs.
- Inbox `<details>` block (collapsed): `inbox · N` — last 30
messages addressed to this agent, fetched via
`AgentRequest::Recent { limit: 30 }`. (Separate from
@ -345,14 +400,38 @@ Unknown `/foo` shows an error row instead of being silently sent.
### Per-agent endpoints
All POSTs return 200 (no 303 redirects). The matching mutations
fire `LiveEvent` variants on the per-agent bus, so the client
doesn't refetch `/api/state` on submit — the SSE stream
delivers the new state faster anyway. Only the login flow still
polls (session output streams in updates that aren't event-
shaped).
- `POST /send` — operator-injected message into this agent's inbox.
- `POST /login/{start,code,cancel}` — claude OAuth login flow.
- `POST /api/cancel` — SIGINT the in-flight claude turn.
Start/cancel emit `LiveEvent::StatusChanged` to flip the
badge to/from `needs_login_in_progress`.
- `POST /api/cancel` — SIGINT the in-flight claude turn. Emits a
`LiveEvent::Note`.
- `POST /api/compact` — run `/compact` on the persistent session
(same MCP config + system prompt + allowed tools as a normal
turn — only the stdin payload differs).
turn — only the stdin payload differs). Flips state to
`Compacting` via `Bus::set_state`, which emits
`TurnStateChanged`.
- `POST /api/model` (`model=<name>`) — switch the model for
future turns.
future turns. `Bus::set_model` emits `ModelChanged`.
- `POST /api/new-session` — arm a one-shot for the next turn to
drop `--continue`.
drop `--continue`. Emits a `LiveEvent::Note`.
- `GET /events/history` — replay buffer for the terminal.
Bus events (new vocabulary on `/events/stream`):
- `status_changed { status }``online` /
`needs_login_idle` / `needs_login_in_progress`. Drives the
alive-badge.
- `model_changed { model }` — drives the model chip.
- `token_usage_changed { usage: TokenUsage }` — drives the
ctx-badge. Emitted from `Bus::record_usage` whenever the
stream-json `result` event delivers a fresh usage block.
- `turn_state_changed { state, since_unix }` — drives the
state badge (`idle`/`thinking`/`compacting`).