docs: turn_stats sink + event-driven agent badges + dashboard event vocabulary
parent
e772182724
commit
d890509be3
3 changed files with 171 additions and 30 deletions
|
|
@ -41,17 +41,36 @@ One table:
|
|||
harness emits during turn loop execution.
|
||||
|
||||
The harness writes; the host vacuums. `hive-c0re::events_vacuum`
|
||||
runs hourly and sweeps every existing agent state dir, applying the
|
||||
same two-stage delete to each file: drop rows older than 7 days,
|
||||
then trim to the 2000 most-recent. Centralising retention on the
|
||||
host means a misbehaving harness can't disable its own vacuum and
|
||||
agents don't need any cleanup wiring of their own.
|
||||
runs hourly and sweeps every existing agent state dir, deleting
|
||||
rows older than 7 days. Age-only — no row cap — so a chatty turn
|
||||
doesn't lose history sooner than a quiet one; disk pressure on a
|
||||
sustained burst is the cheaper problem to have. Centralising
|
||||
retention on the host means a misbehaving harness can't disable
|
||||
its own vacuum and agents don't need any cleanup wiring of their
|
||||
own.
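
As an illustration of the age-only rule above, here is a minimal std-only Rust sketch. The `Row` type, the `vacuum_by_age` function, and the in-memory `Vec` are hypothetical stand-ins; the real `events_vacuum` sweeps each agent's sqlite file and its code is not part of this diff.

```rust
/// Seven days in seconds: the retention window stated in the text.
const RETENTION_SECS: u64 = 7 * 24 * 60 * 60;

/// Hypothetical stand-in for one persisted event row.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    ts_unix: u64,
    payload: String,
}

/// Drop rows older than the retention window. There is deliberately no
/// row cap, so a chatty turn keeps as much history as a quiet one.
/// Returns how many rows were removed.
fn vacuum_by_age(rows: &mut Vec<Row>, now_unix: u64) -> usize {
    let cutoff = now_unix.saturating_sub(RETENTION_SECS);
    let before = rows.len();
    rows.retain(|r| r.ts_unix >= cutoff);
    before - rows.len()
}
```

Treating a row exactly at the cutoff as kept is an assumption of this sketch; the text only says rows older than 7 days are deleted.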
|
||||
|
||||
Path overridable via `HYPERHIVE_EVENTS_DB` (for dev / no-`/state`
|
||||
setups). On open failure the `Bus` falls back to no-store mode
|
||||
rather than crashing the harness — events still broadcast over SSE,
|
||||
just nothing persisted.
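
The override and the no-store fallback could look roughly like the sketch below. Only the `HYPERHIVE_EVENTS_DB` variable and the `Bus` name come from the text; the default file name, the `events_db_path` helper, the `open` callback, and the treatment of an empty variable are assumptions made for illustration.

```rust
use std::path::{Path, PathBuf};

/// Assumed default; the text only says the db lives under `/state`.
const DEFAULT_EVENTS_DB: &str = "/state/hyperhive-events.sqlite";

/// Resolve the db path: a non-empty override wins, else the default.
fn events_db_path(override_var: Option<String>) -> PathBuf {
    match override_var {
        Some(p) if !p.is_empty() => PathBuf::from(p),
        _ => PathBuf::from(DEFAULT_EVENTS_DB),
    }
}

/// Hypothetical bus: `store == None` models no-store mode.
struct Bus {
    store: Option<PathBuf>,
}

impl Bus {
    /// `open` stands in for the real sqlite open call.
    fn new(path: PathBuf, open: impl Fn(&Path) -> Result<(), String>) -> Bus {
        match open(path.as_path()) {
            Ok(()) => Bus { store: Some(path) },
            Err(e) => {
                // Do not crash the harness; events would still broadcast.
                eprintln!("events store unavailable, running without persistence: {e}");
                Bus { store: None }
            }
        }
    }

    fn persists(&self) -> bool {
        self.store.is_some()
    }
}
```

A caller would typically pass `std::env::var("HYPERHIVE_EVENTS_DB").ok()` to `events_db_path`.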
|
||||
|
||||
### `/state/hyperhive-turn-stats.sqlite` (per agent)
|
||||
|
||||
Per-turn analytics sink. One row per claude turn captures
|
||||
identity (`model`, `wake_from`, `result_kind`), timing
|
||||
(`started_at`, `ended_at`, `duration_ms`), cost (input / output /
|
||||
cache_read / cache_creation token counts), behaviour
|
||||
(`tool_call_count` + `tool_call_breakdown_json`), and post-turn
|
||||
snapshot metrics (`open_threads_count`,
|
||||
`open_reminders_count` — fetched via the same socket the harness
|
||||
already uses for `GetOpenThreads` + `CountPendingReminders`).
|
||||
Bin-loop helpers `build_row` + `record` land each row at
|
||||
`turn_end`; writes are best-effort: a sqlite hiccup logs + lets
|
||||
the turn loop continue.
|
||||
|
||||
No host-side vacuum yet — tracked in `TODO.md` under Telemetry
|
||||
(target retention ~90 days, age-only sweep like events_vacuum).
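
A rough sketch of how a few of these columns relate, and of what a best-effort write means. It is not the real `build_row` / `record` pair: the helper names, the millisecond units, and the idea that `tool_call_count` equals the sum of the per-tool breakdown are all assumptions.

```rust
use std::collections::BTreeMap;

/// Timing columns named in the text; the units are assumed to be ms.
#[derive(Debug, PartialEq)]
struct TurnTiming {
    started_at_ms: u64,
    ended_at_ms: u64,
    duration_ms: u64,
}

/// Derive the duration from the two timestamps, clamping at zero if the
/// clock went backwards.
fn timing(started_at_ms: u64, ended_at_ms: u64) -> TurnTiming {
    TurnTiming {
        started_at_ms,
        ended_at_ms,
        duration_ms: ended_at_ms.saturating_sub(started_at_ms),
    }
}

/// Assumed relation: the total is the sum over the per-tool breakdown.
fn tool_call_count(breakdown: &BTreeMap<String, u64>) -> u64 {
    breakdown.values().sum()
}

/// Best-effort write: log the error and carry on instead of propagating
/// it. Returns whether the row landed.
fn record_best_effort<E: std::fmt::Display>(write: impl FnOnce() -> Result<(), E>) -> bool {
    match write() {
        Ok(()) => true,
        Err(e) => {
            eprintln!("turn-stats write failed (ignored): {e}");
            false
        }
    }
}
```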
|
||||
|
||||
### `/state/hyperhive-model` (per agent)
|
||||
|
||||
Single-line text file holding the claude model name currently
|
||||
|
|
@ -68,8 +87,10 @@ Under `/var/lib/hyperhive/agents/<name>/`:
|
|||
- `config/` — the proposed nix repo (manager-editable).
|
||||
- `claude/` — claude OAuth credentials, bind-mounted RW to
|
||||
`/root/.claude` inside the container.
|
||||
- `state/` — durable notes + the events.sqlite db, bind-mounted
|
||||
to `/state` inside the container.
|
||||
- `state/` — durable notes, the events.sqlite db, and the
|
||||
turn-stats sqlite db. Bind-mounted to `/agents/<name>/state`
|
||||
inside the container (the manager still uses the legacy
|
||||
`/state` mount point — same host path either way).
|
||||
|
||||
Under `/var/lib/hyperhive/applied/<name>/` — the hive-c0re-only
|
||||
applied repo. Tracks `flake.nix` (module-only boilerplate; never
|
||||
|
|
|
|||
123
docs/web-ui.md
|
|
@ -201,6 +201,22 @@ not ours.
|
|||
a managed container.
|
||||
- `GET /api/agent-config/{name}` — read-only view of the applied
|
||||
`agent.nix`.
|
||||
- `GET /api/state-file?path=<host-or-container-path>` — bounded
|
||||
text read of a file under the per-agent `state/` subtree or
|
||||
the shared `/var/lib/hyperhive/shared/`. Accepts the
|
||||
container-view forms (`/agents/<n>/state/...`, `/shared/...`)
|
||||
and the host form. Canonicalises + verifies the path stays
|
||||
inside the allow-list, refuses anything but a regular file,
|
||||
refuses `/agents/<n>/claude` / `config` subtrees, truncates
|
||||
bodies at 1 MiB. Backs the dashboard's inline path-link
|
||||
preview (PATH_RE detects pointer strings in message bodies,
|
||||
question/answer text, and the operator inbox; clicking
|
||||
expands a `<details>` that lazy-fetches via this endpoint).
|
||||
Trailing-slash matches (i.e. directory paths) are skipped on
|
||||
the client side — only files linkify.
|
||||
- `GET /api/reminders` — list pending reminders for the
|
||||
dashboard's queued-reminders panel.
|
||||
- `POST /cancel-reminder/{id}` — hard-delete a pending reminder.
|
||||
- `GET /dashboard/stream` — unified live event channel:
|
||||
broker `sent` / `delivered`, plus the mutation events listed
|
||||
below. Each frame carries `seq`.
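
The path checks described for `GET /api/state-file` can be sketched as follows. This is a simplified model under stated assumptions: it normalises paths lexically, whereas the endpoint canonicalises on disk (which also resolves symlinks), and it omits the regular-file check. All function names are hypothetical.

```rust
use std::ffi::OsStr;
use std::path::{Component, Path, PathBuf};

const AGENTS_ROOT: &str = "/var/lib/hyperhive/agents";
const SHARED_ROOT: &str = "/var/lib/hyperhive/shared";
const MAX_BODY: usize = 1024 * 1024; // 1 MiB truncation limit

/// Map container-view forms onto the host layout; host forms pass through.
fn to_host(p: &str) -> PathBuf {
    if let Some(rest) = p.strip_prefix("/agents/") {
        Path::new(AGENTS_ROOT).join(rest)
    } else if let Some(rest) = p.strip_prefix("/shared/") {
        Path::new(SHARED_ROOT).join(rest)
    } else {
        PathBuf::from(p)
    }
}

/// Resolve `.` and `..` lexically; `None` if the path climbs above root.
fn normalize(p: &Path) -> Option<PathBuf> {
    let mut out = PathBuf::new();
    for c in p.components() {
        match c {
            Component::RootDir => out.push("/"),
            Component::CurDir => {}
            Component::ParentDir => {
                if !out.pop() {
                    return None;
                }
            }
            Component::Normal(s) => out.push(s),
            Component::Prefix(_) => return None,
        }
    }
    Some(out)
}

/// Allow files under the shared dir or under `<agent>/state/`; this
/// refuses the `claude` and `config` subtrees as a side effect.
fn is_allowed(p: &Path) -> bool {
    if p.starts_with(SHARED_ROOT) {
        return p != Path::new(SHARED_ROOT);
    }
    match p.strip_prefix(AGENTS_ROOT) {
        Ok(rest) => {
            let mut parts = rest.components();
            parts.next().is_some()
                && parts.next() == Some(Component::Normal(OsStr::new("state")))
                && parts.next().is_some()
        }
        Err(_) => false,
    }
}

/// Full check: map, normalise, then test against the allow-list.
fn resolve(requested: &str) -> Option<PathBuf> {
    let host = normalize(&to_host(requested))?;
    is_allowed(&host).then_some(host)
}

/// Cap a response body at the 1 MiB limit.
fn truncate_body(mut body: Vec<u8>) -> Vec<u8> {
    body.truncate(MAX_BODY);
    body
}
```

Checking the path only after normalisation is what stops a request such as `/agents/<n>/state/../claude/...` from slipping past a naive prefix test.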
|
||||
|
|
@ -223,21 +239,37 @@ payload):
|
|||
queue + history mutations. Client mutates a derived store and
|
||||
re-renders only the approvals section.
|
||||
- `question_added` (id, asker, question, options, multi,
|
||||
asked_at, deadline_at) / `question_resolved` (id, answer,
|
||||
answerer, answered_at, cancelled) — operator-targeted
|
||||
questions only (peer-to-peer questions never fire these). The
|
||||
ttl watchdog fires `question_resolved` with
|
||||
`answerer = "ttl-watchdog"` on expiry.
|
||||
asked_at, deadline_at, target) / `question_resolved` (id,
|
||||
answer, answerer, answered_at, cancelled, target) — both
|
||||
operator-targeted and peer (agent-to-agent) threads fire
|
||||
these. The dashboard's questions pane surfaces both, with
|
||||
filter chips (all / @operator / @peer / per-participant) and
|
||||
an `0V3RR1D3` button on peer rows so the operator can
|
||||
answer when an agent is stuck. The ttl watchdog fires
|
||||
`question_resolved` with `answerer = "ttl-watchdog"` on
|
||||
expiry.
|
||||
- `transient_set` (name, transient_kind, since_unix) /
|
||||
`transient_cleared` (name) — lifecycle action spinners. The
|
||||
client ticks the elapsed-seconds badge off `since_unix`
|
||||
client-side, no polling.
|
||||
- `container_state_changed` (container: ContainerView) /
|
||||
`container_removed` (name) — per-row container mutations,
|
||||
emitted by `Coordinator::rescan_containers_and_emit` from
|
||||
every mutation site (`actions::approve` post-spawn,
|
||||
`actions::destroy`, the lifecycle_action wrapper,
|
||||
`auto_update::rebuild_agent`) and from the 10s
|
||||
`crash_watch` poll. Client upserts/removes by name; the
|
||||
pending overlay is read from `transientsState` since the
|
||||
payload doesn't carry it.
|
||||
|
||||
`/api/state` still serves `approvals` / `approval_history` /
|
||||
`questions` / `question_history` / `transients` for cold-start
|
||||
on first page load and as a safety-net resync from the 5s poll;
|
||||
the client maintains the same arrays in derived stores and
|
||||
applies the events on top.
|
||||
`/api/state` is **only fetched on cold-load and on the few
|
||||
forms that mutate non-event-derived state** (PURG3 +
|
||||
meta-update, since tombstones + meta_inputs aren't event-
|
||||
shaped yet). Every other section — approvals, questions,
|
||||
transients, containers, operator inbox, message flow —
|
||||
derives from `/dashboard/stream` after the initial snapshot,
|
||||
maintaining its own client-side store and applying events on
|
||||
top. The 5s periodic poll is gone.
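
To make the "store plus events on top" model concrete, here is a small Rust model of the logic. The dashboard client itself is browser-side code, so this is a sketch of the idea only, not the shipped implementation; `Store`, `DashEvent`, and the `running` field on `ContainerView` are hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical slice of the container payload; only `name` is certain.
#[derive(Debug, Clone, PartialEq)]
struct ContainerView {
    name: String,
    running: bool,
}

/// A subset of the mutation events listed above.
enum DashEvent {
    ContainerStateChanged { container: ContainerView },
    ContainerRemoved { name: String },
    TransientSet { name: String, transient_kind: String, since_unix: u64 },
    TransientCleared { name: String },
}

/// Seeded once from the `/api/state` snapshot, then mutated by events.
#[derive(Default)]
struct Store {
    containers: BTreeMap<String, ContainerView>,
    transients: BTreeMap<String, (String, u64)>,
}

impl Store {
    /// Upsert or remove by name; no refetch is needed.
    fn apply(&mut self, ev: DashEvent) {
        match ev {
            DashEvent::ContainerStateChanged { container } => {
                self.containers.insert(container.name.clone(), container);
            }
            DashEvent::ContainerRemoved { name } => {
                self.containers.remove(&name);
            }
            DashEvent::TransientSet { name, transient_kind, since_unix } => {
                self.transients.insert(name, (transient_kind, since_unix));
            }
            DashEvent::TransientCleared { name } => {
                self.transients.remove(&name);
            }
        }
    }

    /// Elapsed-seconds badge, ticked locally from `since_unix`.
    fn elapsed_secs(&self, name: &str, now_unix: u64) -> Option<u64> {
        self.transients
            .get(name)
            .map(|(_, since)| now_unix.saturating_sub(*since))
    }
}
```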
|
||||
|
||||
Generalised form helpers: `form[data-confirm="…"]` pops
|
||||
`confirm()` before submit; `form[data-prompt="…"]` pops
|
||||
|
|
@ -250,16 +282,34 @@ Layout, top to bottom:
|
|||
|
||||
- Banner (gradient shimmer while state=thinking).
|
||||
- Title with `↑ DASHB04RD` back-link (new tab) + `↻ R3BU1LD`.
|
||||
- Status section (online / needs login / login-in-progress).
|
||||
- **State row**: state badge + model chip + last-turn timing +
|
||||
cancel-turn button + new-session button.
|
||||
- Status section: empty when online (alive-badge in the state
|
||||
row carries the signal), populated with the login form /
|
||||
OAuth URL when `status` is `needs_login_*`.
|
||||
- **State row**: alive badge + state badge + model chip + ctx
|
||||
badge + last-turn timing + cancel-turn button + new-session
|
||||
button. Every chip carries a `title=...` tooltip with the
|
||||
detailed breakdown.
|
||||
- Alive badge: `● alive` (green) / `◌ needs login` (amber) /
|
||||
`◌ logging in` / `○ offline` / `… connecting`. Driven by
|
||||
`LiveEvent::StatusChanged`; replaces the old "harness alive
|
||||
— turn loop running" paragraph so the state row carries
|
||||
every reachability signal.
|
||||
- State badge: `💤 idle` / `🧠 thinking` / `📦 compacting` /
|
||||
`○ offline` / `… booting`, with an age suffix (`12s`,
|
||||
`2m 14s`). Driven from `/api/state.turn_state` +
|
||||
`turn_state_since`; SSE turn_start/turn_end still flip it
|
||||
instantly between polls. Authoritative source is the
|
||||
harness's `Bus::state_snapshot()`.
|
||||
- Model chip: `model · <name>` (e.g. `model · haiku`).
|
||||
`2m 14s`). Driven by `LiveEvent::TurnStateChanged`
|
||||
(`{state, since_unix}`) — the bus emits on every
|
||||
`Bus::set_state` so the badge updates without a /api/state
|
||||
refetch. Cold-load via `/api/state.turn_state` +
|
||||
`turn_state_since`.
|
||||
- Model chip: `model · <name>` (e.g. `model · haiku`). Driven
|
||||
by `LiveEvent::ModelChanged`; emitted from `Bus::set_model`.
|
||||
- Ctx badge: `ctx · 142k` — total prompt tokens in the
|
||||
current context window (input + cache_read + cache_write),
|
||||
mirroring claude code's bottom-right indicator. Hover for
|
||||
the breakdown including output. Driven by
|
||||
`LiveEvent::TokenUsageChanged`; emitted from
|
||||
`Bus::record_usage` whenever the terminal `result` event
|
||||
delivers a fresh usage block.
|
||||
- Last-turn chip: `last turn 12.3s` appears after the first
|
||||
turn ends, computed from the state-since deltas.
|
||||
- `■ cancel turn` button: visible only while state=thinking,
|
||||
|
|
@ -269,6 +319,11 @@ Layout, top to bottom:
|
|||
arm a one-shot Bus flag — the next turn drops
|
||||
`--continue`, starting a fresh claude session. Subsequent
|
||||
turns resume normal `--continue`.
|
||||
|
||||
Polling: `/api/state` is fetched **once** on cold load, and
|
||||
again while `status === 'needs_login_in_progress'` (login
|
||||
session output isn't event-shaped yet). Every other badge
|
||||
updates from SSE; no periodic refresh timer runs.
|
||||
- Inbox `<details>` block (collapsed): `inbox · N` — last 30
|
||||
messages addressed to this agent, fetched via
|
||||
`AgentRequest::Recent { limit: 30 }`. (Separate from
|
||||
|
|
@ -345,14 +400,38 @@ Unknown `/foo` shows an error row instead of being silently sent.
|
|||
|
||||
### Per-agent endpoints
|
||||
|
||||
All POSTs return 200 (no 303 redirects). The matching mutations
|
||||
fire `LiveEvent` variants on the per-agent bus, so the client
|
||||
doesn't refetch `/api/state` on submit — the SSE stream
|
||||
delivers the new state faster anyway. Only the login flow still
|
||||
polls (session output streams in updates that aren't event-
|
||||
shaped).
|
||||
|
||||
- `POST /send` — operator-injected message into this agent's inbox.
|
||||
- `POST /login/{start,code,cancel}` — claude OAuth login flow.
|
||||
- `POST /api/cancel` — SIGINT the in-flight claude turn.
|
||||
Start/cancel emit `LiveEvent::StatusChanged` to flip the
|
||||
badge to/from `needs_login_in_progress`.
|
||||
- `POST /api/cancel` — SIGINT the in-flight claude turn. Emits a
|
||||
`LiveEvent::Note`.
|
||||
- `POST /api/compact` — run `/compact` on the persistent session
|
||||
(same MCP config + system prompt + allowed tools as a normal
|
||||
turn — only the stdin payload differs).
|
||||
turn — only the stdin payload differs). Flips state to
|
||||
`Compacting` via `Bus::set_state`, which emits
|
||||
`TurnStateChanged`.
|
||||
- `POST /api/model` (`model=<name>`) — switch the model for
|
||||
future turns.
|
||||
future turns. `Bus::set_model` emits `ModelChanged`.
|
||||
- `POST /api/new-session` — arm a one-shot for the next turn to
|
||||
drop `--continue`.
|
||||
drop `--continue`. Emits a `LiveEvent::Note`.
|
||||
- `GET /events/history` — replay buffer for the terminal.
|
||||
|
||||
Bus events (new vocabulary on `/events/stream`):
|
||||
|
||||
- `status_changed { status }` — `online` /
|
||||
`needs_login_idle` / `needs_login_in_progress`. Drives the
|
||||
alive-badge.
|
||||
- `model_changed { model }` — drives the model chip.
|
||||
- `token_usage_changed { usage: TokenUsage }` — drives the
|
||||
ctx-badge. Emitted from `Bus::record_usage` whenever the
|
||||
stream-json `result` event delivers a fresh usage block.
|
||||
- `turn_state_changed { state, since_unix }` — drives the
|
||||
state badge (`idle`/`thinking`/`compacting`).
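
The vocabulary above, and the ctx-badge arithmetic it feeds, might be modelled like this. The field names inside `TokenUsage`, the `ctx_badge` and `badge_for` helpers, and the truncating `k` formatting are assumptions; the real `LiveEvent` enum has further variants (for example `Note`) that are left out here.

```rust
/// Assumed shape of the usage block; the field names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
struct TokenUsage {
    input: u64,
    output: u64,
    cache_read: u64,
    cache_write: u64,
}

impl TokenUsage {
    /// Prompt tokens in the context window: input + cache_read +
    /// cache_write. Output is excluded from the badge total.
    fn ctx_tokens(&self) -> u64 {
        self.input + self.cache_read + self.cache_write
    }
}

/// Render like `ctx · 142k`; truncating to whole thousands is assumed.
fn ctx_badge(u: &TokenUsage) -> String {
    let total = u.ctx_tokens();
    if total < 1_000 {
        format!("ctx · {total}")
    } else {
        format!("ctx · {}k", total / 1_000)
    }
}

/// The four new bus events from the list above.
enum LiveEvent {
    StatusChanged { status: String },
    ModelChanged { model: String },
    TokenUsageChanged { usage: TokenUsage },
    TurnStateChanged { state: String, since_unix: u64 },
}

/// Which state-row badge each event drives.
fn badge_for(ev: &LiveEvent) -> &'static str {
    match ev {
        LiveEvent::StatusChanged { .. } => "alive",
        LiveEvent::ModelChanged { .. } => "model",
        LiveEvent::TokenUsageChanged { .. } => "ctx",
        LiveEvent::TurnStateChanged { .. } => "state",
    }
}
```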
|
||||
|
|
|
|||