docs: turn_stats sink + event-driven agent badges + dashboard event vocabulary

parent e772182724, commit d890509be3
3 changed files with 171 additions and 30 deletions

CLAUDE.md (43 changes)
@@ -45,9 +45,19 @@ hive-c0re/ host daemon + CLI (one binary, subcommand-dispatched)
 src/crash_watch.rs poll every 10s; fire HelperEvent::ContainerCrash
 when a previously-running container disappears
 without an operator-initiated transient
+src/container_view.rs ContainerView struct + build_all helper;
+shared between dashboard.rs (cold-load via
+/api/state) and coordinator.rs's
+rescan_containers_and_emit
 src/coordinator.rs shared state (broker/approvals/operator_questions/
 transient/sockets) + tombstone enumeration +
-kick_agent + notify_agent (helper-event push)
+kick_agent + notify_agent (helper-event push) +
+last_containers cache + rescan_and_emit diff helper
+src/open_threads.rs loose-ends aggregator (pending approvals +
+unanswered questions) — for_agent (filtered) and
+hive_wide (manager surface). Backs
+AgentRequest::GetOpenThreads + ManagerRequest::
+GetOpenThreads (the get_open_threads MCP tool).
 src/actions.rs approve/deny/destroy (transient-aware)
 src/auto_update.rs startup rebuild scan + ensure_manager +
 meta::lock_update_hyperhive bump
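
The `last_containers` cache + `rescan_and_emit` diff helper listed above can be pictured as a map diff between two scans. A minimal sketch, assuming a hypothetical trimmed `ContainerView` with only `name` and `running` (the real struct in `src/container_view.rs` is not shown in this commit) and a local `DashboardEvent` enum mirroring the two variants this commit names:

```rust
use std::collections::BTreeMap;

/// Hypothetical, trimmed-down stand-in for the real ContainerView.
#[derive(Clone, Debug, PartialEq)]
pub struct ContainerView {
    pub name: String,
    pub running: bool,
}

/// Local mirror of the two dashboard events named in the commit.
#[derive(Debug, PartialEq)]
pub enum DashboardEvent {
    ContainerStateChanged { container: ContainerView },
    ContainerRemoved { name: String },
}

/// Diff a fresh scan against the cached previous scan, replace the cache,
/// and return the events to broadcast. Unchanged rows emit nothing.
pub fn rescan_and_emit(
    last: &mut BTreeMap<String, ContainerView>,
    fresh: Vec<ContainerView>,
) -> Vec<DashboardEvent> {
    let mut events = Vec::new();
    let mut next = BTreeMap::new();
    for c in fresh {
        // New or changed row: the client upserts it by name.
        if last.get(&c.name) != Some(&c) {
            events.push(DashboardEvent::ContainerStateChanged { container: c.clone() });
        }
        next.insert(c.name.clone(), c);
    }
    // Present in the previous scan but gone now: the client removes it.
    for name in last.keys() {
        if !next.contains_key(name) {
            events.push(DashboardEvent::ContainerRemoved { name: name.clone() });
        }
    }
    *last = next;
    events
}
```

In this sketch an unchanged rescan returns no events, so a periodic caller such as the 10s `crash_watch` poll could share the helper without flooding the stream.
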
@@ -85,6 +95,9 @@ hive-ag3nt/ in-container harness crate; produces TWO binaries
 src/client.rs generic JSON-line request/response over unix socket
 src/web_ui.rs per-container axum HTTP page (incl /api/cancel,
 /api/compact, /api/model, /events/history)
+src/turn_stats.rs per-turn analytics sink (one sqlite row per
+turn at /state/hyperhive-turn-stats.sqlite);
+schema + best-effort writer
 src/events.rs LiveEvent + broadcast Bus + sqlite-backed history
 (/state/hyperhive-events.sqlite) + TurnState +
 model selection (persisted at /state/hyperhive-model)
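
`turn_stats.rs` is described as a schema plus a best-effort writer. A minimal sketch of the best-effort half, assuming a hypothetical `TurnStatsSink` trait in place of the real sqlite connection and a `record` signature invented for illustration: a failed insert is logged and swallowed, so the turn loop keeps running. `TurnStatsRow` keeps only a few of the columns this commit lists.

```rust
use std::fmt;

/// A few of the per-turn columns; the real schema has more.
#[derive(Debug, Clone)]
pub struct TurnStatsRow {
    pub model: String,
    pub wake_from: String,
    pub result_kind: String,
    pub duration_ms: u64,
    pub tool_call_count: u32,
}

#[derive(Debug)]
pub struct SinkError(pub String);

impl fmt::Display for SinkError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "turn-stats sink error: {}", self.0)
    }
}

/// Hypothetical storage abstraction standing in for the sqlite file.
pub trait TurnStatsSink {
    fn insert(&mut self, row: &TurnStatsRow) -> Result<(), SinkError>;
}

/// Best-effort write: reports whether the row landed but never
/// propagates the error, so the caller's turn loop cannot abort on it.
pub fn record(sink: &mut dyn TurnStatsSink, row: &TurnStatsRow) -> bool {
    match sink.insert(row) {
        Ok(()) => true,
        Err(e) => {
            eprintln!("{e}; continuing turn loop");
            false
        }
    }
}
```

Keeping the error inside `record` means call sites in both bin loops stay one line, with no `?` that could end a turn early.
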
@@ -193,6 +206,34 @@ Prune freely.
 domain tooling — the agent flake's `inputs` block pulls
 the external flake, `agent.nix` references it via
 `flakeInputs.<name>.packages.${pkgs.system}.default`.
+- **Just landed:** per-turn analytics sink. New
+`hive-ag3nt::turn_stats` writes one row per claude turn to
+`/state/hyperhive-turn-stats.sqlite`: identity (model,
+wake_from, result_kind), timing (started/ended_at,
+duration_ms), cost (full token-usage breakdown), behaviour
+(tool_call_count + per-tool JSON map), and post-turn snapshot
+metrics (open_threads_count, open_reminders_count fetched via
+the existing GetOpenThreads + new CountPendingReminders RPC).
+Both ag3nt + m1nd bin loops capture; the Bus accumulates
+tool_use blocks via observe_stream during the stdout pump.
+Writes are best-effort. No host-side vacuum yet — TODO under
+Telemetry; same shape as events_vacuum, target 90d retention.
+- **Just landed:** agent web UI event-driven badges. New
+`LiveEvent::StatusChanged / ModelChanged / TokenUsageChanged
+/ TurnStateChanged` variants replace the per-agent page's
+/api/state polling for the state row. Status/model/token/state
+badges all update from SSE; /api/state only fetched on cold
+load + during the login flow (session output isn't event-
+shaped). Per-agent endpoints (`/api/cancel|compact|model|
+new-session`, `/login/*`) all flip 303→200. New `alive-badge`
+chip carries the harness reachability signal (replaces the
+"● harness alive" paragraph); new `ctx-badge` mirrors Claude
+Code's bottom-right "N tokens" indicator. Every chip carries
+a `title=...` tooltip for hover detail.
+- **Just landed:** events_vacuum simplified to age-only —
+`KEEP_SECS = 7d`, no row cap. Chatty turn no longer evicts
+a quiet day's history sooner than expected. Hourly sweep
+unchanged.
 - **Just landed:** Phase 6 container events. New
 `DashboardEvent::ContainerStateChanged { container }` +
 `ContainerRemoved { name }` close the last refetch loop on the
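
The turn-stats bullet says tool_use blocks are accumulated during the stdout pump into `tool_call_count` plus a per-tool JSON map. A minimal sketch of that tally, assuming a hypothetical `ToolTally` fed one tool name per observed `tool_use` block (not the real `observe_stream` code); it writes the JSON by hand so only std is needed, escapes only quotes and backslashes, and uses a `BTreeMap` so key order is stable.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-turn tally of tool_use blocks.
#[derive(Default)]
pub struct ToolTally {
    counts: BTreeMap<String, u32>,
}

impl ToolTally {
    /// Call once per `tool_use` block seen on the stream.
    pub fn observe_tool_use(&mut self, tool_name: &str) {
        *self.counts.entry(tool_name.to_string()).or_insert(0) += 1;
    }

    /// Total calls this turn (the tool_call_count column).
    pub fn tool_call_count(&self) -> u32 {
        self.counts.values().sum()
    }

    /// Per-tool map serialised as a JSON object, keys in sorted order.
    pub fn breakdown_json(&self) -> String {
        let fields: Vec<String> = self
            .counts
            .iter()
            .map(|(name, n)| format!("\"{}\":{}", escape(name), n))
            .collect();
        format!("{{{}}}", fields.join(","))
    }
}

/// Minimal JSON string escaping: backslashes and double quotes only.
fn escape(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}
```

Sorted keys make two rows with the same tool mix produce byte-identical JSON, which keeps later `GROUP BY` style analysis simple.
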
@@ -41,17 +41,36 @@ One table:
 harness emits during turn loop execution.
 
 The harness writes; the host vacuums. `hive-c0re::events_vacuum`
-runs hourly and sweeps every existing agent state dir, applying the
-same two-stage delete to each file: drop rows older than 7 days,
-then trim to the 2000 most-recent. Centralising retention on the
-host means a misbehaving harness can't disable its own vacuum and
-agents don't need any cleanup wiring of their own.
+runs hourly and sweeps every existing agent state dir, deleting
+rows older than 7 days. Age-only — no row cap — so a chatty turn
+doesn't lose history sooner than a quiet one; disk pressure on a
+sustained burst is the cheaper problem to have. Centralising
+retention on the host means a misbehaving harness can't disable
+its own vacuum and agents don't need any cleanup wiring of their
+own.
 
 Path overridable via `HYPERHIVE_EVENTS_DB` (for dev / no-`/state`
 setups). On open failure the `Bus` falls back to no-store mode
 rather than crashing the harness — events still broadcast over SSE,
 just nothing persisted.
 
+### `/state/hyperhive-turn-stats.sqlite` (per agent)
+
+Per-turn analytics sink. One row per claude turn captures
+identity (`model`, `wake_from`, `result_kind`), timing
+(`started_at`, `ended_at`, `duration_ms`), cost (input / output /
+cache_read / cache_creation token counts), behaviour
+(`tool_call_count` + `tool_call_breakdown_json`), and post-turn
+snapshot metrics (`open_threads_count`,
+`open_reminders_count` — fetched via the same socket the harness
+already uses for `GetOpenThreads` + `CountPendingReminders`).
+Bin-loop helpers `build_row` + `record` land each row at
+`turn_end`; writes are best-effort, a sqlite hiccup logs + lets
+the turn loop continue.
+
+No host-side vacuum yet — tracked in `TODO.md` under Telemetry
+(target retention ~90 days, age-only sweep like events_vacuum).
+
 ### `/state/hyperhive-model` (per agent)
 
 Single-line text file holding the claude model name currently
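
The retention change in this hunk is easiest to see side by side. A minimal sketch over an in-memory list of event timestamps (Unix seconds), assuming nothing about the real sqlite schema: `keep_two_stage` reproduces the old policy (drop rows older than 7 days, then keep the 2000 most recent) and `keep_age_only` the new one (`KEEP_SECS = 7d`, no cap). Both function names are illustrative.

```rust
/// Retention window from the commit: 7 days.
pub const KEEP_SECS: u64 = 7 * 24 * 60 * 60;
/// Row cap used by the old two-stage policy.
pub const OLD_ROW_CAP: usize = 2000;

/// New policy: drop anything older than KEEP_SECS, nothing else.
pub fn keep_age_only(rows: &[u64], now: u64) -> Vec<u64> {
    let cutoff = now.saturating_sub(KEEP_SECS);
    rows.iter().copied().filter(|&ts| ts >= cutoff).collect()
}

/// Old policy: the same age cut, then trim to the OLD_ROW_CAP most recent.
pub fn keep_two_stage(rows: &[u64], now: u64) -> Vec<u64> {
    let mut kept = keep_age_only(rows, now);
    kept.sort_unstable();
    if kept.len() > OLD_ROW_CAP {
        let excess = kept.len() - OLD_ROW_CAP;
        // Keep only the newest OLD_ROW_CAP timestamps.
        kept = kept.split_off(excess);
    }
    kept
}
```

With a quiet day of ten events five days ago and a 3000-event burst just now, the old policy pushes the quiet day's rows out while the age-only policy keeps all 3010 in-window rows; disk use grows with the burst instead.
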
@@ -68,8 +87,10 @@ Under `/var/lib/hyperhive/agents/<name>/`:
 - `config/` — the proposed nix repo (manager-editable).
 - `claude/` — claude OAuth credentials, bind-mounted RW to
 `/root/.claude` inside the container.
-- `state/` — durable notes + the events.sqlite db, bind-mounted
-to `/state` inside the container.
+- `state/` — durable notes, the events.sqlite db, and the
+turn-stats sqlite db. Bind-mounted to `/agents/<name>/state`
+inside the container (the manager still uses the legacy
+`/state` mount point — same host path either way).
 
 Under `/var/lib/hyperhive/applied/<name>/` — the hive-c0re-only
 applied repo. Tracks `flake.nix` (module-only boilerplate; never
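
The `state/` bullet in this hunk describes one host directory reachable at two container paths. A minimal sketch of that mapping; the `is_manager` flag and all function names are hypothetical, and the two file names are taken from the `hyperhive-*.sqlite` paths used elsewhere in this commit.

```rust
use std::path::{Path, PathBuf};

const AGENTS_ROOT: &str = "/var/lib/hyperhive/agents";

/// Host-side state dir for an agent.
pub fn host_state_dir(name: &str) -> PathBuf {
    Path::new(AGENTS_ROOT).join(name).join("state")
}

/// Where that same host dir is bind-mounted inside the container.
pub fn container_state_dir(name: &str, is_manager: bool) -> PathBuf {
    if is_manager {
        PathBuf::from("/state") // legacy mount point kept for the manager
    } else {
        Path::new("/agents").join(name).join("state")
    }
}

/// The two per-agent sqlite files that live under state/.
pub fn state_files(state_dir: &Path) -> [PathBuf; 2] {
    [
        state_dir.join("hyperhive-events.sqlite"),
        state_dir.join("hyperhive-turn-stats.sqlite"),
    ]
}
```
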

docs/web-ui.md (123 changes)
@@ -201,6 +201,22 @@ not ours.
 a managed container.
 - `GET /api/agent-config/{name}` — read-only view of the applied
 `agent.nix`.
+- `GET /api/state-file?path=<host-or-container-path>` — bounded
+text read of a file under the per-agent `state/` subtree or
+the shared `/var/lib/hyperhive/shared/`. Accepts the
+container-view forms (`/agents/<n>/state/...`, `/shared/...`)
+and the host form. Canonicalises + verifies the path stays
+inside the allow-list, refuses anything but a regular file,
+refuses `/agents/<n>/claude` / `config` subtrees, truncates
+bodies at 1 MiB. Backs the dashboard's inline path-link
+preview (PATH_RE detects pointer strings in message bodies,
+question/answer text, and the operator inbox; clicking
+expands a `<details>` that lazy-fetches via this endpoint).
+Trailing-slash matches (i.e. directory paths) are skipped on
+the client side — only files linkify.
+- `GET /api/reminders` — list pending reminders for the
+dashboard's queued-reminders panel.
+- `POST /cancel-reminder/{id}` — hard-delete a pending reminder.
 - `GET /dashboard/stream` — unified live event channel:
 broker `sent` / `delivered`, plus the mutation events listed
 below. Each frame carries `seq`.
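
The `/api/state-file` checks read naturally as three steps: translate the accepted path forms to a host path, decide whether the canonicalised result is inside the allow-list, then do a capped read. A minimal sketch, assuming the container-view prefixes map onto `/var/lib/hyperhive/agents/<n>/...` and `/var/lib/hyperhive/shared/...` (the function names are hypothetical). `std::fs::canonicalize` would run between steps 1 and 2 and is what defeats `..` and symlink escapes in real use; the sketch leaves that call to the caller so the checks can be exercised without a live hive, and rejects any `..` component outright.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::{Component, Path, PathBuf};

const HIVE_ROOT: &str = "/var/lib/hyperhive";
/// Body cap from the endpoint description (1 MiB).
pub const MAX_BODY: u64 = 1024 * 1024;

/// Step 1: map container-view forms onto the host form.
/// Anything else is treated as an already-host path.
pub fn to_host_path(p: &str) -> PathBuf {
    if let Some(rest) = p.strip_prefix("/agents/") {
        Path::new(HIVE_ROOT).join("agents").join(rest)
    } else if let Some(rest) = p.strip_prefix("/shared/") {
        Path::new(HIVE_ROOT).join("shared").join(rest)
    } else {
        PathBuf::from(p)
    }
}

/// Step 2: allow-list check, meant to run on the canonicalised path.
/// Only agents/<n>/state/... and shared/... pass; claude/, config/
/// and everything outside the hive root are refused.
pub fn is_allowed(canon: &Path) -> bool {
    // Canonical paths contain no `..`; refuse it anyway in case the
    // caller skipped canonicalisation.
    if canon.components().any(|c| c == Component::ParentDir) {
        return false;
    }
    if canon.starts_with(Path::new(HIVE_ROOT).join("shared")) {
        return true;
    }
    match canon.strip_prefix(Path::new(HIVE_ROOT).join("agents")) {
        Ok(rest) => {
            let mut parts = rest.components();
            let _agent = parts.next();
            matches!(parts.next(), Some(Component::Normal(s)) if s == "state")
        }
        Err(_) => false,
    }
}

/// Step 3: bounded read of a regular file; the bool reports truncation.
pub fn read_bounded(canon: &Path) -> io::Result<(String, bool)> {
    if !canon.metadata()?.is_file() {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "not a regular file"));
    }
    let mut buf = Vec::new();
    // Read one byte past the cap so truncation is detectable.
    File::open(canon)?.take(MAX_BODY + 1).read_to_end(&mut buf)?;
    let truncated = buf.len() as u64 > MAX_BODY;
    buf.truncate(MAX_BODY as usize);
    Ok((String::from_utf8_lossy(&buf).into_owned(), truncated))
}
```

`Path::starts_with` compares whole components, so a sibling such as `/var/lib/hyperhive/sharedX` does not slip through the `shared` prefix.
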
@@ -223,21 +239,37 @@ payload):
 queue + history mutations. Client mutates a derived store and
 re-renders only the approvals section.
 - `question_added` (id, asker, question, options, multi,
-asked_at, deadline_at) / `question_resolved` (id, answer,
-answerer, answered_at, cancelled) — operator-targeted
-questions only (peer-to-peer questions never fire these). The
-ttl watchdog fires `question_resolved` with
-`answerer = "ttl-watchdog"` on expiry.
+asked_at, deadline_at, target) / `question_resolved` (id,
+answer, answerer, answered_at, cancelled, target) — both
+operator-targeted and peer (agent-to-agent) threads fire
+these. The dashboard's questions pane surfaces both, with
+filter chips (all / @operator / @peer / per-participant) and
+an `0V3RR1D3` button on peer rows so the operator can
+answer when an agent is stuck. The ttl watchdog fires
+`question_resolved` with `answerer = "ttl-watchdog"` on
+expiry.
 - `transient_set` (name, transient_kind, since_unix) /
 `transient_cleared` (name) — lifecycle action spinners. The
 client ticks the elapsed-seconds badge off `since_unix`
 client-side, no polling.
+- `container_state_changed` (container: ContainerView) /
+`container_removed` (name) — per-row container mutations,
+emitted by `Coordinator::rescan_containers_and_emit` from
+every mutation site (`actions::approve` post-spawn,
+`actions::destroy`, the lifecycle_action wrapper,
+`auto_update::rebuild_agent`) and from the 10s
+`crash_watch` poll. Client upserts/removes by name; the
+pending overlay is read from `transientsState` since the
+payload doesn't carry it.
 
-`/api/state` still serves `approvals` / `approval_history` /
-`questions` / `question_history` / `transients` for cold-start
-on first page load and as a safety-net resync from the 5s poll;
-the client maintains the same arrays in derived stores and
-applies the events on top.
+`/api/state` is **only fetched on cold-load and on the few
+forms that mutate non-event-derived state** (PURG3 +
+meta-update, since tombstones + meta_inputs aren't event-
+shaped yet). Every other section — approvals, questions,
+transients, containers, operator inbox, message flow —
+derives from `/dashboard/stream` after the initial snapshot,
+maintaining its own client-side store and applying events on
+top. The 5s periodic poll is gone.
 
 Generalised form helpers: `form[data-confirm="…"]` pops
 `confirm()` before submit; `form[data-prompt="…"]` pops
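
The questions pane's filter chips reduce to one predicate per chip. A minimal sketch, assuming a hypothetical `Question` row carrying the `asker` and the new `target` field, with `Target::Operator` for operator-targeted threads and `Target::Agent` for peer threads (the real payload encoding of `target` is not shown in this commit):

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Target {
    Operator,
    Agent(String),
}

#[derive(Debug, Clone)]
pub struct Question {
    pub id: u64,
    pub asker: String,
    pub target: Target,
}

/// One variant per chip: all / @operator / @peer / per-participant.
pub enum Filter {
    All,
    Operator,
    Peer,
    Participant(String),
}

impl Filter {
    pub fn matches(&self, q: &Question) -> bool {
        match self {
            Filter::All => true,
            Filter::Operator => q.target == Target::Operator,
            Filter::Peer => matches!(q.target, Target::Agent(_)),
            // A participant is either side of the thread.
            Filter::Participant(name) => {
                q.asker == *name || q.target == Target::Agent(name.clone())
            }
        }
    }
}

/// Peer rows get the operator override button.
pub fn shows_override(q: &Question) -> bool {
    matches!(q.target, Target::Agent(_))
}

/// Ids of the rows a chip leaves visible, in input order.
pub fn visible(rows: &[Question], f: &Filter) -> Vec<u64> {
    rows.iter().filter(|q| f.matches(q)).map(|q| q.id).collect()
}
```
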
@@ -250,16 +282,34 @@ Layout, top to bottom:
 
 - Banner (gradient shimmer while state=thinking).
 - Title with `↑ DASHB04RD` back-link (new tab) + `↻ R3BU1LD`.
-- Status section (online / needs login / login-in-progress).
-- **State row**: state badge + model chip + last-turn timing +
-cancel-turn button + new-session button.
+- Status section: empty when online (alive-badge in the state
+row carries the signal), populated with the login form /
+OAuth URL when `status` is `needs_login_*`.
+- **State row**: alive badge + state badge + model chip + ctx
+badge + last-turn timing + cancel-turn button + new-session
+button. Every chip carries a `title=...` tooltip with the
+detailed breakdown.
+- Alive badge: `● alive` (green) / `◌ needs login` (amber) /
+`◌ logging in` / `○ offline` / `… connecting`. Driven by
+`LiveEvent::StatusChanged`; replaces the old "harness alive
+— turn loop running" paragraph so the state row carries
+every reachability signal.
 - State badge: `💤 idle` / `🧠 thinking` / `📦 compacting` /
 `○ offline` / `… booting`, with an age suffix (`12s`,
-`2m 14s`). Driven from `/api/state.turn_state` +
-`turn_state_since`; SSE turn_start/turn_end still flip it
-instantly between polls. Authoritative source is the
-harness's `Bus::state_snapshot()`.
-- Model chip: `model · <name>` (e.g. `model · haiku`).
+`2m 14s`). Driven by `LiveEvent::TurnStateChanged`
+(`{state, since_unix}`) — the bus emits on every
+`Bus::set_state` so the badge updates without a /api/state
+refetch. Cold-load via `/api/state.turn_state` +
+`turn_state_since`.
+- Model chip: `model · <name>` (e.g. `model · haiku`). Driven
+by `LiveEvent::ModelChanged`; emitted from `Bus::set_model`.
+- Ctx badge: `ctx · 142k` — total prompt tokens in the
+current context window (input + cache_read + cache_write),
+mirroring claude code's bottom-right indicator. Hover for
+the breakdown including output. Driven by
+`LiveEvent::TokenUsageChanged`; emitted from
+`Bus::record_usage` whenever the terminal `result` event
+delivers a fresh usage block.
 - Last-turn chip: `last turn 12.3s` appears after the first
 turn ends, computed from the state-since deltas.
 - `■ cancel turn` button: visible only while state=thinking,
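
Two of the state-row labels in this hunk are small pure functions. A minimal sketch, assuming a hypothetical `TokenUsage` with the four counters the hunk names: the ctx badge sums input + cache_read + cache_write (output stays in the hover breakdown), and the state badge's age suffix renders seconds as `12s` or `2m 14s`. Rounding to the nearest thousand, showing the raw count below 1000, and minutes not rolling over into hours are guesses, not documented behaviour.

```rust
/// Hypothetical mirror of the usage block from the terminal `result` event.
#[derive(Debug, Clone, Copy, Default)]
pub struct TokenUsage {
    pub input: u64,
    pub output: u64,
    pub cache_read: u64,
    pub cache_write: u64,
}

impl TokenUsage {
    /// Prompt tokens currently in the context window; output excluded.
    pub fn context_tokens(&self) -> u64 {
        self.input + self.cache_read + self.cache_write
    }
}

/// `ctx · 142k` style label (rounding rule is an assumption).
pub fn ctx_label(u: &TokenUsage) -> String {
    let n = u.context_tokens();
    if n < 1000 {
        format!("ctx · {n}")
    } else {
        format!("ctx · {}k", (n + 500) / 1000)
    }
}

/// Age suffix for the state badge: `12s`, `2m 14s`.
pub fn age_suffix(secs: u64) -> String {
    if secs < 60 {
        format!("{secs}s")
    } else {
        format!("{}m {}s", secs / 60, secs % 60)
    }
}
```

On the client, the age would be `now - since_unix` from the latest `turn_state_changed` frame, re-rendered on a local tick rather than refetched.
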
@@ -269,6 +319,11 @@ Layout, top to bottom:
 arm a one-shot Bus flag — the next turn drops
 `--continue`, starting a fresh claude session. Subsequent
 turns resume normal `--continue`.
+
+Polling: `/api/state` is fetched **once** on cold load, and
+again while `status === 'needs_login_in_progress'` (login
+session output isn't event-shaped yet). Every other badge
+updates from SSE; no periodic refresh timer runs.
 - Inbox `<details>` block (collapsed): `inbox · N` — last 30
 messages addressed to this agent, fetched via
 `AgentRequest::Recent { limit: 30 }`. (Separate from
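
The new-session one-shot in this hunk (arm a flag, the next turn drops `--continue`, later turns resume it) maps onto an atomic swap. A minimal sketch, assuming a hypothetical `SessionFlags` holder rather than the real `Bus` field, and that `--continue` is the only argument affected:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical holder for the one-shot flag.
#[derive(Default)]
pub struct SessionFlags {
    new_session_armed: AtomicBool,
}

impl SessionFlags {
    /// POST /api/new-session: arm the one-shot.
    pub fn arm_new_session(&self) {
        self.new_session_armed.store(true, Ordering::SeqCst);
    }

    /// Called once per turn when building claude's argv.
    /// `swap` reads and clears in one step, so exactly one turn sees it.
    pub fn continue_args(&self) -> Vec<&'static str> {
        if self.new_session_armed.swap(false, Ordering::SeqCst) {
            Vec::new() // fresh claude session
        } else {
            vec!["--continue"]
        }
    }
}
```

Using `swap` instead of a separate load and store means two turns racing on the flag cannot both start fresh sessions.
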
@@ -345,14 +400,38 @@ Unknown `/foo` shows an error row instead of being silently sent.
 
 ### Per-agent endpoints
 
+All POSTs return 200 (no 303 redirects). The matching mutations
+fire `LiveEvent` variants on the per-agent bus, so the client
+doesn't refetch `/api/state` on submit — the SSE stream
+delivers the new state faster anyway. Only the login flow still
+polls (session output streams in updates that aren't event-
+shaped).
+
 - `POST /send` — operator-injected message into this agent's inbox.
 - `POST /login/{start,code,cancel}` — claude OAuth login flow.
-- `POST /api/cancel` — SIGINT the in-flight claude turn.
+Start/cancel emit `LiveEvent::StatusChanged` to flip the
+badge to/from `needs_login_in_progress`.
+- `POST /api/cancel` — SIGINT the in-flight claude turn. Emits a
+`LiveEvent::Note`.
 - `POST /api/compact` — run `/compact` on the persistent session
 (same MCP config + system prompt + allowed tools as a normal
-turn — only the stdin payload differs).
+turn — only the stdin payload differs). Flips state to
+`Compacting` via `Bus::set_state`, which emits
+`TurnStateChanged`.
 - `POST /api/model` (`model=<name>`) — switch the model for
-future turns.
+future turns. `Bus::set_model` emits `ModelChanged`.
 - `POST /api/new-session` — arm a one-shot for the next turn to
-drop `--continue`.
+drop `--continue`. Emits a `LiveEvent::Note`.
 - `GET /events/history` — replay buffer for the terminal.
+
+Bus events (new vocabulary on `/events/stream`):
+
+- `status_changed { status }` — `online` /
+`needs_login_idle` / `needs_login_in_progress`. Drives the
+alive-badge.
+- `model_changed { model }` — drives the model chip.
+- `token_usage_changed { usage: TokenUsage }` — drives the
+ctx-badge. Emitted from `Bus::record_usage` whenever the
+stream-json `result` event delivers a fresh usage block.
+- `turn_state_changed { state, since_unix }` — drives the
+state badge (`idle`/`thinking`/`compacting`).
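
The endpoint list and the `/events/stream` vocabulary in this hunk share one pattern: every bus setter emits its matching event, so the page never refetches. A minimal sketch of that pattern, assuming a local `LiveEvent` enum with the four new variants, a trimmed `TokenUsage`, `std::sync::mpsc` channels standing in for the real broadcast bus (persistence omitted), and a hypothetical `wire_name` helper returning the snake_case name used on the stream:

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

/// Trimmed stand-in for the harness's token-usage block.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub struct TokenUsage {
    pub input: u64,
    pub output: u64,
}

#[derive(Debug, Clone, PartialEq)]
pub enum LiveEvent {
    StatusChanged { status: String },
    ModelChanged { model: String },
    TokenUsageChanged { usage: TokenUsage },
    TurnStateChanged { state: String, since_unix: u64 },
}

impl LiveEvent {
    /// Name the client switches on, matching the vocabulary above.
    pub fn wire_name(&self) -> &'static str {
        match self {
            LiveEvent::StatusChanged { .. } => "status_changed",
            LiveEvent::ModelChanged { .. } => "model_changed",
            LiveEvent::TokenUsageChanged { .. } => "token_usage_changed",
            LiveEvent::TurnStateChanged { .. } => "turn_state_changed",
        }
    }
}

/// Toy bus: fans every event out to all live subscribers.
#[derive(Default)]
pub struct Bus {
    subscribers: Mutex<Vec<Sender<LiveEvent>>>,
}

impl Bus {
    pub fn subscribe(&self) -> Receiver<LiveEvent> {
        let (tx, rx) = channel();
        self.subscribers.lock().unwrap().push(tx);
        rx
    }

    fn emit(&self, ev: LiveEvent) {
        // Sending fails once a receiver is dropped; prune those senders.
        self.subscribers
            .lock()
            .unwrap()
            .retain(|tx| tx.send(ev.clone()).is_ok());
    }

    pub fn set_status(&self, status: &str) {
        self.emit(LiveEvent::StatusChanged { status: status.to_string() });
    }

    /// Emits on every call, even if the state did not change.
    pub fn set_state(&self, state: &str, since_unix: u64) {
        self.emit(LiveEvent::TurnStateChanged { state: state.to_string(), since_unix });
    }

    pub fn set_model(&self, model: &str) {
        self.emit(LiveEvent::ModelChanged { model: model.to_string() });
    }

    pub fn record_usage(&self, usage: TokenUsage) {
        self.emit(LiveEvent::TokenUsageChanged { usage });
    }
}
```
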