diff --git a/CLAUDE.md b/CLAUDE.md index be0efec..252ab92 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1,6 +1,9 @@ -# hyperhive — developer reference +# hyperhive — claude entry point -Operator + dev notes: conventions, gotchas, per-subsystem design. +Hey claude. This is your starting page. The detailed docs live in +[`docs/`](docs/) and are written for humans + you both — read them +when you need depth on a subsystem. This file is the index + +scratchpad. - High-level project intro: **[README.md](README.md)**. - Open work + backlog: **[TODO.md](TODO.md)**. @@ -11,7 +14,7 @@ Operator + dev notes: conventions, gotchas, per-subsystem design. hive-c0re/ host daemon + CLI (one binary, subcommand-dispatched) src/main.rs clap setup; serve / spawn / kill / rebuild / list / pending / approve / deny / destroy [--purge] / - request-spawn; periodic broker vacuum task + request-spawn; periodic vacuum tasks src/server.rs host admin socket (HostRequest → dispatch) src/client.rs admin-socket client src/manager_server.rs manager-privileged socket (ManagerRequest) @@ -20,6 +23,8 @@ hive-c0re/ host daemon + CLI (one binary, subcommand-dispatched) hourly vacuum of delivered>30d src/approvals.rs sqlite Approval queue + kinds src/operator_questions.rs sqlite question queue backing `ask_operator` + src/events_vacuum.rs host-side hourly sweep of every agent's + /state/hyperhive-events.sqlite src/coordinator.rs shared state (broker/approvals/questions/transient/ sockets) + tombstone enumeration src/actions.rs approve/deny/destroy (transient-aware) @@ -34,7 +39,7 @@ hive-ag3nt/ in-container harness crate; produces TWO binaries src/web_ui.rs per-container axum HTTP page (incl /api/cancel, /api/compact, /events/history) src/events.rs LiveEvent + broadcast Bus + sqlite-backed history - (/state/hyperhive-events.sqlite) + hourly vacuum + (/state/hyperhive-events.sqlite) src/turn.rs claude --print + stream-json pump; --compact retry src/mcp.rs embedded MCP server (rmcp): AgentServer + ManagerServer src/login.rs probe /root/.claude/ for a valid session @@ -55,404 +60,59 @@ nix/ templates/harness-base.nix shared scaffolding for sub-agents + manager templates/agent-base.nix sub-agent nixosConfiguration templates/manager.nix manager nixosConfiguration + +docs/ + conventions.md naming, identity=socket, async forms, commit style + gotchas.md NixOS / nspawn lessons learned the hard way + web-ui.md dashboard + per-agent page layouts and endpoints + turn-loop.md claude invocation, wake prompt, MCP tool surface + approvals.md approval flow, manager policy, helper events + persistence.md sqlite dbs, retention, state dir layout ``` -## Conventions +## Reading paths -- **Naming.** Containers are length-bounded (`nixos-container` ≤ 11 chars). - Sub-agents are `h-` with `` ≤ 9 chars; the manager is `hm1nd`. - `MAX_AGENT_NAME` enforces the cap in `lifecycle.rs`. Per-agent web UI port = - `WEB_PORT_BASE + FNV1a(name) % WEB_PORT_RANGE` (8100..8999); manager fixed - at 8000; dashboard `cfg.dashboardPort` (default 7000). -- **Identity = socket.** No auth/tokens on the per-agent sockets. The socket - *path* identifies the principal; perms come from "who has the bind-mount." -- **Wire protocol.** JSON line-delimited over unix sockets in both directions - (host admin / manager / agent). `/messages/stream` is `text/event-stream`. -- **Commit messages.** Short, lowercase, no Co-Authored-By trailer. -- **Commit before test.** Stage and commit when work *looks* ready, then run - validation (`cargo check`, `nix flake check`, real lpt2 deploy). Failures get - a follow-up commit rather than an amend. -- **`rebuild` is the reconcile verb.** Idempotently rewrites - `/etc/nixos-containers/.conf` (`PRIVATE_NETWORK=0`, clears - HOST_ADDRESS/LOCAL_ADDRESS, sets `EXTRA_NSPAWN_FLAGS`), regenerates - `applied//flake.nix`, writes the systemd limits drop-in, then - `nixos-container update` + stop + start. Anything that changes per-container - state on the host should be re-applied here. -- **Actions are factored.** `approve` / `deny` / `destroy` live in - `actions.rs`; the admin socket and the dashboard POST handlers both call - into them so the two surfaces never drift. -- **Async forms.** Dashboard + per-agent mutating forms carry `data-async`; - a delegated `submit` listener in `assets/app.js` intercepts, shows a - spinner, POSTs with `application/x-www-form-urlencoded` (axum `Form` - extractor rejects multipart), calls `refreshState()` on success. New - mutating forms should add `data-async` and optionally `data-confirm`. +Pick the doc that matches your task. None depend on the others — +read them à la carte. -## Gotchas / lessons learned +- **"What does the dashboard look like?"** → + [`docs/web-ui.md`](docs/web-ui.md). +- **"How does claude get its prompt and what tools does it have?"** → + [`docs/turn-loop.md`](docs/turn-loop.md). +- **"How do config changes flow from manager to operator to + container?"** → [`docs/approvals.md`](docs/approvals.md). +- **"What state survives destroy / purge / restart?"** → + [`docs/persistence.md`](docs/persistence.md). +- **"Naming, commit style, wire protocol, the `data-async` + pattern."** → [`docs/conventions.md`](docs/conventions.md). +- **"Why does the nspawn flag look like that?"** → + [`docs/gotchas.md`](docs/gotchas.md). -- **`nixos-container` doesn't expose `--bind` on the CLI.** Path is via - `EXTRA_NSPAWN_FLAGS` in `/etc/nixos-containers/.conf` — the start - script (`/nix/store/.../container_-start`) expands it unquoted into the - `systemd-nspawn` invocation. We rewrite this line in `set_nspawn_flags()`. -- **`/run/systemd/nspawn/*.nspawn` overrides are *ignored*** by - `nixos-container`'s start script (it builds the nspawn cmd line directly). -- **`boot.isNspawnContainer = true`**, not `boot.isContainer = true`. Renamed - in nixos-25.11+. -- **`nixos-container create` auto-assigns `HOST_ADDRESS`/`LOCAL_ADDRESS`** in - the `.conf`. The start script's `if HOST_ADDRESS set → --network-veth` - branch then forces a private netns — which is silently fatal for our web - UIs (the bind is invisible from the host). We force-clear those vars (and - `HOST_ADDRESS6` / `LOCAL_ADDRESS6` / `HOST_BRIDGE`) plus set - `PRIVATE_NETWORK=0`. -- **systemd service PATH ≠ host PATH.** The hive-c0re service sets - `path = [ pkgs.git "/run/current-system/sw" ]`. In-container harness - services do the same so anything an agent adds to its own `agent.nix` - (`environment.systemPackages`) is visible to claude's Bash tool without - editing the service definition. `environment.HYPERHIVE_GIT` bakes git's - absolute path in (read by `lifecycle::git_command()`) for the host. -- **`RuntimeDirectoryPreserve = "yes"`** keeps `/run/hyperhive/` (and the - per-agent sub-dirs) across `hive-c0re` restarts. Without it, every restart - wipes bind sources and existing containers can't be started. -- **`register_agent` is idempotent** — drops any prior socket task before - rebinding. Required so a `hive-c0re` restart followed by `rebuild alice` - recreates the agent's socket without needing a clean reinstall. -- **`claude-code` is unfree.** `harness-base.nix` allow-list's it - specifically. The flake pins it to **nixpkgs-unstable** via - `overlays.claude-unstable` (stable lags too far). The overlay imports - unstable with its own `allowUnfreePredicate` so the access inside the - overlay doesn't itself trip. -- **Claude credentials are per-agent.** `/var/lib/hyperhive/agents//claude/` - bind-mounts to `/root/.claude` (RW). Sharing one dir across agents is NOT - viable — OAuth refresh tokens rotate, so any sibling refresh invalidates - all the others. Login flow runs from the per-agent web UI; creds persist - across `destroy`/recreate. -- **Persistent notes dir per agent.** `/var/lib/hyperhive/agents//state/` - bind-mounts to `/state` (RW). System prompts tell agents to keep durable - knowledge here (`/state/notes.md`, anything else under `/state/`). - Survives destroy/recreate alongside the claude dir. -- **Orphan approvals.** If state dirs are wiped out from under a pending - approval (test scripts, manual `rm -rf`), the dashboard's next render - marks them `failed` with note `"agent state dir missing"` so they fall out - of `pending`. They stay in sqlite for audit. +## Quick reminders -## Web UI shape +- **Commit before test.** Stage and commit when work *looks* + ready, then run validation. Failures get a follow-up commit + rather than an amend. +- **Commit messages: short, lowercase, no `Co-Authored-By` + trailer.** Imperative mood. +- **`rebuild` is the reconcile verb.** Anything that changes + per-container state on the host should be re-applied there so + the dashboard's `↻ R3BU1LD` is sufficient to recover. +- **Identity = socket.** No auth tokens — the socket path + identifies the principal. +- **Actions are factored** between admin socket and dashboard via + `actions.rs` and `dashboard.rs::lifecycle_action`, so the two + surfaces never drift. -Both the dashboard (port 7000) and the per-agent web UIs (8000 / -8100-8999) are SPAs with the same skeleton: +## Scratchpad -- `GET /` → static `assets/index.html` (placeholders for state-driven - sections). -- `GET /static/*.css` + `GET /static/*.js` → static assets shipped via - `include_str!` so there's no runtime file dependency. -- `GET /api/state` → JSON snapshot the JS app renders into the DOM. -- `GET /events/stream` (per-agent) and `GET /messages/stream` (dashboard) - are `text/event-stream` SSE for live updates. +In-flight or recent context that hasn't earned a section yet. +Prune freely. -Per-agent endpoints: `POST /send`, `POST /login/{start,code,cancel}`, -`POST /api/cancel`, `POST /api/compact`, `GET /events/history`. - -Dashboard endpoints: `POST /{approve,deny}/{id}`, `POST -/{rebuild,kill,restart,start,destroy}/{name}`, `POST -/purge-tombstone/{name}`, `POST /answer-question/{id}`, `POST -/request-spawn`, `POST /update-all`. - -The JS app handles all `form[data-async]` submissions via a delegated -listener: read `data-confirm`, swap the button to a spinner, POST -`application/x-www-form-urlencoded` (axum's `Form` extractor rejects -multipart), then on success re-enable the button (refreshState often -keeps the form mounted) and call `refreshState()` (re-fetch -`/api/state` and re-render). No full-page reloads. - -Per-agent + dashboard state shapes live in `dashboard.rs::StateSnapshot` -and `web_ui.rs::StateSnapshot`. When adding new state fields, plumb -through the snapshot struct and the relevant `assets/app.js` render -function — never reach for server-side HTML rendering again. - -## Agent MCP surface + turn loop - -The harness ships an embedded MCP server (rmcp 1.7) that claude launches as -a stdio child via `--mcp-config`. Subcommand: `hive-ag3nt mcp` (or -`hive-m1nd mcp` for the manager surface). - -Sub-agent tools: -- `mcp__hyperhive__send(to, body)` — message a peer or the operator. -- `mcp__hyperhive__recv()` — drain one inbox message. - -Manager additionally: -- `mcp__hyperhive__request_spawn(name)` — queue Spawn approval. -- `mcp__hyperhive__kill(name)` — graceful stop. No approval. -- `mcp__hyperhive__start(name)` — start a stopped sub-agent. No approval. -- `mcp__hyperhive__restart(name)` — stop + start. No approval. -- `mcp__hyperhive__request_apply_commit(agent, commit_ref)` — submit a - config change for any agent (including `hm1nd` for self-mods). -- `mcp__hyperhive__ask_operator(question, options?)` — non-blocking; - queues a question on the dashboard, returns the question id. Operator's - answer arrives later as a `HelperEvent::OperatorAnswered` in the - manager inbox. - -The shared per-turn plumbing lives in `hive_ag3nt::turn::{write_mcp_config, -write_settings, write_system_prompt, run_turn, drive_turn, emit_turn_end, -wait_for_login}` so the two binaries can't drift. - -On harness boot, three files get dropped next to the mcp socket at -`/run/hive/`: - -- `claude-mcp-config.json` — re-invokes the running binary as `mcp` child. -- `claude-settings.json` — `--settings` blob (auto-compact/auto-memory - off, effortLevel medium). -- `claude-system-prompt.md` — rendered from `prompts/{agent,manager}.md` - with `{label}` substituted. Passed via `--system-prompt-file`. - -Each turn: - -``` -claude --print --verbose --output-format stream-json --model haiku \ - --continue --settings /run/hive/claude-settings.json \ - --system-prompt-file /run/hive/claude-system-prompt.md \ - --mcp-config /run/hive/claude-mcp-config.json --strict-mcp-config \ - --tools --allowedTools -# wake prompt piped over stdin — minimal, just from/body + optional unread hint -``` - -`--continue` keeps a persistent session per agent (claude stores sessions in -`~/.claude/projects/`, which is bind-mounted persistently). Auto-compact and -auto-memory are disabled because hyperhive owns compaction (`/compact` on -overflow, retry once). - -**Loop control.** The harness pops one inbox message per cycle (the wake -signal — Recv long-polls server-side for up to 30s waking instantly on a new -broker `Sent` event for this agent), peeks the remaining inbox depth with -`Status`, and emits `TurnStart { from, body, unread }`. The wake prompt -piped to claude includes a one-line `({unread} more pending — drain via …)` -hint when `unread > 0`. Claude drives any further `recv`/`send` itself. - -**Tool envelope** (`mcp::run_tool_envelope`): every MCP tool handler logs -the request, runs the body, logs the result. Pre-/post-log only; the old -`[status] N unread message(s)` appendage was removed once unread moved -into the wake prompt + UI header. New tools call this helper. - -**Tool whitelist** (`mcp::ALLOWED_BUILTIN_TOOLS`): -- Allowed built-ins: `Bash`, `Edit`, `Glob`, `Grep`, `Read`, `TodoWrite`, - `Write`. -- Denied by omission: `WebFetch`, `WebSearch`, `Task`, `NotebookEdit`. -- Allowed MCP tools: as listed above per flavor. - -`Bash` is on the allow-list pending a finer-grained pattern allow-list -(`Bash(git *)`-style) — see [TODO.md](TODO.md). - -**Live view.** Each agent runs an `events::Bus` (broadcast channel + -sqlite-backed history at `/state/hyperhive-events.sqlite`). The harness -emits `TurnStart { from, body, unread }`, `Stream(value)` (one per -parsed stream-json line), `Note`, `TurnEnd { ok, note }`. The web UI: - -- fetches `GET /events/history` on page load and replays the last - 2000 events (oldest first, `.no-anim` so they don't stagger), -- then subscribes to `GET /events/stream` (SSE) for live tail, -- shows a granular state badge above the terminal (`💤 idle / 🧠 - thinking / ○ offline · `) driven from `turn_start`/`turn_end`, - with a flash animation on transition, -- sticky-bottom auto-scroll: scrolling up parks the view; new rows - surface a "↓ N new" pill instead of yanking. Scrolling back to - bottom clears the counter, -- terminal-themed: phosphor mauve glow, Crust bg, backdrop-filter - blur, row fade-in slide-up, banner gradient shimmer while - state=thinking. - -Per-tool rendering: - -- `TurnStart` → `◆ TURN ← · N unread` header + indented body. -- `Stream` `tool_use` → `→ Read /path` / `→ Bash $ cmd` / - `→ send → operator: "..."` etc., per-tool pretty rather than raw JSON. -- `Stream` `tool_result` short → flat `← ...`; long → collapsed - `
` `▸ ← Nl · headline` (click to expand full body). -- `Stream` `thinking` → shows the thinking text if claude provided one, - otherwise the bare `· thinking …` indicator. -- `Stream` `system init`, `result`, `rate_limit_event` are dropped — too - noisy and `TurnEnd` already says the turn finished. -- `Note` → `· text`. -- `TurnEnd` → `✓ turn ok` / `✗ turn fail — note` and triggers a - `refreshState()` so the page form view reflects state transitions - (e.g. login just landed). - -The operator input lives *inside* the terminal-wrap as a prompt-style -textarea below the live tail: multi-line (Enter sends, Shift+Enter -newlines), tab-completes slash commands. Available slash commands: - -- `/help` — list commands locally -- `/clear` — wipe the visible terminal (server history kept) -- `/cancel` — POST `/api/cancel` (host shellouts `pkill -INT - claude`, emits a Note); also surfaces as a `■ cancel turn` button - in the state row while state=thinking -- `/compact` — POST `/api/compact` (host spawns - `turn::compact_session` in the background; output streams into the - live panel) - -Unknown `/foo` shows an error row instead of being silently sent. - -## Manager (hm1nd) is hive-c0re-managed - -The manager container runs through the **same lifecycle as sub-agents**. -On `hive-c0re serve` startup, if `hm1nd` is missing, hive-c0re creates it. -The manager's flake lives at `/var/lib/hyperhive/applied/hm1nd/`; its -proposed config at `/var/lib/hyperhive/agents/hm1nd/config/`. Manager can -edit its own `agent.nix` (visible inside the container at -`/agents/hm1nd/config/`) and submit `request-apply-commit hm1nd ` for -operator approval. - -Differences from sub-agents: -- `flake.nix` extends `hyperhive.nixosConfigurations.manager` - (vs `agent-base`). -- Container name is `hm1nd` (no `h-` prefix). -- Fixed web UI port (`MANAGER_PORT = 8000`). -- `set_nspawn_flags` adds an extra bind: `/var/lib/hyperhive/agents` → - `/agents` (RW), so the manager can edit per-agent proposed repos. -- First-deploy spawn bypasses the approval queue (manager is required - infrastructure). -- Per-agent socket lives at `/run/hyperhive/manager/`, owned by - `manager_server::start`. - -**Migration note (for older hosts):** drop any `containers.hm1nd = { ... }` -block from your host NixOS config. hyperhive creates and updates the -manager itself now. - -**Manager policy** (from `prompts/manager.md`): the manager does NOT -rubber-stamp sub-agent config requests. It verifies (role match, package -legitimacy, cheaper alternative, blast radius) before committing + -calling `request_apply_commit`. For ambiguous cases or anything that -needs human signal, the manager calls `ask_operator(question, options?)` -which queues the question on the dashboard and returns the id -immediately; the operator's answer arrives later as -`HelperEvent::OperatorAnswered` in the manager inbox. Store at -`hive-c0re::operator_questions` (sqlite); answer flow: -`POST /answer-question/{id}` → `OperatorQuestions::answer` → -`notify_manager(OperatorAnswered { ... })`. - -## Helper events to the manager - -`Coordinator::notify_manager(&HelperEvent)` enqueues an inbox message -from sender `system` with the event JSON in the body. The manager -harness no longer short-circuits these — they drive a regular claude -turn so the manager can react. Variants -(`hive_sh4re::HelperEvent`): - -- `ApprovalResolved { id, agent, commit_ref, status, note }` — fired by - `actions::approve` + `actions::deny` whenever an approval transitions - to its terminal state. -- `Spawned { agent, ok, note }` — `actions::approve` (Spawn-kind) + - admin `HostRequest::Spawn`. -- `Rebuilt { agent, ok, note }` — `auto_update::rebuild_agent` (covers - startup scan + manual `/rebuild` from dashboard) + `actions::approve` - (ApplyCommit). -- `Killed { agent }` — admin `HostRequest::Kill` + dashboard `/kill`. -- `Destroyed { agent }` — `actions::destroy`. -- `OperatorAnswered { id, question, answer }` — `dashboard::post_answer_question` - fires this after the operator submits the answer form for a question - the manager queued via `ask_operator`. - -To add a new event: new `HelperEvent` variant + call sites + update -`prompts/manager.md` so the manager knows the new shape. - -## Auto-update on startup - -`hive-c0re serve` runs `auto_update::run` in a background task right after -opening the coordinator. It enumerates managed containers and rebuilds any -whose recorded hyperhive rev differs from the current one — sub-agents and -manager go through the same `lifecycle::rebuild` path. "Rev" = canonical -filesystem path of `cfg.hyperhiveFlake`. Marker file: -`/var/lib/hyperhive/applied/..hyperhive-rev`. - -If the flake input has no canonical path (e.g. a `github:` URL), -auto-update is a no-op — rebuild manually. - -The dashboard surfaces pending updates per agent: a clickable "needs update -↻" badge appears whenever the marker differs from current rev. The badge -POSTs `/rebuild/`, calling the same `auto_update::rebuild_agent` -path so manual triggers and the startup scan can't drift. When at least -one container is stale, a top-level `↻ UPD4TE 4LL` button appears that -loops over every stale container. - -## Dashboard action surface - -Page sections (top to bottom): - -1. **C0NTAINERS** — live containers with their action surface (below). -2. **K3PT ST4T3** — destroyed-but-state-kept tombstones (size + - age + claude-creds badge). Two actions: `⊕ R3V1V3` (queues a - Spawn approval; existing state is reused), `PURG3` (wipes - state + applied dirs; `POST /purge-tombstone/{name}`). -3. **M1ND H4S QU3STI0NS** — pending `ask_operator` questions - (amber pulsing border). Always renders a free-text fallback - alongside any option list; `multi=true` renders options as - checkboxes; submit merges selections + free text comma-joined. -4. **0PER4T0R 1NB0X** — recent messages addressed to `operator` - (last 50, from the broker). -5. **P3NDING APPR0VALS** — the queue. The R3QU3ST SP4WN form - lives at the top of this section since submitting it immediately - queues an approval that lands directly below. -6. **MESS4GE FL0W** — live broker SSE tail. - -Container row (two-line layout, `assets/app.js::renderContainers`): - -- Line 1: agent name (link → new tab), m1nd/ag3nt chip, `needs - login` / `needs update` warning badges, in-flight `◐ pending-state…` - pill (replaces buttons during start/stop/restart/rebuild/destroy), - container name + port. -- Line 2: action buttons — `↻ R3BU1LD` always, `DESTR0Y` + `PURG3` - on sub-agents, `↺ R3ST4RT` + (sub-agents) `■ ST0P` when running, - `▶ ST4RT` when stopped. Buttons dim + disable while a transient - lifecycle action is in flight. - -`↻ UPD4TE 4LL` button appears above the containers list when any -agent is stale. - -Banner pulses on each broker SSE event (`pulseBanner` with a 4s -grace timer). - -## Approval flow - -End-to-end: manager edits per-agent `proposed` repo → commits → submits commit -sha → user approves on host CLI or dashboard button → `hive-c0re` reads the -file at that sha from `proposed`, applies into `applied`, commits there, runs -`nixos-container update`. Helper-event JSON (`ApprovalResolved`) lands in the -manager's inbox. - -Two separate git repos per agent: - -``` -/var/lib/hyperhive/agents//config/ # proposed — manager edits, hive-c0re reads only -└── agent.nix # the only file the manager can change - # (initial commit by hive-c0re on first spawn, - # never touched by hive-c0re again) - -/var/lib/hyperhive/applied// # applied — hive-c0re-only; container builds here -├── flake.nix # auto-generated; references hyperhive_flake -└── agent.nix # overwritten by approve from the proposed commit -``` - -The container's `--flake` ref is `#default`. The flake extends -`hyperhive.nixosConfigurations.{agent-base|manager}` with `./agent.nix` plus -an inline module setting `programs.git.config.user` (committer identity = -the agent's name) and `systemd.services..environment` (HIVE_PORT, -HIVE_LABEL, HIVE_DASHBOARD_PORT). - -## Persistence + retention - -Two sqlite files; both autovacuum on a 1h tokio task: - -- **`/var/lib/hyperhive/broker.sqlite`** (host) — `messages` + - `approvals` + `operator_questions` tables. `Broker::vacuum_delivered` - drops delivered messages older than 30 days; undelivered rows are - always kept. Approvals + questions are kept indefinitely (auditable). -- **`/state/hyperhive-events.sqlite`** (per-container, bind-mounted - from `/var/lib/hyperhive/agents//state/`) — every `LiveEvent` - emitted on the per-agent `Bus`. Hourly vacuum drops rows older than - 7 days, then trims to the most recent 2000. Path overridable via - `HYPERHIVE_EVENTS_DB` (for dev / no-`/state` setups; on open failure - the Bus falls back to no-store mode rather than crashing the - harness). Survives destroy/recreate; gone on `--purge`. - -State dirs (per agent, under `/var/lib/hyperhive/agents//`): -`config/` (proposed nix repo), `claude/` (creds, bind-mounted RW to -`/root/.claude`), `state/` (durable notes + events db, bind-mounted to -`/state`). Wiped only on explicit `--purge`. Tombstones (state dir -without a live container) surface in the dashboard's K3PT ST4T3 -section so the operator can either revive or purge. +- Loop session 2026-05-15: shipped state badge, /cancel + /compact, + tombstones, multi-select ask_operator, broker + events vacuum. +- After loop session 2026-05-15: docs split into `docs/` (this + page slimmed to index + scratchpad). Cleanups landed: vacuum + host-side, `lifecycle_action` helper, `api_state` split. +- Next likely focus: telemetry/charts (still queued from earlier + triage) + server-side state badge. diff --git a/docs/approvals.md b/docs/approvals.md new file mode 100644 index 0000000..af80169 --- /dev/null +++ b/docs/approvals.md @@ -0,0 +1,144 @@ +# Approvals + manager + helper events + +The approval queue is hyperhive's pivot: nothing that changes the +shape of an agent (its config, whether it exists) happens without an +operator click. The manager (`hm1nd`) is the policy gate in front of +that queue; helper events are how it stays informed about what +happens after a decision lands. + +## End-to-end approval flow + +1. Manager edits `/agents//config/agent.nix` (bind-mounted + from the host's per-agent `proposed` repo) and commits. +2. Manager submits the commit sha via `request_apply_commit(agent, + commit_ref)`. +3. Operator sees the diff on the dashboard, clicks ◆ APPR0VE (or + `hive-c0re approve ` on the CLI). +4. hive-c0re reads the file at that sha from `proposed`, applies + into `applied`, commits there, runs `nixos-container update`. +5. `HelperEvent::ApprovalResolved` lands in the manager's inbox. + +`Spawn` approvals follow the same shape but skip the commit-diff +step — the operator just sees the name. On approve, hive-c0re +creates the container in a background task while the dashboard +shows a spinner. + +## Two repos per agent + +``` +/var/lib/hyperhive/agents//config/ proposed +└── agent.nix # the only file the + # manager can change + # (initial commit by + # hive-c0re on first + # spawn, never touched + # again). + +/var/lib/hyperhive/applied// applied — hive-c0re-only +├── flake.nix # auto-generated +└── agent.nix # overwritten by approve + # from the proposed commit +``` + +The container's `--flake` ref is `#default`. The flake +extends `hyperhive.nixosConfigurations.{agent-base|manager}` with +`./agent.nix` plus an inline module setting +`programs.git.config.user` (committer identity = the agent's name) +and `systemd.services..environment` (`HIVE_PORT`, +`HIVE_LABEL`, `HIVE_DASHBOARD_PORT`). + +## Manager (`hm1nd`) is hive-c0re-managed + +The manager container runs through the **same lifecycle as +sub-agents**. On `hive-c0re serve` startup, if `hm1nd` is missing, +hive-c0re creates it. The manager's flake lives at +`/var/lib/hyperhive/applied/hm1nd/`; its proposed config at +`/var/lib/hyperhive/agents/hm1nd/config/`. Manager can edit its own +`agent.nix` (visible inside the container at `/agents/hm1nd/config/`) +and submit `request_apply_commit("hm1nd", )` for operator +approval. + +Differences from sub-agents: + +- `flake.nix` extends `hyperhive.nixosConfigurations.manager` + (vs `agent-base`). +- Container name is `hm1nd` (no `h-` prefix). +- Fixed web UI port (`MANAGER_PORT = 8000`). +- `set_nspawn_flags` adds an extra bind: + `/var/lib/hyperhive/agents` → `/agents` (RW), so the manager can + edit per-agent proposed repos. +- First-deploy spawn bypasses the approval queue (manager is + required infrastructure). +- Per-agent socket lives at `/run/hyperhive/manager/`, owned by + `manager_server::start`. + +**Migration note** (for older hosts): drop any `containers.hm1nd = +{ ... }` block from your host NixOS config. hyperhive creates and +updates the manager itself. + +## Manager policy + +From `hive-ag3nt/prompts/manager.md`: the manager does NOT +rubber-stamp sub-agent config requests. It verifies (role match, +package legitimacy, cheaper alternative, blast radius) before +committing and calling `request_apply_commit`. + +For ambiguous cases or anything that needs human signal, the +manager calls `ask_operator(question, options?, multi?)` — queues +the question on the dashboard and returns the id immediately. The +operator's answer arrives later as +`HelperEvent::OperatorAnswered` in the manager inbox. Storage is +`hive-c0re::operator_questions` (sqlite); the answer flow is: + +``` +POST /answer-question/{id} + → OperatorQuestions::answer + → notify_manager(OperatorAnswered { id, question, answer }) +``` + +## Helper events to the manager + +`Coordinator::notify_manager(&HelperEvent)` enqueues an inbox +message from sender `system` with the event JSON in the body. The +manager harness no longer short-circuits these — they drive a +regular claude turn so the manager can react. Variants +(`hive_sh4re::HelperEvent`): + +- `ApprovalResolved { id, agent, commit_ref, status, note }` — + fired by `actions::approve` + `actions::deny` whenever an + approval transitions to its terminal state. +- `Spawned { agent, ok, note }` — `actions::approve` (Spawn-kind) + + admin `HostRequest::Spawn`. +- `Rebuilt { agent, ok, note }` — `auto_update::rebuild_agent` + (covers startup scan + manual `/rebuild` from dashboard) + + `actions::approve` (ApplyCommit). +- `Killed { agent }` — admin `HostRequest::Kill` + dashboard + `/kill` + manager `Kill` MCP tool. +- `Destroyed { agent }` — `actions::destroy`. +- `OperatorAnswered { id, question, answer }` — dashboard + `/answer-question/{id}` after the operator submits the answer + form. + +To add a new event: new `HelperEvent` variant + call sites + update +`prompts/manager.md` so the manager knows the new shape. + +## Auto-update on startup + +`hive-c0re serve` runs `auto_update::run` in a background task right +after opening the coordinator. It enumerates managed containers and +rebuilds any whose recorded hyperhive rev differs from the current +one — sub-agents and manager go through the same `lifecycle::rebuild` +path. + +"Rev" = canonical filesystem path of `cfg.hyperhiveFlake`. Marker +file: `/var/lib/hyperhive/applied/..hyperhive-rev`. If the +flake input has no canonical path (e.g. a `github:` URL), +auto-update is a no-op — rebuild manually. + +The dashboard surfaces pending updates per agent: a clickable +"needs update ↻" badge appears whenever the marker differs from +current rev. The badge POSTs `/rebuild/`, calling the same +`auto_update::rebuild_agent` path so manual triggers and the +startup scan can't drift. When at least one container is stale, a +top-level `↻ UPD4TE 4LL` button appears that loops over every +stale container. diff --git a/docs/conventions.md b/docs/conventions.md new file mode 100644 index 0000000..2f20c15 --- /dev/null +++ b/docs/conventions.md @@ -0,0 +1,69 @@ +# Conventions + +Code-style and process expectations across the workspace. Most of these +exist because something already went wrong without them. + +## Naming + +- Containers are length-bounded by `nixos-container` (≤ 11 chars). +- Sub-agents are `h-` with `` ≤ 9 chars. +- The manager is `hm1nd` (no `h-` prefix, fixed name). +- `MAX_AGENT_NAME` in `lifecycle.rs` enforces the cap. +- Per-agent web UI port = `WEB_PORT_BASE + FNV1a(name) % WEB_PORT_RANGE` + (8100..8999); manager fixed at 8000; dashboard `cfg.dashboardPort` + (default 7000). + +## Identity = socket + +There are no auth tokens on the per-agent unix sockets. The socket +*path* identifies the principal; perms come from "who has the +bind-mount." A sub-agent only sees its own `/run/hive/mcp.sock`; the +manager has access to its privileged socket; hive-c0re owns the host +admin socket. + +## Wire protocol + +JSON line-delimited over unix sockets in both directions (host admin +/ manager / agent). SSE streams (`/messages/stream`, +`/events/stream`) are `text/event-stream`. Request/response types +live in `hive-sh4re` — change them in one place. + +## Async forms + +Dashboard + per-agent mutating forms carry `data-async`; a delegated +`submit` listener in `assets/app.js` intercepts, shows a spinner, +POSTs `application/x-www-form-urlencoded` (axum's `Form` extractor +rejects multipart), calls `refreshState()` on success. New mutating +forms should add `data-async` and optionally `data-confirm` for a +JS-side confirmation prompt. + +## `rebuild` is the reconcile verb + +`lifecycle::rebuild` idempotently rewrites +`/etc/nixos-containers/.conf` (`PRIVATE_NETWORK=0`, clears +`HOST_ADDRESS` / `LOCAL_ADDRESS`, sets `EXTRA_NSPAWN_FLAGS`), +regenerates `applied//flake.nix`, writes the systemd limits +drop-in, then `nixos-container update` + stop + start. + +Anything that changes per-container state on the host should be +re-applied here so a manual `↻ R3BU1LD` from the dashboard is +sufficient to recover. + +## Actions are factored + +`approve` / `deny` / `destroy` (and the lifecycle helper) live in +`actions.rs` / `dashboard.rs`. The admin socket and the dashboard +POST handlers both call into them so the two surfaces never drift. + +## Commit messages + +Short, lowercase, no `Co-Authored-By` trailer. Imperative mood, no +period. Body explains *why* if non-obvious; otherwise the subject +alone is fine. Wrap at ~72 cols. + +## Commit before test + +Stage and commit when work *looks* ready, then run validation +(`cargo check`, `nix flake check`, real deploy). Failures get a +follow-up commit rather than an amend. The commit history is the +work log; rewriting it loses signal. diff --git a/docs/gotchas.md b/docs/gotchas.md new file mode 100644 index 0000000..e34863d --- /dev/null +++ b/docs/gotchas.md @@ -0,0 +1,83 @@ +# Gotchas + +NixOS + nspawn quirks and lessons we hit the hard way. If something +here looks unmotivated in the code, there's usually a story underneath. + +## `nixos-container` doesn't expose `--bind` on the CLI + +The CLI doesn't accept `--bind`. Path is via `EXTRA_NSPAWN_FLAGS` in +`/etc/nixos-containers/.conf` — the start script +(`/nix/store/.../container_-start`) expands it unquoted into the +`systemd-nspawn` invocation. `lifecycle::set_nspawn_flags()` rewrites +this line. + +## `/run/systemd/nspawn/*.nspawn` overrides are ignored + +`nixos-container`'s start script builds the nspawn command line +directly. Dropping a `.nspawn` file under `/run/systemd/nspawn/` +looks like the obvious extension point and does nothing. Use +`EXTRA_NSPAWN_FLAGS` (above). + +## `boot.isNspawnContainer = true` + +Not `boot.isContainer = true`. Renamed in nixos-25.11+. + +## `nixos-container create` auto-assigns `HOST_ADDRESS` / `LOCAL_ADDRESS` + +…in the `.conf`. The start script's `if HOST_ADDRESS set → +--network-veth` branch then forces a private netns — silently fatal +for our web UIs (the bind is invisible from the host). We +force-clear `HOST_ADDRESS` / `LOCAL_ADDRESS` / `HOST_ADDRESS6` / +`LOCAL_ADDRESS6` / `HOST_BRIDGE` and set `PRIVATE_NETWORK=0`. + +## systemd service PATH ≠ host PATH + +The hive-c0re service sets `path = [ pkgs.git "/run/current-system/sw" ]`. +In-container harness services do the same so anything an agent adds +to its own `agent.nix` (`environment.systemPackages`) is visible to +claude's Bash tool without editing the service definition. +`environment.HYPERHIVE_GIT` bakes git's absolute path in (read by +`lifecycle::git_command()`) for the host. + +## `RuntimeDirectoryPreserve = "yes"` + +…keeps `/run/hyperhive/` (and the per-agent sub-dirs) across +hive-c0re restarts. Without it, every restart wipes bind sources and +existing containers can't be started. + +## `register_agent` is idempotent + +Drops any prior socket task before rebinding. Required so a +hive-c0re restart followed by `rebuild alice` recreates the agent's +socket without needing a clean reinstall. + +## `claude-code` is unfree + +`harness-base.nix` allow-list's it specifically. The flake pins it to +**nixpkgs-unstable** via `overlays.claude-unstable` (stable lags too +far). The overlay imports unstable with its own +`allowUnfreePredicate` so the access inside the overlay doesn't +itself trip. + +## Claude credentials are per-agent + +`/var/lib/hyperhive/agents//claude/` bind-mounts to +`/root/.claude` (RW). Sharing one dir across agents is NOT viable — +OAuth refresh tokens rotate, so any sibling refresh invalidates all +the others. Login flow runs from the per-agent web UI; creds persist +across `destroy`/recreate (`--purge` wipes them). + +## Persistent notes dir per agent + +`/var/lib/hyperhive/agents//state/` bind-mounts to `/state` +(RW). System prompts tell agents to keep durable knowledge here +(`/state/notes.md`, anything else under `/state/`). The harness also +writes its events log here (`/state/hyperhive-events.sqlite`). +Survives `destroy`/recreate alongside the claude dir. + +## Orphan approvals + +If state dirs are wiped out from under a pending approval (test +scripts, manual `rm -rf`), the dashboard's next render marks them +`failed` with note `"agent state dir missing"` so they fall out of +`pending`. They stay in sqlite for audit. diff --git a/docs/persistence.md b/docs/persistence.md new file mode 100644 index 0000000..bb5a38d --- /dev/null +++ b/docs/persistence.md @@ -0,0 +1,95 @@ +# Persistence + retention + +Where state lives, what survives what, and how it's bounded. + +## Two sqlite databases + +### `/var/lib/hyperhive/broker.sqlite` (host) + +Three tables, all in one file: + +- `messages` — every inter-agent / operator-bound message. + `sender / recipient / body / sent_at / delivered_at`. +- `approvals` — the queue. `agent / kind (apply_commit | spawn) / + commit_ref / requested_at / status / resolved_at / note`. +- `operator_questions` — `ask_operator` queue. + `asker / question / options_json / multi / asked_at / + answered_at / answer`. + +Retention: + +- `Broker::vacuum_delivered` runs hourly via a tokio task in + `hive-c0re::main`. Drops delivered rows older than 30 days. + Undelivered rows are always kept (still in flight). +- Approvals and questions are kept indefinitely — both are + audit trails. `actions::destroy` and answered questions stay + visible to anything that queries by id. + +### `/state/hyperhive-events.sqlite` (per agent) + +Lives inside each container's bind-mounted `/state/` dir (host +path: `/var/lib/hyperhive/agents//state/hyperhive-events.sqlite`). +One table: + +- `events(id, ts, kind, payload_json)` — every `LiveEvent` the + harness emits during turn loop execution. + +The harness writes; the host vacuums. `hive-c0re::events_vacuum` +runs hourly and sweeps every existing agent state dir, applying the +same two-stage delete to each file: drop rows older than 7 days, +then trim to the 2000 most-recent. Centralising retention on the +host means a misbehaving harness can't disable its own vacuum and +agents don't need any cleanup wiring of their own. + +Path overridable via `HYPERHIVE_EVENTS_DB` (for dev / no-`/state` +setups). On open failure the `Bus` falls back to no-store mode +rather than crashing the harness — events still broadcast over SSE, +just nothing persisted. + +## State dirs (per agent) + +Under `/var/lib/hyperhive/agents//`: + +- `config/` — the proposed nix repo (manager-editable). +- `claude/` — claude OAuth credentials, bind-mounted RW to + `/root/.claude` inside the container. +- `state/` — durable notes + the events.sqlite db, bind-mounted + to `/state` inside the container. + +Under `/var/lib/hyperhive/applied//` — the hive-c0re-only +applied repo (`flake.nix` + `agent.nix`) that the container +actually builds from. + +## Destroy vs purge + +- `DESTR0Y` (default) — stops + removes the nspawn container, + drops the systemd drop-in, fails any pending approvals. State + dirs stay put; the agent appears in the dashboard's K3PT ST4T3 + section as a tombstone with `⊕ R3V1V3` and `PURG3` actions. + `R3V1V3` queues a Spawn approval that reuses the kept state on + approve (no re-login). +- `PURG3` (opt-in via the dashboard button or + `hive-c0re destroy --purge `) — DESTR0Y plus wipes + `/var/lib/hyperhive/{agents,applied}//`. Config history, + claude creds, /state/ notes, and the events db are all gone. + No undo. + +The manager is non-destroyable from both paths (declarative +container; would fight with the host's NixOS config). + +## Run-time dirs + +`/run/hyperhive/` is tmpfs-backed (systemd `RuntimeDirectory=`) but +preserved across hive-c0re restarts via `RuntimeDirectoryPreserve=yes`. +Without that, every restart wipes bind sources and existing +containers can't be started. + +- `/run/hyperhive/host.sock` — admin socket (host-side CLI). +- `/run/hyperhive/manager/mcp.sock` — manager-privileged socket. +- `/run/hyperhive/agents//mcp.sock` — per-sub-agent socket + (bind-mounted into the container as `/run/hive/mcp.sock`). + +On startup, `Coordinator::register_agent` drops any prior socket +task before rebinding — idempotent so a hive-c0re restart followed +by `rebuild alice` recreates the agent's socket without a clean +reinstall. diff --git a/docs/turn-loop.md b/docs/turn-loop.md new file mode 100644 index 0000000..33cf78f --- /dev/null +++ b/docs/turn-loop.md @@ -0,0 +1,118 @@ +# Turn loop + MCP + +How the harness wakes up, what it asks claude to do, and what tools +claude has access to in return. + +## The loop + +Each agent harness (`hive-ag3nt serve` or `hive-m1nd serve`) runs: + +1. Long-poll `Recv` on its socket. The host-side broker + (`broker.rs::recv_blocking`) returns immediately if there's a + pending message, otherwise waits up to 30 s for a broker `Sent` + event for this recipient. +2. Pop one message. Peek the remaining inbox depth with `Status`. +3. Emit `LiveEvent::TurnStart { from, body, unread }` onto the SSE + bus. +4. Spawn claude (one process per turn) and pipe the wake prompt + over stdin. +5. Stream stdout (JSON lines) into the bus as + `LiveEvent::Stream(value)`. Pump stderr as `Note`. +6. Wait for claude to exit. On `Prompt is too long`, run `/compact` + on the session once and retry the turn. +7. Emit `LiveEvent::TurnEnd { ok, note }`. Sleep `poll_ms` to avoid + tight loops on transient failures. + +## The claude invocation + +``` +claude --print --verbose --output-format stream-json --model haiku \ + --continue --settings /run/hive/claude-settings.json \ + --system-prompt-file /run/hive/claude-system-prompt.md \ + --mcp-config /run/hive/claude-mcp-config.json --strict-mcp-config \ + --tools --allowedTools +# wake prompt piped over stdin +``` + +`--continue` keeps a persistent session per agent (claude stores +sessions in `~/.claude/projects/`, which is bind-mounted +persistently). Auto-compact and auto-memory are disabled via +`--settings` because hyperhive owns compaction (`/compact` on +overflow, retry once; operator can also force one via `/api/compact`). + +The wake prompt is intentionally minimal: just the popped message's +`from`/`body`, plus an inline `({unread} more pending — drain via +…)` hint when `unread > 0`. Claude drives any further `recv`/`send` +itself via the embedded MCP server. + +### On-boot files + +`hive_ag3nt::turn::write_*` writes three files next to the per-agent +socket at `/run/hive/` once at startup: + +- `claude-mcp-config.json` — re-invokes the running binary as `mcp` + child (so the same binary serves as harness + as claude's MCP + child process). +- `claude-settings.json` — the `--settings` blob (auto-compact and + auto-memory off, effortLevel medium). +- `claude-system-prompt.md` — rendered from + `hive-ag3nt/prompts/{agent,manager}.md` with `{label}` + substituted. Passed via `--system-prompt-file`. + +The shared per-turn plumbing lives in `hive_ag3nt::turn::{write_mcp_config, +write_settings, write_system_prompt, run_turn, drive_turn, +emit_turn_end, wait_for_login, compact_session}` so the two binaries +can't drift. + +## MCP surface + +The harness ships an embedded MCP server (rmcp 1.7). Claude launches +it as a stdio child via `--mcp-config`. The hyperhive socket name is +`hyperhive`, so the tools land in claude as `mcp__hyperhive__`. + +### Sub-agent tools + +- `send(to, body)` — message a peer (logical agent name), another + agent, or the operator (recipient `operator`, surfaces in the + dashboard inbox). +- `recv()` — drain one inbox message. + +### Manager tools (in addition to send/recv) + +- `request_spawn(name)` — queue a Spawn approval for a brand-new + sub-agent (≤9 char name). Operator approves on the dashboard. +- `kill(name)` — graceful stop. No approval required. +- `start(name)` — start a stopped sub-agent. No approval. +- `restart(name)` — stop + start. No approval. +- `request_apply_commit(agent, commit_ref)` — submit a config + change for any agent (`hm1nd` for the manager's own config) for + operator approval. +- `ask_operator(question, options?, multi?)` — surface a question + on the dashboard. Non-blocking — returns the queued question id; + the operator's answer arrives later as + `HelperEvent::OperatorAnswered` in the manager inbox. Options + always render alongside a free-text fallback; `multi=true` + renders options as checkboxes. + +The boundary: lifecycle ops on *existing* sub-agents +(`kill`/`start`/`restart`) are at the manager's discretion — no +operator approval. Creating a new agent (`request_spawn`) and +changing any agent's config (`request_apply_commit`) still go +through the approval queue. + +### Tool envelope + +`mcp::run_tool_envelope`: every MCP tool handler logs the request, +runs the body, logs the result. Pre-/post-log only — the inbox +status hint moved to the wake prompt + UI header. + +### Tool whitelist (`mcp::ALLOWED_BUILTIN_TOOLS`) + +- Allowed built-ins: `Bash`, `Edit`, `Glob`, `Grep`, `Read`, + `TodoWrite`, `Write`. +- Denied by omission: `WebFetch`, `WebSearch`, `Task`, + `NotebookEdit`. +- Allowed MCP tools: as listed above per flavor. + +`Bash` is on the allow-list pending a finer-grained pattern allow-list +(`Bash(git *)`-style) — see [TODO](../TODO.md). diff --git a/docs/web-ui.md b/docs/web-ui.md new file mode 100644 index 0000000..0e6a5cc --- /dev/null +++ b/docs/web-ui.md @@ -0,0 +1,143 @@ +# Web UI + +Two web surfaces share the same skeleton: the dashboard (port 7000) +and the per-agent UIs (manager on :8000, sub-agents on a hashed +:8100-8999). Both are SPAs — `GET /` returns a static shell, +`/api/state` returns JSON, JS renders. No full-page reloads. + +## Shape (shared by both) + +- `GET /` → `assets/index.html` (placeholders for state-driven + sections, shipped via `include_str!` so the binary has no runtime + file dependency). +- `GET /static/*.css` + `GET /static/*.js` → static assets. +- `GET /api/state` → JSON snapshot the JS app renders into the DOM. +- `GET /events/stream` (per-agent) / `GET /messages/stream` + (dashboard) → `text/event-stream` SSE for live updates. + +The JS app handles all `form[data-async]` submissions via a delegated +listener: read `data-confirm`, swap the button to a spinner, POST +`application/x-www-form-urlencoded`, re-enable the button on success +(refreshState may keep the form mounted, so we don't rely on a +re-render), call `refreshState()`. State shapes live in +`dashboard.rs::StateSnapshot` and `web_ui.rs::StateSnapshot` — +when adding state fields, plumb through the snapshot struct and the +relevant `assets/app.js` render function. + +## Dashboard sections (top to bottom) + +1. **C0NTAINERS** — live containers with their action surface. +2. **K3PT ST4T3** — destroyed-but-state-kept tombstones (size + + age + claude-creds badge). Two actions: `⊕ R3V1V3` (queues a + Spawn approval; existing state is reused), `PURG3` (wipes + state + applied dirs; `POST /purge-tombstone/{name}`). +3. **M1ND H4S QU3STI0NS** — pending `ask_operator` questions + (amber pulsing border). Always renders a free-text fallback + alongside any option list; `multi=true` renders options as + checkboxes; submit merges selections + free text comma-joined. +4. **0PER4T0R 1NB0X** — recent messages addressed to `operator` + (last 50, from the broker). +5. **P3NDING APPR0VALS** — the queue. The R3QU3ST SP4WN form + lives at the top of this section since submitting it immediately + queues an approval that lands directly below. +6. **MESS4GE FL0W** — live broker SSE tail. + +### Container row + +Two-line layout (`assets/app.js::renderContainers`): + +- Line 1: agent name (link → new tab), m1nd/ag3nt chip, `needs + login` / `needs update` warning badges, in-flight `◐ pending-state…` + pill (replaces buttons during start/stop/restart/rebuild/destroy), + container name + port. +- Line 2: action buttons — `↻ R3BU1LD` always, `DESTR0Y` + `PURG3` + on sub-agents, `↺ R3ST4RT` + (sub-agents) `■ ST0P` when running, + `▶ ST4RT` when stopped. Buttons dim + disable while a transient + lifecycle action is in flight. + +`↻ UPD4TE 4LL` button appears above the containers list when any +agent is stale. + +Banner pulses on each broker SSE event (`pulseBanner` with a 4 s +grace timer). + +### Dashboard endpoints + +- `POST /{approve,deny}/{id}` — approve/deny a pending approval. +- `POST /{rebuild,kill,restart,start,destroy}/{name}` — lifecycle. +- `POST /purge-tombstone/{name}` — wipe a tombstone's state dirs. +- `POST /answer-question/{id}` — answer a pending operator question. +- `POST /request-spawn` — queue a Spawn approval. +- `POST /update-all` — rebuild every stale container. + +## Per-agent page + +Layout, top to bottom: + +- Banner (gradient shimmer while state=thinking). +- Title with `↻ R3BU1LD` button. +- Status section (online / needs login / login-in-progress). +- State badge row (`💤 idle / 🧠 thinking / ○ offline · `) + + `■ cancel turn` button visible while state=thinking. +- Terminal-wrap: live event tail (with sticky-bottom auto-scroll + and a `↓ N new` pill when not at bottom) followed by an + operator-input textarea acting as a prompt. + +### Live view + +Each agent runs an `events::Bus`: a `tokio::sync::broadcast` +plus a sqlite-backed history at `/state/hyperhive-events.sqlite`. +The harness emits `TurnStart { from, body, unread }`, +`Stream(value)` (one per parsed stream-json line), `Note`, +`TurnEnd { ok, note }`. The web UI: + +- fetches `GET /events/history` on page load and replays the last + 2000 events (oldest first, with `.no-anim` so they don't stagger); +- then subscribes to `GET /events/stream` (SSE) for live tail; +- shows a granular state badge above the terminal, driven from + `turn_start`/`turn_end`, with a flash animation on transition; +- sticky-bottom auto-scroll: scrolling up parks the view; new rows + surface a "↓ N new" pill instead of yanking; +- terminal-themed: phosphor mauve glow, Crust bg, backdrop-filter + blur, row fade-in slide-up. + +Per-stream rendering: + +- `Stream` `tool_use` → `→ Read /path` / `→ Bash $ cmd` / `→ send → + operator: "..."` etc., per-tool pretty rather than raw JSON. +- `Stream` `tool_result` short → flat `← ...`; long → collapsed + `
` `▸ ← Nl · headline` (click to expand full body). +- `Stream` `thinking` → text content if claude provided one, + otherwise the bare `· thinking …` indicator. +- `Stream` `system init`, `result`, `rate_limit_event` are dropped + — too noisy. +- `Note` → `· text`. +- `TurnEnd` → `✓ turn ok` / `✗ turn fail — note`, triggers a + `refreshState()`. + +### Terminal-embedded prompt + +The operator input lives *inside* the terminal-wrap as a +prompt-style textarea below the live tail: multi-line (Enter sends, +Shift+Enter newlines), tab-completes slash commands. + +Slash commands today: + +- `/help` — list commands locally +- `/clear` — wipe the local terminal view (server history kept) +- `/cancel` — `POST /api/cancel` → host shellouts `pkill -INT + claude`, emits a Note. Also surfaces as a `■ cancel turn` button + in the state row while state=thinking. +- `/compact` — `POST /api/compact` → host spawns + `turn::compact_session` in the background; output streams into + the live panel. + +Unknown `/foo` shows an error row instead of being silently sent. + +### Per-agent endpoints + +- `POST /send` — operator-injected message into this agent's inbox. +- `POST /login/{start,code,cancel}` — claude OAuth login flow. +- `POST /api/cancel` — SIGINT the in-flight claude turn. +- `POST /api/compact` — run `/compact` on the persistent session. +- `GET /events/history` — replay buffer for the terminal.