hyperhive/CLAUDE.md

341 lines
19 KiB
Markdown

# hyperhive — claude entry point
Hey claude. This is your starting page. The detailed docs live in
[`docs/`](docs/) and are written for humans + you both — read them
when you need depth on a subsystem. This file is the index +
scratchpad.
- High-level project intro: **[README.md](README.md)**.
- Open work + backlog: **[TODO.md](TODO.md)**.
## File map
```
hive-c0re/ host daemon + CLI (one binary, subcommand-dispatched)
src/main.rs clap setup; serve / spawn / kill / rebuild / list /
pending / approve / deny / destroy [--purge] /
request-spawn; periodic vacuum tasks
src/server.rs host admin socket (HostRequest → dispatch)
src/client.rs admin-socket client
src/manager_server.rs manager-privileged socket (ManagerRequest)
src/agent_server.rs per-sub-agent socket listener (long-poll Recv)
src/broker.rs sqlite Message store + intra-process broadcast
channel (`MessageEvent`) for `recv_blocking` +
the dashboard forwarder; hourly vacuum of
delivered>30d
src/dashboard_events.rs unified wire-facing event channel feeding
`/dashboard/stream`. Carries broker `Sent` /
`Delivered` (mirrored by the forwarder task
in main.rs) + mutation events
(`ApprovalAdded` / `ApprovalResolved`,
`QuestionAdded` / `QuestionResolved`,
`TransientSet` / `TransientCleared`). Each
frame carries a monotonic per-process `seq`
clients use to dedupe against snapshot reads.
src/approvals.rs sqlite Approval queue + kinds
src/operator_questions.rs sqlite question queue backing `ask` /
`answer` (both operator + agent-to-agent)
src/questions.rs shared dispatch for `Ask` / `Answer` —
used by both agent + manager surfaces
src/reminder_scheduler.rs 5s poll loop: drains due reminders,
resolves file_path container→host, persists
payload + delivers pointer string
src/events_vacuum.rs host-side hourly sweep of every agent's
/state/hyperhive-events.sqlite
src/crash_watch.rs poll every 10s; fire HelperEvent::ContainerCrash
when a previously-running container disappears
without an operator-initiated transient
src/container_view.rs ContainerView struct + build_all helper;
shared between dashboard.rs (cold-load via
/api/state) and coordinator.rs's
rescan_containers_and_emit
src/coordinator.rs shared state (broker/approvals/operator_questions/
transient/sockets) + tombstone enumeration +
kick_agent + notify_agent (helper-event push) +
last_containers cache + rescan_and_emit diff helper
src/loose_ends.rs loose-ends aggregator (pending approvals +
unanswered questions + pending reminders) —
for_agent (filtered) and hive_wide (manager
surface). Backs AgentRequest::GetLooseEnds +
ManagerRequest::GetLooseEnds (the
get_loose_ends MCP tool).
src/actions.rs approve/deny/destroy (transient-aware)
src/auto_update.rs startup rebuild scan + ensure_manager +
meta::lock_update_hyperhive bump
src/lifecycle.rs `nixos-container` shellouts; per-agent applied
+ proposed git repo seeding; tag plumbing
src/meta.rs single hive-c0re-owned flake at /var/lib/
hyperhive/meta/ — sync_agents, two-phase
prepare/finalize/abort, lock_update_*
src/migrate.rs startup auto-migration from pre-meta layout
(idempotent, marker-guarded phase 4)
src/dashboard.rs axum HTTP: static shell + /api/state JSON + actions
+ journald viewer + bind-with-retry (SO_REUSEADDR)
+ deployed_sha chip per container +
/dashboard/{stream,history} subscribing to the
unified DashboardEvent channel
assets/ index.html, dashboard.css, app.js (include_str!)
hive-fr0nt/ shared frontend-assets crate (browser only).
src/lib.rs pub const BASE_CSS / TERMINAL_CSS / TERMINAL_JS
re-exports; both binaries `include_str!` them
and prepend to their per-page serving routes.
assets/base.css Catppuccin palette + body typography (one source
of truth, no per-page redeclaration).
assets/terminal.css `.terminal-wrap` + `.live` + `.tail-pill` +
`.row` / `details.row` styling for both
pages' lit log panes.
assets/terminal.js `window.HiveTerminal.create(opts)`: scroll-
sticky log + "↓ N new" pill + history
backfill + SSE subscribe-buffer-snapshot-
dedupe dance. Pages register a kind→renderer
map; the terminal owns the lifecycle.
hive-ag3nt/ in-container harness crate; produces TWO binaries
src/lib.rs re-exports + DEFAULT_SOCKET, DEFAULT_WEB_PORT
src/client.rs generic JSON-line request/response over unix socket
src/web_ui.rs per-container axum HTTP page (incl /api/cancel,
/api/compact, /api/model, /events/history)
src/turn_stats.rs per-turn analytics sink (one sqlite row per
turn at /state/hyperhive-turn-stats.sqlite);
schema + best-effort writer
src/events.rs LiveEvent + broadcast Bus + sqlite-backed history
(/state/hyperhive-events.sqlite) + TurnState +
model selection (persisted at /state/hyperhive-model)
src/turn.rs claude --print + stream-json pump; --compact retry
src/mcp.rs embedded MCP server (rmcp): AgentServer + ManagerServer
src/login.rs probe /root/.claude/ for a valid session
src/login_session.rs drives `claude auth login` over stdio pipes
src/bin/hive-ag3nt.rs sub-agent main (Serve + Mcp subcommands)
src/bin/hive-m1nd.rs manager main (Serve + Mcp subcommands)
assets/ index.html, agent.css, app.js (include_str!)
prompts/ static role/tools/settings for claude (include_str!):
agent.md — sub-agent system prompt
manager.md — manager system prompt
claude-settings.json — --settings JSON
hive-sh4re/ wire types (HostRequest/Response, AgentRequest/Response,
ManagerRequest/Response, Message, Approval, HelperEvent)
nix/
modules/hive-c0re.nix systemd service + firewall + git wiring
templates/harness-base.nix shared scaffolding for sub-agents + manager
templates/agent-base.nix sub-agent nixosConfiguration
templates/manager.nix manager nixosConfiguration
docs/
conventions.md naming, identity=socket, async forms, commit style
gotchas.md NixOS / nspawn lessons learned the hard way
web-ui.md dashboard + per-agent page layouts and endpoints
turn-loop.md claude invocation, wake prompt, MCP tool surface
approvals.md approval flow, manager policy, helper events
persistence.md sqlite dbs, retention, state dir layout
```
## Reading paths
Pick the doc that matches your task. None depend on the others —
read them à la carte.
- **"What does the dashboard look like?"** →
[`docs/web-ui.md`](docs/web-ui.md).
- **"How does the per-agent terminal classify + colour
events?"** → [`docs/terminal-rendering.md`](docs/terminal-rendering.md)
(taxonomy + known inconsistencies + a proposed coherence pass).
- **"How does claude get its prompt and what tools does it have?"** →
[`docs/turn-loop.md`](docs/turn-loop.md).
- **"How do config changes flow from manager to operator to
container?"** → [`docs/approvals.md`](docs/approvals.md).
- **"What state survives destroy / purge / restart?"** →
[`docs/persistence.md`](docs/persistence.md).
- **"Naming, commit style, wire protocol, the `data-async`
pattern."** → [`docs/conventions.md`](docs/conventions.md).
- **"Why does the nspawn flag look like that?"** →
[`docs/gotchas.md`](docs/gotchas.md).
## Quick reminders
- **Commit before test.** Stage and commit when work *looks*
ready, then run validation. Failures get a follow-up commit
rather than an amend.
- **Commit messages: short, lowercase, no `Co-Authored-By`
trailer.** Imperative mood.
- **`rebuild` is the reconcile verb.** Anything that changes
per-container state on the host should be re-applied there so
the dashboard's `↻ R3BU1LD` is sufficient to recover.
- **Identity = socket.** No auth tokens — the socket path
identifies the principal.
- **Actions are factored** between admin socket and dashboard via
`actions.rs` and `dashboard.rs::lifecycle_action`, so the two
surfaces never drift.
## Scratchpad
In-flight or recent context that hasn't earned a section yet.
Prune freely.
- **Just landed:** meta-flake overhaul. Each agent's applied
repo is a module-only flake (forwards every `inputs.*`
through to `agent.nix` as the `flakeInputs` module arg —
manager edits `inputs` to pull in external flakes like an
MCP server's own flake; the new sha lands in the agent's
own `flake.lock` and rolls up to meta's). A single
hive-c0re-owned repo at `/var/lib/hyperhive/meta/`
declares one input per agent and one
`nixosConfigurations.<n>` output, wrapping the agent's
`nixosModules.default` with identity + `HIVE_PORT` /
`HIVE_LABEL` / `HIVE_DASHBOARD_PORT` /
`HIVE_OPERATOR_PRONOUNS`. Containers run against
`meta#<n>`. Every approve uses two-phase staging
(prepare → build → finalize/abort) so meta's git log only
records successful deploys; failures + denials live as
annotated tags in applied. All meta operations
serialize behind a tokio mutex; stale `.git/index.lock`
is cleared on hive-c0re startup. Manager has `/applied`
+ `/meta` RO-bound + the `applied` remote pre-wired in
every proposed repo. Migration runs idempotently on
startup (`HIVE_SKIP_META_MIGRATION=1` skips). Operator
pronouns are a NixOS module option
(`services.hive-c0re.operatorPronouns`, default
`"she/her"`); the harness substitutes them into the
system prompt at boot.
- **Just landed:** per-agent extra MCP servers via the
`hyperhive.extraMcpServers.<key>` NixOS option in
`agent.nix`. Declares `{ command, args, env,
allowedTools }`; the module writes the whole map to
`/etc/hyperhive/extra-mcp.json`; the harness reads that
file and merges each entry into both `--mcp-config`
and `--allowedTools` (mapped to `mcp__<key>__<pattern>`).
Unblocks matrix / bitburner / any agent with rich
domain tooling — the agent flake's `inputs` block pulls
the external flake, `agent.nix` references it via
`flakeInputs.<name>.packages.${pkgs.system}.default`.
- **Just landed:** per-turn analytics sink. New
`hive-ag3nt::turn_stats` writes one row per claude turn to
`/state/hyperhive-turn-stats.sqlite`: identity (model,
wake_from, result_kind), timing (started/ended_at,
duration_ms), cost (full token-usage breakdown), behaviour
(tool_call_count + per-tool JSON map), and post-turn snapshot
metrics (open_threads_count, open_reminders_count fetched via
the existing GetOpenThreads + new CountPendingReminders RPC).
Both ag3nt + m1nd bin loops capture, both Bus accumulates
tool_use blocks via observe_stream during the stdout pump.
Writes are best-effort. No host-side vacuum yet — TODO under
Telemetry; same shape as events_vacuum, target 90d retention.
- **Just landed:** agent web UI event-driven badges. New
`LiveEvent::StatusChanged / ModelChanged / TokenUsageChanged
/ TurnStateChanged` variants replace the per-agent page's
/api/state polling for the state row. Status/model/token/state
badges all update from SSE; /api/state only fetched on cold
load + during the login flow (session output isn't event-
shaped). Per-agent endpoints (`/api/cancel|compact|model|
new-session`, `/login/*`) all flip 303→200. New `alive-badge`
chip carries the harness reachability signal (replaces the
"● harness alive" paragraph); new `ctx-badge` mirrors Claude
Code's bottom-right "N tokens" indicator. Every chip carries
a `title=...` tooltip for hover detail.
- **Just landed:** events_vacuum simplified to age-only —
`KEEP_SECS = 7d`, no row cap. Chatty turn no longer evicts
a quiet day's history sooner than expected. Hourly sweep
unchanged.
- **Just landed:** Phase 6 container events. New
`DashboardEvent::ContainerStateChanged { container }` +
`ContainerRemoved { name }` close the last refetch loop on the
dashboard side. `Coordinator::rescan_containers_and_emit` builds a
fresh `container_view::build_all` snapshot, diffs it against a
cached `last_containers` map, and fires per-row events for the
delta. Called from every mutation site: `actions::approve`
(post-spawn), `actions::destroy`, the `lifecycle_action` wrapper
in `dashboard.rs` (start/stop/restart/rebuild), `auto_update::
rebuild_agent`, and the existing 10s `crash_watch` poll loop.
`ContainerView` extracted to its own module so coordinator +
dashboard can both build it. Dashboard endpoints (`/restart`,
`/destroy`, `/kill`, `/rebuild`, `/start`, `/update-all`,
`/meta-update`, `/purge-tombstone`) now return 200; matching
forms carry `data-no-refresh` where the event coverage is
complete (purge + meta-update keep the refetch since tombstones
+ meta_inputs aren't event-derived yet). Client drops the 5s
periodic `/api/state` poll entirely — initial cold load + SSE
for everything afterwards; pending overlay reads from
`transientsState` since the new event payload doesn't carry it.
- **Just landed:** dashboard event refactor. New `hive-fr0nt`
workspace crate hosts shared frontend assets (palette + terminal
CSS + `window.HiveTerminal.create` JS) so both the dashboard and
the per-agent web UI render their live panes through the same
code; the dashboard's `#msgflow` now feels like the agent's
terminal (sticky-bottom + pill + lit chrome). New unified
`DashboardEvent` channel on `Coordinator` (replaces the
broker-only `/messages/stream`); a background forwarder mirrors
broker traffic onto it as `Sent` / `Delivered` variants, and
the mutation-event variants
(`ApprovalAdded` / `ApprovalResolved`, `QuestionAdded` /
`QuestionResolved`, `TransientSet` / `TransientCleared`) cover
every in-process state change the dashboard cares about. Each
frame carries a monotonic per-process `seq`; snapshot endpoints
return their seq alongside the state, and the terminal's
open-buffer-then-fetch-history dance drops any buffered frame
with `seq <= history_seq` so an event landing between subscribe
and history-fetch is neither shown twice nor lost. Operator
inbox + approvals + questions + transients are now derived
client-side from the event stream (cold-loaded from
`/api/state` for first paint, mutated live from SSE
thereafter); `/op-send` + per-agent `/send` return 200 instead
of 303-and-refetch. Container-list events still pending —
`ContainerView` is sourced from external `nixos-container list`,
so the 5s `/api/state` poll continues to drive the containers
section. Approval diffs are now raw unified-diff text on the
wire (per-line classification happens in JS) so they fit in
SSE payloads without HTML escaping. Bug fix: `LiveEvent::Note`
was a newtype variant that serde silently failed to serialize
— converted to `Note { text: String }` (wire shape matches what
the JS already read).
- **Just landed:** `ask_operator``ask` rename + optional
`to: <agent>` param for agent-to-agent structured Q&A.
Recipient defaults to the operator (dashboard); peer
questions land in the target's inbox as `QuestionAsked`
events and the recipient replies via new `answer(id,
answer)` tool. Answer always flows back as
`QuestionAnswered { id, question, answer, answerer }`
(renamed from `OperatorAnswered`; `answerer` distinguishes
operator vs peer vs `ttl-watchdog`). Authorisation:
operator-targeted questions can only be answered by the
operator; agent-targeted by the named target (or the
operator as override). Self-ask rejected. Shared dispatch
lives in `hive-c0re/src/questions.rs`. Dashboard's
`pending()` filters on `target IS NULL` so peer questions
never leak into the operator's queue.
- **Just landed:** dashboard now has a terminal-style
compose textbox under the message-flow stream — `@name`
picks the recipient (sticky in localStorage, auto-
completed from `containers[]`), POSTs `/op-send`. New
per-agent `↻ new session` button drops `--continue` for
one turn. Claude spawns with `cwd = /state` so relative
paths in tool calls land in the durable dir.
- **Just landed (prior overhaul still underneath):** tag-
driven config-apply. Two-repo split (proposed = manager
RW, applied = core-only); `request_apply_commit` fetches
the manager's commit into applied and pins it as
`proposal/<id>`; approve / deny / build walk through
tags on the same commit; `applied/main` only fast-
forwards on `deployed/`. `failed/` + `denied/` are
annotated. See `docs/approvals.md`.
- **Recent (since last compaction):** inline +/- diffs on
Write/Edit, send full body via collapsed details, operator
cancel + ttl on questions, deny-with-reason, dashboard
back-link + last-turn timing + model chip, per-agent inbox
view, bind-retry + SO_REUSEADDR, journald viewer,
agent.nix viewer, server-side TurnState, recv(wait_seconds)
max 180s, runtime /model switch + persistence to /state,
crash watcher + ContainerCrash / NeedsLogin / LoggedIn /
NeedsUpdate events, manager `update` tool, pure-hash
agent_web_port + collision banner + spawn/rebuild preflight,
browser notifications, focus-preserving refresh, generalised
<details data-restore-key> survival, prompt-on-submit pattern.
- **Open threads:** custom per-agent MCP tools (groundwork for
moving bitburner-agent into hyperhive), two-step spawn,
per-agent send allow-list, telemetry/charts, notes
compaction, unprivileged containers, Bash allow-list,
xterm.js. **Known bug** (in TODO.md): question id=5 was
queued but didn't render — likely a `pending()` row-decode
error swallowed by `unwrap_or_default`; investigate by curl
/api/state | jq '.questions' + browser console.