hyperhive/CLAUDE.md

18 KiB

hyperhive — claude entry point

Hey claude. This is your starting page. The detailed docs live in docs/ and are written for humans + you both — read them when you need depth on a subsystem. This file is the index + scratchpad.

File map

hive-c0re/         host daemon + CLI (one binary, subcommand-dispatched)
  src/main.rs           clap setup; serve / spawn / kill / rebuild / list /
                         pending / approve / deny / destroy [--purge] /
                         request-spawn; periodic vacuum tasks
  src/server.rs         host admin socket (HostRequest → dispatch)
  src/client.rs         admin-socket client
  src/manager_server.rs manager-privileged socket (ManagerRequest)
  src/agent_server.rs   per-sub-agent socket listener (long-poll Recv)
  src/broker.rs         sqlite Message store + intra-process broadcast
                         channel (`MessageEvent`) for `recv_blocking` +
                         the dashboard forwarder; hourly vacuum of
                         delivered>30d
  src/dashboard_events.rs unified wire-facing event channel feeding
                         `/dashboard/stream`. Carries broker `Sent` /
                         `Delivered` (mirrored by the forwarder task
                         in main.rs) + mutation events
                         (`ApprovalAdded` / `ApprovalResolved`,
                         `QuestionAdded` / `QuestionResolved`,
                         `TransientSet` / `TransientCleared`). Each
                         frame carries a monotonic per-process `seq`
                         clients use to dedupe against snapshot reads.
  src/approvals.rs      sqlite Approval queue + kinds
  src/operator_questions.rs  sqlite question queue backing `ask` /
                         `answer` (both operator + agent-to-agent)
  src/questions.rs      shared dispatch for `Ask` / `Answer` —
                         used by both agent + manager surfaces
  src/reminder_scheduler.rs  5s poll loop: drains due reminders,
                         resolves file_path container→host, persists
                         payload + delivers pointer string
  src/events_vacuum.rs  host-side hourly sweep of every agent's
                         /state/hyperhive-events.sqlite
  src/crash_watch.rs    poll every 10s; fire HelperEvent::ContainerCrash
                         when a previously-running container disappears
                         without an operator-initiated transient
  src/container_view.rs ContainerView struct + build_all helper;
                         shared between dashboard.rs (cold-load via
                         /api/state) and coordinator.rs's
                         rescan_containers_and_emit
  src/coordinator.rs    shared state (broker/approvals/operator_questions/
                         transient/sockets) + tombstone enumeration +
                         kick_agent + notify_agent (helper-event push) +
                         last_containers cache + rescan_and_emit diff helper
  src/open_threads.rs   loose-ends aggregator (pending approvals +
                         unanswered questions) — for_agent (filtered) and
                         hive_wide (manager surface). Backs
                         AgentRequest::GetOpenThreads + ManagerRequest::
                         GetOpenThreads (the get_open_threads MCP tool).
  src/actions.rs        approve/deny/destroy (transient-aware)
  src/auto_update.rs    startup rebuild scan + ensure_manager +
                         meta::lock_update_hyperhive bump
  src/lifecycle.rs      `nixos-container` shellouts; per-agent applied
                         + proposed git repo seeding; tag plumbing
  src/meta.rs           single hive-c0re-owned flake at /var/lib/
                         hyperhive/meta/ — sync_agents, two-phase
                         prepare/finalize/abort, lock_update_*
  src/migrate.rs        startup auto-migration from pre-meta layout
                         (idempotent, marker-guarded phase 4)
  src/dashboard.rs      axum HTTP: static shell + /api/state JSON + actions
                         + journald viewer + bind-with-retry (SO_REUSEADDR)
                         + deployed_sha chip per container +
                         /dashboard/{stream,history} subscribing to the
                         unified DashboardEvent channel
  assets/               index.html, dashboard.css, app.js (include_str!)

hive-fr0nt/           shared frontend-assets crate (browser only).
  src/lib.rs            pub const BASE_CSS / TERMINAL_CSS / TERMINAL_JS
                         re-exports; both binaries `include_str!` them
                         and prepend to their per-page serving routes.
  assets/base.css       Catppuccin palette + body typography (one source
                         of truth, no per-page redeclaration).
  assets/terminal.css   `.terminal-wrap` + `.live` + `.tail-pill` +
                         `.row` / `details.row` styling for both
                         pages' lit log panes.
  assets/terminal.js    `window.HiveTerminal.create(opts)`: scroll-
                         sticky log + "↓ N new" pill + history
                         backfill + SSE subscribe-buffer-snapshot-
                         dedupe dance. Pages register a kind→renderer
                         map; the terminal owns the lifecycle.

hive-ag3nt/        in-container harness crate; produces TWO binaries
  src/lib.rs            re-exports + DEFAULT_SOCKET, DEFAULT_WEB_PORT
  src/client.rs         generic JSON-line request/response over unix socket
  src/web_ui.rs         per-container axum HTTP page (incl /api/cancel,
                         /api/compact, /api/model, /events/history)
  src/turn_stats.rs     per-turn analytics sink (one sqlite row per
                         turn at /state/hyperhive-turn-stats.sqlite);
                         schema + best-effort writer
  src/events.rs         LiveEvent + broadcast Bus + sqlite-backed history
                         (/state/hyperhive-events.sqlite) + TurnState +
                         model selection (persisted at /state/hyperhive-model)
  src/turn.rs           claude --print + stream-json pump; --compact retry
  src/mcp.rs            embedded MCP server (rmcp): AgentServer + ManagerServer
  src/login.rs          probe /root/.claude/ for a valid session
  src/login_session.rs  drives `claude auth login` over stdio pipes
  src/bin/hive-ag3nt.rs sub-agent main (Serve + Mcp subcommands)
  src/bin/hive-m1nd.rs  manager main (Serve + Mcp subcommands)
  assets/               index.html, agent.css, app.js (include_str!)
  prompts/              static role/tools/settings for claude (include_str!):
                          agent.md  — sub-agent system prompt
                          manager.md — manager system prompt
                          claude-settings.json — --settings JSON

hive-sh4re/        wire types (HostRequest/Response, AgentRequest/Response,
                   ManagerRequest/Response, Message, Approval, HelperEvent)

nix/
  modules/hive-c0re.nix         systemd service + firewall + git wiring
  templates/harness-base.nix    shared scaffolding for sub-agents + manager
  templates/agent-base.nix      sub-agent nixosConfiguration
  templates/manager.nix         manager nixosConfiguration

docs/
  conventions.md       naming, identity=socket, async forms, commit style
  gotchas.md           NixOS / nspawn lessons learned the hard way
  web-ui.md            dashboard + per-agent page layouts and endpoints
  turn-loop.md         claude invocation, wake prompt, MCP tool surface
  approvals.md         approval flow, manager policy, helper events
  persistence.md       sqlite dbs, retention, state dir layout

Reading paths

Pick the doc that matches your task. None depend on the others — read them à la carte.

Quick reminders

  • Commit before test. Stage and commit when work looks ready, then run validation. Failures get a follow-up commit rather than an amend.
  • Commit messages: short, lowercase, no Co-Authored-By trailer. Imperative mood.
  • rebuild is the reconcile verb. Anything that changes per-container state on the host should be re-applied there so the dashboard's ↻ R3BU1LD is sufficient to recover.
  • Identity = socket. No auth tokens — the socket path identifies the principal.
  • Actions are factored between admin socket and dashboard via actions.rs and dashboard.rs::lifecycle_action, so the two surfaces never drift.

Scratchpad

In-flight or recent context that hasn't earned a section yet. Prune freely.

  • Just landed: meta-flake overhaul. Each agent's applied repo is a module-only flake (forwards every inputs.* through to agent.nix as the flakeInputs module arg — manager edits inputs to pull in external flakes like an MCP server's own flake; the new sha lands in the agent's own flake.lock and rolls up to meta's). A single hive-c0re-owned repo at /var/lib/hyperhive/meta/ declares one input per agent and one nixosConfigurations.<n> output, wrapping the agent's nixosModules.default with identity + HIVE_PORT / HIVE_LABEL / HIVE_DASHBOARD_PORT / HIVE_OPERATOR_PRONOUNS. Containers run against meta#<n>. Every approve uses two-phase staging (prepare → build → finalize/abort) so meta's git log only records successful deploys; failures + denials live as annotated tags in applied. All meta operations serialize behind a tokio mutex; stale .git/index.lock is cleared on hive-c0re startup. Manager has /applied
    • /meta RO-bound + the applied remote pre-wired in every proposed repo. Migration runs idempotently on startup (HIVE_SKIP_META_MIGRATION=1 skips). Operator pronouns are a NixOS module option (services.hive-c0re.operatorPronouns, default "she/her"); the harness substitutes them into the system prompt at boot.
  • Just landed: per-agent extra MCP servers via the hyperhive.extraMcpServers.<key> NixOS option in agent.nix. Declares { command, args, env, allowedTools }; the module writes the whole map to /etc/hyperhive/extra-mcp.json; the harness reads that file and merges each entry into both --mcp-config and --allowedTools (mapped to mcp__<key>__<pattern>). Unblocks matrix / bitburner / any agent with rich domain tooling — the agent flake's inputs block pulls the external flake, agent.nix references it via flakeInputs.<name>.packages.${pkgs.system}.default.
  • Just landed: per-turn analytics sink. New hive-ag3nt::turn_stats writes one row per claude turn to /state/hyperhive-turn-stats.sqlite: identity (model, wake_from, result_kind), timing (started/ended_at, duration_ms), cost (full token-usage breakdown), behaviour (tool_call_count + per-tool JSON map), and post-turn snapshot metrics (open_threads_count, open_reminders_count fetched via the existing GetOpenThreads + new CountPendingReminders RPC). Both ag3nt + m1nd bin loops capture, both Bus accumulates tool_use blocks via observe_stream during the stdout pump. Writes are best-effort. No host-side vacuum yet — TODO under Telemetry; same shape as events_vacuum, target 90d retention.
  • Just landed: agent web UI event-driven badges. New LiveEvent::StatusChanged / ModelChanged / TokenUsageChanged / TurnStateChanged variants replace the per-agent page's /api/state polling for the state row. Status/model/token/state badges all update from SSE; /api/state only fetched on cold load + during the login flow (session output isn't event- shaped). Per-agent endpoints (/api/cancel|compact|model| new-session, /login/*) all flip 303→200. New alive-badge chip carries the harness reachability signal (replaces the "● harness alive" paragraph); new ctx-badge mirrors Claude Code's bottom-right "N tokens" indicator. Every chip carries a title=... tooltip for hover detail.
  • Just landed: events_vacuum simplified to age-only — KEEP_SECS = 7d, no row cap. Chatty turn no longer evicts a quiet day's history sooner than expected. Hourly sweep unchanged.
  • Just landed: Phase 6 container events. New DashboardEvent::ContainerStateChanged { container } + ContainerRemoved { name } close the last refetch loop on the dashboard side. Coordinator::rescan_containers_and_emit builds a fresh container_view::build_all snapshot, diffs it against a cached last_containers map, and fires per-row events for the delta. Called from every mutation site: actions::approve (post-spawn), actions::destroy, the lifecycle_action wrapper in dashboard.rs (start/stop/restart/rebuild), auto_update:: rebuild_agent, and the existing 10s crash_watch poll loop. ContainerView extracted to its own module so coordinator + dashboard can both build it. Dashboard endpoints (/restart, /destroy, /kill, /rebuild, /start, /update-all, /meta-update, /purge-tombstone) now return 200; matching forms carry data-no-refresh where the event coverage is complete (purge + meta-update keep the refetch since tombstones
    • meta_inputs aren't event-derived yet). Client drops the 5s periodic /api/state poll entirely — initial cold load + SSE for everything afterwards; pending overlay reads from transientsState since the new event payload doesn't carry it.
  • Just landed: dashboard event refactor. New hive-fr0nt workspace crate hosts shared frontend assets (palette + terminal CSS + window.HiveTerminal.create JS) so both the dashboard and the per-agent web UI render their live panes through the same code; the dashboard's #msgflow now feels like the agent's terminal (sticky-bottom + pill + lit chrome). New unified DashboardEvent channel on Coordinator (replaces the broker-only /messages/stream); a background forwarder mirrors broker traffic onto it as Sent / Delivered variants, and the mutation-event variants (ApprovalAdded / ApprovalResolved, QuestionAdded / QuestionResolved, TransientSet / TransientCleared) cover every in-process state change the dashboard cares about. Each frame carries a monotonic per-process seq; snapshot endpoints return their seq alongside the state, and the terminal's open-buffer-then-fetch-history dance drops any buffered frame with seq <= history_seq so an event landing between subscribe and history-fetch is neither shown twice nor lost. Operator inbox + approvals + questions + transients are now derived client-side from the event stream (cold-loaded from /api/state for first paint, mutated live from SSE thereafter); /op-send + per-agent /send return 200 instead of 303-and-refetch. Container-list events still pending — ContainerView is sourced from external nixos-container list, so the 5s /api/state poll continues to drive the containers section. Approval diffs are now raw unified-diff text on the wire (per-line classification happens in JS) so they fit in SSE payloads without HTML escaping. Bug fix: LiveEvent::Note was a newtype variant that serde silently failed to serialize — converted to Note { text: String } (wire shape matches what the JS already read).
  • Just landed: ask_operatorask rename + optional to: <agent> param for agent-to-agent structured Q&A. Recipient defaults to the operator (dashboard); peer questions land in the target's inbox as QuestionAsked events and the recipient replies via new answer(id, answer) tool. Answer always flows back as QuestionAnswered { id, question, answer, answerer } (renamed from OperatorAnswered; answerer distinguishes operator vs peer vs ttl-watchdog). Authorisation: operator-targeted questions can only be answered by the operator; agent-targeted by the named target (or the operator as override). Self-ask rejected. Shared dispatch lives in hive-c0re/src/questions.rs. Dashboard's pending() filters on target IS NULL so peer questions never leak into the operator's queue.
  • Just landed: dashboard now has a terminal-style compose textbox under the message-flow stream — @name picks the recipient (sticky in localStorage, auto- completed from containers[]), POSTs /op-send. New per-agent ↻ new session button drops --continue for one turn. Claude spawns with cwd = /state so relative paths in tool calls land in the durable dir.
  • Just landed (prior overhaul still underneath): tag- driven config-apply. Two-repo split (proposed = manager RW, applied = core-only); request_apply_commit fetches the manager's commit into applied and pins it as proposal/<id>; approve / deny / build walk through tags on the same commit; applied/main only fast- forwards on deployed/. failed/ + denied/ are annotated. See docs/approvals.md.
  • Recent (since last compaction): inline +/- diffs on Write/Edit, send full body via collapsed details, operator cancel + ttl on questions, deny-with-reason, dashboard back-link + last-turn timing + model chip, per-agent inbox view, bind-retry + SO_REUSEADDR, journald viewer, agent.nix viewer, server-side TurnState, recv(wait_seconds) max 180s, runtime /model switch + persistence to /state, crash watcher + ContainerCrash / NeedsLogin / LoggedIn / NeedsUpdate events, manager update tool, pure-hash agent_web_port + collision banner + spawn/rebuild preflight, browser notifications, focus-preserving refresh, generalised
    survival, prompt-on-submit pattern.
  • Open threads: custom per-agent MCP tools (groundwork for moving bitburner-agent into hyperhive), two-step spawn, per-agent send allow-list, telemetry/charts, notes compaction, unprivileged containers, Bash allow-list, xterm.js. Known bug (in TODO.md): question id=5 was queued but didn't render — likely a pending() row-decode error swallowed by unwrap_or_default; investigate by curl /api/state | jq '.questions' + browser console.