new mcp tool on the manager surface that queues a question on the
dashboard and returns the question id immediately. operator submits an
answer via /answer-question/<id>; the dashboard fires
HelperEvent::OperatorAnswered { id, question, answer } into the manager
inbox so the next turn picks it up.
also: fix async-form button stuck on spinner after successful submit
(refreshState skipped re-rendering, so the button was never re-enabled).
hyperhive — developer reference
Operator + dev notes: conventions, gotchas, per-subsystem design.
File map
hive-c0re/                    host daemon + CLI (one binary, subcommand-dispatched)
  src/main.rs                 clap setup; serve / spawn / kill / rebuild / list /
                              pending / approve / deny / destroy / request-spawn
  src/server.rs               host admin socket (HostRequest → dispatch)
  src/client.rs               admin-socket client
  src/manager_server.rs       manager-privileged socket (ManagerRequest)
  src/agent_server.rs         per-sub-agent socket listener (long-poll Recv)
  src/broker.rs               sqlite Message store + broadcast channel for SSE
  src/approvals.rs            sqlite Approval queue + kinds
  src/coordinator.rs          shared state (broker/approvals/transient/sockets)
  src/actions.rs              approve/deny/destroy
  src/auto_update.rs          startup rebuild scan + ensure_manager
  src/lifecycle.rs            `nixos-container` shellouts, per-agent flake generator
  src/dashboard.rs            axum HTTP: static shell + /api/state JSON + actions
  assets/                     index.html, dashboard.css, app.js (include_str!)
hive-ag3nt/                   in-container harness crate; produces TWO binaries
  src/lib.rs                  re-exports + DEFAULT_SOCKET, DEFAULT_WEB_PORT
  src/client.rs               generic JSON-line request/response over unix socket
  src/web_ui.rs               per-container axum HTTP page
  src/events.rs               LiveEvent + broadcast Bus for the SSE stream
  src/turn.rs                 claude --print + stream-json pump; --compact retry
  src/mcp.rs                  embedded MCP server (rmcp): AgentServer + ManagerServer
  src/login.rs                probe /root/.claude/ for a valid session
  src/login_session.rs        drives `claude auth login` over stdio pipes
  src/bin/hive-ag3nt.rs       sub-agent main
  src/bin/hive-m1nd.rs        manager main
  assets/                     index.html, agent.css, app.js (include_str!)
  prompts/                    static role/tools/settings for claude (include_str!):
    agent.md                  sub-agent system prompt
    manager.md                manager system prompt
    claude-settings.json      --settings JSON
hive-sh4re/                   wire types (HostRequest/Response, AgentRequest/Response,
                              ManagerRequest/Response, Message, Approval, HelperEvent)
nix/
  modules/hive-c0re.nix       systemd service + firewall + git wiring
  templates/harness-base.nix  shared scaffolding for sub-agents + manager
  templates/agent-base.nix    sub-agent nixosConfiguration
  templates/manager.nix       manager nixosConfiguration
Conventions
- Naming. Containers are length-bounded (nixos-container ≤ 11 chars).
  Sub-agents are h-<name> with <name> ≤ 9 chars; the manager is hm1nd.
  MAX_AGENT_NAME enforces the cap in lifecycle.rs. Per-agent web UI port =
  WEB_PORT_BASE + FNV1a(name) % WEB_PORT_RANGE (8100..8999); manager fixed
  at 8000; dashboard cfg.dashboardPort (default 7000).
- Identity = socket. No auth/tokens on the per-agent sockets. The socket
  path identifies the principal; perms come from "who has the bind-mount."
- Wire protocol. JSON line-delimited over unix sockets in both directions
  (host admin / manager / agent). /messages/stream is text/event-stream.
- Commit messages. Short, lowercase, no Co-Authored-By trailer.
- Commit before test. Stage and commit when work looks ready, then run
  validation (cargo check, nix flake check, real lpt2 deploy). Failures get
  a follow-up commit rather than an amend.
- rebuild is the reconcile verb. Idempotently rewrites
  /etc/nixos-containers/<C>.conf (PRIVATE_NETWORK=0, clears
  HOST_ADDRESS/LOCAL_ADDRESS, sets EXTRA_NSPAWN_FLAGS), regenerates
  applied/<name>/flake.nix, writes the systemd limits drop-in, then
  nixos-container update + stop + start. Anything that changes
  per-container state on the host should be re-applied here.
- Actions are factored. approve/deny/destroy live in actions.rs; the admin
  socket and the dashboard POST handlers both call into them so the two
  surfaces never drift.
- Async forms. Dashboard + per-agent mutating forms carry data-async; a
  delegated submit listener in assets/app.js intercepts, shows a spinner,
  POSTs with application/x-www-form-urlencoded (axum Form extractor rejects
  multipart), calls refreshState() on success. New mutating forms should
  add data-async and optionally data-confirm.
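The naming cap and the port formula from the Naming bullet can be sketched as below. This is a minimal illustration, not the code in lifecycle.rs: the 32-bit hash width, the concrete values of WEB_PORT_BASE and WEB_PORT_RANGE (picked so the result lands in 8100..8999), and the helper names container_name and agent_web_port are all assumptions.

```rust
// Assumed values: the notes only state the resulting range 8100..8999.
const WEB_PORT_BASE: u16 = 8100;
const WEB_PORT_RANGE: u32 = 900;
// From the notes: <name> is capped at 9 chars so "h-<name>" stays <= 11.
const MAX_AGENT_NAME: usize = 9;

/// 32-bit FNV-1a over the bytes of `name`.
/// (Which FNV width hyperhive uses is an assumption.)
fn fnv1a_32(name: &str) -> u32 {
    let mut hash: u32 = 0x811c_9dc5; // FNV-1a 32-bit offset basis
    for byte in name.bytes() {
        hash ^= u32::from(byte);
        hash = hash.wrapping_mul(0x0100_0193); // FNV 32-bit prime
    }
    hash
}

/// Hypothetical helper: container name for a sub-agent, or None if too long.
fn container_name(name: &str) -> Option<String> {
    if name.len() > MAX_AGENT_NAME {
        return None;
    }
    Some(format!("h-{name}"))
}

/// Hypothetical helper: deterministic web UI port for a sub-agent.
fn agent_web_port(name: &str) -> u16 {
    // The modulo keeps the offset below 900, so the cast cannot truncate.
    WEB_PORT_BASE + (fnv1a_32(name) % WEB_PORT_RANGE) as u16
}
```

Because the port is a hash modulo a 900-slot range, two names can map to the same port; the sketch does nothing about collisions.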
Gotchas / lessons learned
- nixos-container doesn't expose --bind on the CLI. Path is via
  EXTRA_NSPAWN_FLAGS in /etc/nixos-containers/<NAME>.conf: the start script
  (/nix/store/.../container_-start) expands it unquoted into the
  systemd-nspawn invocation. We rewrite this line in set_nspawn_flags().
- /run/systemd/nspawn/*.nspawn overrides are ignored by nixos-container's
  start script (it builds the nspawn cmd line directly).
- boot.isNspawnContainer = true, not boot.isContainer = true. Renamed in
  nixos-25.11+.
- nixos-container create auto-assigns HOST_ADDRESS / LOCAL_ADDRESS in the
  .conf. The start script's if HOST_ADDRESS set → --network-veth branch
  then forces a private netns, which is silently fatal for our web UIs (the
  bind is invisible from the host). We force-clear those vars (and
  HOST_ADDRESS6 / LOCAL_ADDRESS6 / HOST_BRIDGE) plus set PRIVATE_NETWORK=0.
- systemd service PATH ≠ host PATH. The hive-c0re service sets
  path = [ pkgs.git "/run/current-system/sw" ]. In-container harness
  services do the same so anything an agent adds to its own agent.nix
  (environment.systemPackages) is visible to claude's Bash tool without
  editing the service definition. environment.HYPERHIVE_GIT bakes git's
  absolute path in (read by lifecycle::git_command()) for the host.
- RuntimeDirectoryPreserve = "yes" keeps /run/hyperhive/ (and the per-agent
  sub-dirs) across hive-c0re restarts. Without it, every restart wipes bind
  sources and existing containers can't be started.
- register_agent is idempotent: it drops any prior socket task before
  rebinding. Required so a hive-c0re restart followed by rebuild alice
  recreates the agent's socket without needing a clean reinstall.
- claude-code is unfree. harness-base.nix allow-lists it specifically. The
  flake pins it to nixpkgs-unstable via overlays.claude-unstable (stable
  lags too far). The overlay imports unstable with its own
  allowUnfreePredicate so the access inside the overlay doesn't itself trip.
- Claude credentials are per-agent.
  /var/lib/hyperhive/agents/<name>/claude/ bind-mounts to /root/.claude
  (RW). Sharing one dir across agents is NOT viable: OAuth refresh tokens
  rotate, so any sibling refresh invalidates all the others. Login flow
  runs from the per-agent web UI; creds persist across destroy/recreate.
- Persistent notes dir per agent. /var/lib/hyperhive/agents/<name>/state/
  bind-mounts to /state (RW). System prompts tell agents to keep durable
  knowledge here (/state/notes.md, anything else under /state/). Survives
  destroy/recreate alongside the claude dir.
- Orphan approvals. If state dirs are wiped out from under a pending
  approval (test scripts, manual rm -rf), the dashboard's next render marks
  them failed with note "agent state dir missing" so they fall out of
  pending. They stay in sqlite for audit.
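The .conf rewrite described in the nixos-container bullets can be sketched as a pure text transform. This is an illustration only: reconcile_conf is a hypothetical name (the real logic sits behind set_nspawn_flags() and rebuild), and both "clearing" a variable by writing an empty assignment and double-quoting the flags value are assumptions.

```rust
// Variables the notes say must be force-cleared.
const CLEARED: [&str; 5] = [
    "HOST_ADDRESS",
    "LOCAL_ADDRESS",
    "HOST_ADDRESS6",
    "LOCAL_ADDRESS6",
    "HOST_BRIDGE",
];

/// Hypothetical helper: returns the conf text with the managed keys rewritten.
/// Unrelated lines are kept in order; managed keys are dropped and re-emitted
/// at the end, so applying the function twice yields the same text.
fn reconcile_conf(conf: &str, nspawn_flags: &str) -> String {
    let mut out: Vec<String> = Vec::new();
    for line in conf.lines() {
        let key = line.split('=').next().unwrap_or("").trim();
        let managed = CLEARED.contains(&key)
            || key == "PRIVATE_NETWORK"
            || key == "EXTRA_NSPAWN_FLAGS";
        if !managed {
            out.push(line.to_string());
        }
    }
    for key in CLEARED {
        out.push(format!("{key}=")); // assumption: "clear" = empty assignment
    }
    out.push("PRIVATE_NETWORK=0".to_string());
    // Assumption: the value is written double-quoted.
    out.push(format!("EXTRA_NSPAWN_FLAGS=\"{nspawn_flags}\""));
    let mut text = out.join("\n");
    text.push('\n');
    text
}
```

Dropping and re-emitting the managed keys is what makes the rewrite safe to repeat on every rebuild.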
Web UI shape
Both the dashboard (port 7000) and the per-agent web UIs (8000 / 8100-8999) are SPAs with the same skeleton:
- GET / → static assets/index.html (placeholders for state-driven sections).
- GET /static/*.css + GET /static/*.js → static assets shipped via
  include_str! so there's no runtime file dependency.
- GET /api/state → JSON snapshot the JS app renders into the DOM.
- POST /<action> (approve, deny, kill, restart, rebuild, destroy,
  request-spawn, update-all, send, login/*) → idempotent action endpoints.
- GET /events/stream (per-agent) and GET /messages/stream (dashboard) are
  text/event-stream SSE for live updates.
The JS app handles all form[data-async] submissions via a delegated
listener: read data-confirm, swap the button to a spinner, POST
application/x-www-form-urlencoded (axum's Form extractor rejects
multipart), then on success call refreshState() (re-fetch /api/state
and re-render). No full-page reloads.
Per-agent + dashboard state shapes live in dashboard.rs::StateSnapshot
and web_ui.rs::StateSnapshot. When adding new state fields, plumb
through the snapshot struct and the relevant assets/app.js render
function — never reach for server-side HTML rendering again.
Agent MCP surface + turn loop
The harness ships an embedded MCP server (rmcp 1.7) that claude launches as
a stdio child via --mcp-config. Subcommand: hive-ag3nt mcp (or
hive-m1nd mcp for the manager surface).
Sub-agent tools:
- mcp__hyperhive__send(to, body): message a peer or the operator.
- mcp__hyperhive__recv(): drain one inbox message.
Manager additionally:
- mcp__hyperhive__request_spawn(name): queue Spawn approval.
- mcp__hyperhive__kill(name): graceful stop.
- mcp__hyperhive__request_apply_commit(agent, commit_ref): submit a config
  change for any agent (including hm1nd for self-mods).
- mcp__hyperhive__ask_operator(question, options?): non-blocking; queues a
  question on the dashboard, returns the question id. Operator's answer
  arrives later as a HelperEvent::OperatorAnswered in the manager inbox.
The shared per-turn plumbing lives in hive_ag3nt::turn::{write_mcp_config, write_settings, write_system_prompt, run_turn, drive_turn, emit_turn_end, wait_for_login} so the two binaries can't drift.
On harness boot, three files get dropped next to the mcp socket at
/run/hive/:
- claude-mcp-config.json: re-invokes the running binary as mcp child.
- claude-settings.json: --settings blob (auto-compact/auto-memory off,
  effortLevel medium).
- claude-system-prompt.md: rendered from prompts/{agent,manager}.md with
  {label} substituted. Passed via --system-prompt-file.
Each turn:
claude --print --verbose --output-format stream-json --model haiku \
--continue --settings /run/hive/claude-settings.json \
--system-prompt-file /run/hive/claude-system-prompt.md \
--mcp-config /run/hive/claude-mcp-config.json --strict-mcp-config \
--tools <builtins> --allowedTools <builtins+mcp>
# wake prompt piped over stdin — minimal, just from/body + optional unread hint
--continue keeps a persistent session per agent (claude stores sessions in
~/.claude/projects/, which is bind-mounted persistently). Auto-compact and
auto-memory are disabled because hyperhive owns compaction (/compact on
overflow, retry once).
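The "/compact on overflow, retry once" policy can be sketched with two closures standing in for the real claude invocations. Everything here is illustrative: TurnOutcome and run_with_compact_retry are hypothetical names, and how turn.rs actually detects a context overflow is not covered by these notes.

```rust
/// Hypothetical result of one claude turn.
#[derive(Debug, PartialEq)]
enum TurnOutcome {
    Ok,
    ContextOverflow,
    Failed(String),
}

/// Runs a turn; on overflow, compacts once and retries exactly once.
/// `run_turn` and `compact` stand in for the real subprocess calls.
fn run_with_compact_retry<T, C>(mut run_turn: T, mut compact: C) -> TurnOutcome
where
    T: FnMut() -> TurnOutcome,
    C: FnMut() -> bool,
{
    let first = run_turn();
    if first != TurnOutcome::ContextOverflow {
        return first; // success or an unrelated failure: nothing to retry
    }
    if !compact() {
        return TurnOutcome::Failed("compaction failed".to_string());
    }
    // Single retry: a second overflow is returned to the caller as-is.
    run_turn()
}
```

Bounding the retry at one attempt keeps a session that cannot be compacted from looping forever.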
Loop control. The harness pops one inbox message per cycle (the wake
signal: Recv long-polls server-side for up to 30s, waking instantly on a new
broker Sent event for this agent), peeks the remaining inbox depth with
Status, and emits TurnStart { from, body, unread }. The wake prompt
piped to claude includes a one-line ({unread} more pending — drain via …)
hint when unread > 0. Claude drives any further recv/send itself.
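As a rough sketch of the wake prompt shape: the function name wake_prompt and the "from:" layout are assumptions, and the hint string is copied from the text above with its elided tail left as-is.

```rust
/// Hypothetical helper: builds the minimal wake prompt piped to claude.
/// The "from: ..." layout is an assumption; the notes only say the prompt
/// carries from/body plus an optional unread hint.
fn wake_prompt(from: &str, body: &str, unread: usize) -> String {
    let mut prompt = format!("from: {from}\n{body}");
    if unread > 0 {
        // Hint text as quoted in the notes; the elided part stays elided.
        prompt.push_str(&format!("\n({unread} more pending — drain via …)"));
    }
    prompt
}
```

With unread = 0 the prompt is just the sender and body, which keeps idle wake-ups as small as possible.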
Tool envelope (mcp::run_tool_envelope): every MCP tool handler logs
the request, runs the body, logs the result. Pre-/post-log only; the old
[status] N unread message(s) appendage was removed once unread moved
into the wake prompt + UI header. New tools call this helper.
Tool whitelist (mcp::ALLOWED_BUILTIN_TOOLS):
- Allowed built-ins: Bash, Edit, Glob, Grep, Read, TodoWrite, Write.
- Denied by omission: WebFetch, WebSearch, Task, NotebookEdit.
- Allowed MCP tools: as listed above per flavor.
Bash is on the allow-list pending a finer-grained pattern allow-list
(Bash(git *)-style) — see TODO.md.
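A sketch of how the whitelist could feed the --tools / --allowedTools flags shown in the per-turn command. The constant contents come from the lists above; the function name tool_args, the AGENT_MCP_TOOLS constant, and the comma separator are assumptions.

```rust
// Built-ins allowed per the list above; anything absent is denied by omission.
const ALLOWED_BUILTIN_TOOLS: [&str; 7] =
    ["Bash", "Edit", "Glob", "Grep", "Read", "TodoWrite", "Write"];

// Sub-agent MCP tools as listed earlier (constant name is hypothetical).
const AGENT_MCP_TOOLS: [&str; 2] = ["mcp__hyperhive__send", "mcp__hyperhive__recv"];

/// Hypothetical helper: builds the two tool-related flag pairs.
/// Joining with commas is an assumption about the expected list format.
fn tool_args(mcp_tools: &[&str]) -> Vec<String> {
    let builtins = ALLOWED_BUILTIN_TOOLS.join(",");
    let mut allowed: Vec<&str> = ALLOWED_BUILTIN_TOOLS.to_vec();
    allowed.extend_from_slice(mcp_tools);
    vec![
        "--tools".into(),
        builtins,
        "--allowedTools".into(),
        allowed.join(","),
    ]
}
```

Deriving both flags from one constant means a tool added to the whitelist cannot end up in one flag but not the other.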
Live view. Each agent runs an events::Bus (a
tokio::sync::broadcast<LiveEvent> wrapper). The harness emits
TurnStart { from, body, unread }, Stream(value) (one per parsed
stream-json line), Note, TurnEnd { ok, note }. The web UI subscribes
via /events/stream (SSE) and a JS panel (terminal-themed: Crust bg, inset
shadow, monospace) renders rows:
- TurnStart → ◆ TURN ← <from> · N unread header + indented body.
- Stream tool_use → → Read /path / → Bash $ cmd / → send → operator: "..."
  etc., per-tool pretty rather than raw JSON.
- Stream tool_result short → flat ← ...; long → collapsed <details>
  ▸ ← Nl · headline (click to expand full body).
- Stream thinking → shows the thinking text if claude provided one,
  otherwise the bare · thinking … indicator.
- Stream system init, result, rate_limit_event are dropped: too noisy, and
  TurnEnd already says the turn finished.
- Note → · text.
- TurnEnd → ✓ turn ok / ✗ turn fail — note and triggers a refreshState() so
  the page form view reflects state transitions (e.g. login just landed).
The operator send form sits below the live panel, so the tail is what you read first.
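The event-to-row mapping above is implemented in the JS panel; the Rust sketch below only pins down the non-Stream rows. The enum mirrors the variant names from the text, but the field types (String, usize, Option<String>), the String stand-in for the parsed stream-json value, and the render_row function are assumptions.

```rust
/// Simplified stand-in for events::LiveEvent; field types are assumed.
enum LiveEvent {
    TurnStart { from: String, body: String, unread: usize },
    Stream(String), // stand-in for the parsed stream-json value
    Note(String),
    TurnEnd { ok: bool, note: Option<String> },
}

/// Hypothetical renderer: one text row per event, None when nothing is shown.
fn render_row(event: &LiveEvent) -> Option<String> {
    match event {
        LiveEvent::TurnStart { from, body, unread } => {
            // Header line followed by the body, indented by two spaces.
            let indented: Vec<String> =
                body.lines().map(|line| format!("  {line}")).collect();
            Some(format!(
                "◆ TURN ← {from} · {unread} unread\n{}",
                indented.join("\n")
            ))
        }
        // Per-tool pretty-printing of Stream rows is omitted in this sketch.
        LiveEvent::Stream(_) => None,
        LiveEvent::Note(text) => Some(format!("· {text}")),
        LiveEvent::TurnEnd { ok: true, .. } => Some("✓ turn ok".to_string()),
        LiveEvent::TurnEnd { ok: false, note } => Some(match note {
            Some(note) => format!("✗ turn fail — {note}"),
            None => "✗ turn fail".to_string(),
        }),
    }
}
```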
Manager (hm1nd) is hive-c0re-managed
The manager container runs through the same lifecycle as sub-agents.
On hive-c0re serve startup, if hm1nd is missing, hive-c0re creates it.
The manager's flake lives at /var/lib/hyperhive/applied/hm1nd/; its
proposed config at /var/lib/hyperhive/agents/hm1nd/config/. Manager can
edit its own agent.nix (visible inside the container at
/agents/hm1nd/config/) and submit request-apply-commit hm1nd <sha> for
operator approval.
Differences from sub-agents:
- flake.nix extends hyperhive.nixosConfigurations.manager (vs agent-base).
- Container name is hm1nd (no h- prefix).
- Fixed web UI port (MANAGER_PORT = 8000).
- set_nspawn_flags adds an extra bind: /var/lib/hyperhive/agents → /agents
  (RW), so the manager can edit per-agent proposed repos.
- First-deploy spawn bypasses the approval queue (manager is required
  infrastructure).
- Per-agent socket lives at /run/hyperhive/manager/, owned by
  manager_server::start.
Migration note (for older hosts): drop any containers.hm1nd = { ... }
block from your host NixOS config. hyperhive creates and updates the
manager itself now.
Manager policy (from prompts/manager.md): the manager does NOT
rubber-stamp sub-agent config requests. It verifies (role match, package
legitimacy, cheaper alternative, blast radius) before committing +
calling request_apply_commit. For ambiguous cases or anything that
needs human signal, the manager calls ask_operator(question, options?)
which queues the question on the dashboard and returns the id
immediately; the operator's answer arrives later as
HelperEvent::OperatorAnswered in the manager inbox. Store at
hive-c0re::operator_questions (sqlite); answer flow:
POST /answer-question/{id} → OperatorQuestions::answer →
notify_manager(OperatorAnswered { ... }).
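The non-blocking ask/answer flow can be sketched with an in-memory map standing in for the sqlite store. The method names ask and answer follow the text; the i64 id type, the HashMap storage, and returning the event instead of calling notify_manager are assumptions made to keep the sketch self-contained.

```rust
use std::collections::HashMap;

/// Only the variant relevant here; the fields follow the notes.
#[derive(Debug, Clone, PartialEq)]
enum HelperEvent {
    OperatorAnswered { id: i64, question: String, answer: String },
}

/// In-memory stand-in for hive-c0re's sqlite-backed question store.
#[derive(Default)]
struct OperatorQuestions {
    next_id: i64,
    pending: HashMap<i64, String>,
}

impl OperatorQuestions {
    /// Queues a question and returns its id immediately (non-blocking).
    fn ask(&mut self, question: &str) -> i64 {
        self.next_id += 1;
        self.pending.insert(self.next_id, question.to_string());
        self.next_id
    }

    /// Resolves a pending question. Returns the event the caller would pass
    /// to notify_manager, or None if the id is unknown or already answered.
    fn answer(&mut self, id: i64, answer: &str) -> Option<HelperEvent> {
        let question = self.pending.remove(&id)?;
        Some(HelperEvent::OperatorAnswered {
            id,
            question,
            answer: answer.to_string(),
        })
    }
}
```

Carrying the original question inside the event means the manager does not need to remember what it asked when the answer shows up turns later.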
Helper events to the manager
Coordinator::notify_manager(&HelperEvent) enqueues an inbox message
from sender system with the event JSON in the body. The manager
harness no longer short-circuits these — they drive a regular claude
turn so the manager can react. Variants
(hive_sh4re::HelperEvent):
- ApprovalResolved { id, agent, commit_ref, status, note }: fired by
  actions::approve + actions::deny whenever an approval transitions to its
  terminal state.
- Spawned { agent, ok, note }: actions::approve (Spawn-kind) + admin
  HostRequest::Spawn.
- Rebuilt { agent, ok, note }: auto_update::rebuild_agent (covers startup
  scan + manual /rebuild from dashboard) + actions::approve (ApplyCommit).
- Killed { agent }: admin HostRequest::Kill + dashboard /kill.
- Destroyed { agent }: actions::destroy.
- OperatorAnswered { id, question, answer }: dashboard::post_answer_question
  fires this after the operator submits the answer form for a question the
  manager queued via ask_operator.
To add a new event: new HelperEvent variant + call sites + update
prompts/manager.md so the manager knows the new shape.
Auto-update on startup
hive-c0re serve runs auto_update::run in a background task right after
opening the coordinator. It enumerates managed containers and rebuilds any
whose recorded hyperhive rev differs from the current one — sub-agents and
manager go through the same lifecycle::rebuild path. "Rev" = canonical
filesystem path of cfg.hyperhiveFlake. Marker file:
/var/lib/hyperhive/applied/.<name>.hyperhive-rev.
If the flake input has no canonical path (e.g. a github: URL),
auto-update is a no-op — rebuild manually.
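The staleness check can be sketched with plain filesystem calls. The marker file name follows the text; the function names, passing the applied directory as a parameter, and treating a missing marker as stale are assumptions.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Marker path as described above: <applied_root>/.<name>.hyperhive-rev
fn marker_path(applied_root: &Path, name: &str) -> PathBuf {
    applied_root.join(format!(".{name}.hyperhive-rev"))
}

/// "Rev" = canonical filesystem path of the flake input.
/// Returns None when there is no such path (e.g. a github: URL),
/// in which case auto-update has nothing to compare against.
fn current_rev(hyperhive_flake: &str) -> Option<String> {
    fs::canonicalize(hyperhive_flake)
        .ok()
        .map(|path| path.to_string_lossy().into_owned())
}

/// True when the recorded rev differs from the current one.
fn needs_rebuild(applied_root: &Path, name: &str, current: &str) -> bool {
    match fs::read_to_string(marker_path(applied_root, name)) {
        Ok(recorded) => recorded.trim() != current,
        Err(_) => true, // assumption: a missing marker counts as stale
    }
}
```

A caller would skip the scan entirely when current_rev returns None, which matches the no-op behaviour described for non-path flake inputs.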
The dashboard surfaces pending updates per agent: a clickable "needs update
↻" badge appears whenever the marker differs from current rev. The badge
POSTs /rebuild/<name>, calling the same auto_update::rebuild_agent
path so manual triggers and the startup scan can't drift. When at least
one container is stale, a top-level ↻ UPD4TE 4LL button appears that
loops over every stale container.
Dashboard action surface
Container row buttons (rendered per-state by assets/app.js):
- Always: ↻ R3BU1LD (calls lifecycle::rebuild), and for sub-agents DESTR0Y
  (container removed, state + creds kept).
- Running: ↺ R3ST4RT + (sub-agents only) ■ ST0P.
- Stopped: ▶ ST4RT.
- Stale marker: clickable needs update ↻ badge (same target as rebuild but
  only shown when out of date).
Top of the containers list: ↻ UPD4TE 4LL (when any stale) + the
"R3QU3ST SP4WN" form for queuing a new agent through the approval flow.
Approval flow
End-to-end: manager edits per-agent proposed repo → commits → submits commit
sha → user approves on host CLI or dashboard button → hive-c0re reads the
file at that sha from proposed, applies into applied, commits there, runs
nixos-container update. Helper-event JSON (ApprovalResolved) lands in the
manager's inbox.
Two separate git repos per agent:
/var/lib/hyperhive/agents/<name>/config/ # proposed — manager edits, hive-c0re reads only
└── agent.nix # the only file the manager can change
# (initial commit by hive-c0re on first spawn,
# never touched by hive-c0re again)
/var/lib/hyperhive/applied/<name>/ # applied — hive-c0re-only; container builds here
├── flake.nix # auto-generated; references hyperhive_flake
└── agent.nix # overwritten by approve from the proposed commit
The container's --flake ref is <applied_dir>#default. The flake extends
hyperhive.nixosConfigurations.{agent-base|manager} with ./agent.nix plus
an inline module setting programs.git.config.user (committer identity =
the agent's name) and systemd.services.<harness>.environment (HIVE_PORT,
HIVE_LABEL, HIVE_DASHBOARD_PORT).