13 KiB
hyperhive — developer reference
Operator + dev notes: conventions, gotchas, per-subsystem design.
File map
hive-c0re/ host daemon + CLI (one binary, subcommand-dispatched)
src/main.rs clap setup; serve / spawn / kill / rebuild / list /
pending / approve / deny / destroy / request-spawn
src/server.rs host admin socket (HostRequest → dispatch)
src/client.rs admin-socket client
src/manager_server.rs manager-privileged socket (ManagerRequest)
src/agent_server.rs per-sub-agent socket listener (long-poll Recv)
src/broker.rs sqlite Message store + broadcast channel for SSE
src/approvals.rs sqlite Approval queue + kinds
src/coordinator.rs shared state (broker/approvals/transient/sockets)
src/actions.rs approve/deny/destroy
src/auto_update.rs startup rebuild scan + ensure_manager
src/lifecycle.rs `nixos-container` shellouts, per-agent flake generator
src/dashboard.rs axum HTTP UI: containers, approvals, async-form actions
assets/ CSS + JS shipped via include_str!
hive-ag3nt/ in-container harness crate; produces TWO binaries
src/lib.rs re-exports + DEFAULT_SOCKET, DEFAULT_WEB_PORT
src/client.rs generic JSON-line request/response over unix socket
src/web_ui.rs per-container axum HTTP page
src/events.rs LiveEvent + broadcast Bus for the SSE stream
src/turn.rs claude --print + stream-json pump; --compact retry
src/mcp.rs embedded MCP server (rmcp): AgentServer + ManagerServer
src/login.rs probe /root/.claude/ for a valid session
src/login_session.rs drives `claude auth login` over stdio pipes
src/bin/hive-ag3nt.rs sub-agent main
src/bin/hive-m1nd.rs manager main
assets/ CSS + JS for the per-agent UI
hive-sh4re/ wire types (HostRequest/Response, AgentRequest/Response,
ManagerRequest/Response, Message, Approval, HelperEvent)
nix/
modules/hive-c0re.nix systemd service + firewall + git wiring
templates/harness-base.nix shared scaffolding for sub-agents + manager
templates/agent-base.nix sub-agent nixosConfiguration
templates/manager.nix manager nixosConfiguration
Conventions
- Naming. Containers are length-bounded (
nixos-container≤ 11 chars). Sub-agents areh-<name>with<name>≤ 9 chars; the manager ishm1nd.MAX_AGENT_NAMEenforces the cap inlifecycle.rs. Per-agent web UI port =WEB_PORT_BASE + FNV1a(name) % WEB_PORT_RANGE(8100..8999); manager fixed at 8000; dashboardcfg.dashboardPort(default 7000). - Identity = socket. No auth/tokens on the per-agent sockets. The socket path identifies the principal; perms come from "who has the bind-mount."
- Wire protocol. JSON line-delimited over unix sockets in both directions
(host admin / manager / agent).
/messages/streamistext/event-stream. - Commit messages. Short, lowercase, no Co-Authored-By trailer.
- Commit before test. Stage and commit when work looks ready, then run
validation (
cargo check,nix flake check, real lpt2 deploy). Failures get a follow-up commit rather than an amend. rebuildis the reconcile verb. Idempotently rewrites/etc/nixos-containers/<C>.conf(PRIVATE_NETWORK=0, clears HOST_ADDRESS/LOCAL_ADDRESS, setsEXTRA_NSPAWN_FLAGS), regeneratesapplied/<name>/flake.nix, writes the systemd limits drop-in, thennixos-container update+ stop + start. Anything that changes per-container state on the host should be re-applied here.- Actions are factored.
approve/deny/destroylive inactions.rs; the admin socket and the dashboard POST handlers both call into them so the two surfaces never drift. - Async forms. Dashboard mutating forms carry
data-async; theassets/async_forms.jshelper intercepts submit, shows a spinner, and fetches withapplication/x-www-form-urlencoded(axumFormextractor rejects multipart). New mutating forms should adddata-asyncand optionallydata-confirm.
Gotchas / lessons learned
nixos-containerdoesn't expose--bindon the CLI. Path is viaEXTRA_NSPAWN_FLAGSin/etc/nixos-containers/<NAME>.conf— the start script (/nix/store/.../container_-start) expands it unquoted into thesystemd-nspawninvocation. We rewrite this line inset_nspawn_flags()./run/systemd/nspawn/*.nspawnoverrides are ignored bynixos-container's start script (it builds the nspawn cmd line directly).boot.isNspawnContainer = true, notboot.isContainer = true. Renamed in nixos-25.11+.nixos-container createauto-assignsHOST_ADDRESS/LOCAL_ADDRESSin the.conf. The start script'sif HOST_ADDRESS set → --network-vethbranch then forces a private netns — which is silently fatal for our web UIs (the bind is invisible from the host). We force-clear those vars (andHOST_ADDRESS6/LOCAL_ADDRESS6/HOST_BRIDGE) plus setPRIVATE_NETWORK=0.- systemd service PATH ≠ host PATH. The hive-c0re service sets
path = [ pkgs.git "/run/current-system/sw" ]. In-container harness services do the same so anything an agent adds to its ownagent.nix(environment.systemPackages) is visible to claude's Bash tool without editing the service definition.environment.HYPERHIVE_GITbakes git's absolute path in (read bylifecycle::git_command()) for the host. RuntimeDirectoryPreserve = "yes"keeps/run/hyperhive/(and the per-agent sub-dirs) acrosshive-c0rerestarts. Without it, every restart wipes bind sources and existing containers can't be started.register_agentis idempotent — drops any prior socket task before rebinding. Required so ahive-c0rerestart followed byrebuild alicerecreates the agent's socket without needing a clean reinstall.claude-codeis unfree.harness-base.nixallow-list's it specifically. The flake pins it to nixpkgs-unstable viaoverlays.claude-unstable(stable lags too far). The overlay imports unstable with its ownallowUnfreePredicateso the access inside the overlay doesn't itself trip.- Claude credentials are per-agent.
/var/lib/hyperhive/agents/<name>/claude/bind-mounts to/root/.claude(RW). Sharing one dir across agents is NOT viable — OAuth refresh tokens rotate, so any sibling refresh invalidates all the others. Login flow runs from the per-agent web UI; creds persist acrossdestroy/recreate. - Orphan approvals. If state dirs are wiped out from under a pending
approval (test scripts, manual
rm -rf), the dashboard's next render marks themfailedwith note"agent state dir missing"so they fall out ofpending. They stay in sqlite for audit.
Agent MCP surface + turn loop
The harness ships an embedded MCP server (rmcp 1.7) that claude launches as
a stdio child via --mcp-config. Subcommand: hive-ag3nt mcp (or
hive-m1nd mcp for the manager surface).
Sub-agent tools:
mcp__hyperhive__send(to, body)— message a peer or the operator.mcp__hyperhive__recv()— drain one inbox message.
Manager additionally:
mcp__hyperhive__request_spawn(name)— queue Spawn approval.mcp__hyperhive__kill(name)— graceful stop.mcp__hyperhive__request_apply_commit(agent, commit_ref)— submit a config change for any agent (includinghm1ndfor self-mods).
The shared per-turn plumbing lives in hive_ag3nt::turn::{write_mcp_config, run_turn, drive_turn, emit_turn_end, wait_for_login} so the two binaries
can't drift.
Each turn:
claude --print --verbose --output-format stream-json --model haiku \
--continue --settings '{"autoCompactEnabled":false,"autoMemoryEnabled":false}' \
--mcp-config <path> --strict-mcp-config \
--tools <builtins> --allowedTools <builtins+mcp>
# prompt piped over stdin
--continue keeps a persistent session per agent (claude stores sessions in
~/.claude/projects/, which is bind-mounted persistently). Auto-compact and
auto-memory are disabled because hyperhive owns compaction (/compact on
overflow, retry once).
Loop control. The harness pops one inbox message per cycle (the wake
signal — Recv long-polls server-side for up to 30s waking instantly on a new
broker Sent event for this agent) and hands claude a prompt naming the
agent, the sender, the body, and the MCP tools. Claude drives any further
recv/send itself.
Tool envelope (mcp::run_tool_envelope): every MCP tool handler logs
the request, runs the body, appends a status line (e.g.
[status] 3 unread message(s) in inbox from a non-mutating Status peek),
logs the result. New tools call this helper.
Tool whitelist (mcp::ALLOWED_BUILTIN_TOOLS):
- Allowed built-ins:
Bash,Edit,Glob,Grep,Read,TodoWrite,Write. - Denied by omission:
WebFetch,WebSearch,Task,NotebookEdit. - Allowed MCP tools: as listed above per flavor.
Bash is on the allow-list pending a finer-grained pattern allow-list
(Bash(git *)-style) — see TODO.md.
Live view. Each agent runs an events::Bus (a
tokio::sync::broadcast<LiveEvent> wrapper). The harness emits
TurnStart, Stream(value) (one per parsed stream-json line), Note,
TurnEnd. The web UI subscribes via /events/stream (SSE) and a JS panel
appends rows. No full-page reload — operator input stays put.
Manager (hm1nd) is hive-c0re-managed
The manager container runs through the same lifecycle as sub-agents.
On hive-c0re serve startup, if hm1nd is missing, hive-c0re creates it.
The manager's flake lives at /var/lib/hyperhive/applied/hm1nd/; its
proposed config at /var/lib/hyperhive/agents/hm1nd/config/. Manager can
edit its own agent.nix (visible inside the container at
/agents/hm1nd/config/) and submit request-apply-commit hm1nd <sha> for
operator approval.
Differences from sub-agents:
flake.nixextendshyperhive.nixosConfigurations.manager(vsagent-base).- Container name is
hm1nd(noh-prefix). - Fixed web UI port (
MANAGER_PORT = 8000). set_nspawn_flagsadds an extra bind:/var/lib/hyperhive/agents→/agents(RW), so the manager can edit per-agent proposed repos.- First-deploy spawn bypasses the approval queue (manager is required infrastructure).
- Per-agent socket lives at
/run/hyperhive/manager/, owned bymanager_server::start.
Migration note (for older hosts): drop any containers.hm1nd = { ... }
block from your host NixOS config. hyperhive creates and updates the
manager itself now.
Auto-update on startup
hive-c0re serve runs auto_update::run in a background task right after
opening the coordinator. It enumerates managed containers and rebuilds any
whose recorded hyperhive rev differs from the current one — sub-agents and
manager go through the same lifecycle::rebuild path. "Rev" = canonical
filesystem path of cfg.hyperhiveFlake. Marker file:
/var/lib/hyperhive/applied/.<name>.hyperhive-rev.
If the flake input has no canonical path (e.g. a github: URL),
auto-update is a no-op — rebuild manually.
The dashboard surfaces pending updates per agent: a clickable "needs update
↻" badge appears whenever the marker differs from current rev. The badge
POSTs /rebuild/<name>, calling the same auto_update::rebuild_agent
path so manual triggers and the startup scan can't drift.
Approval flow
End-to-end: manager edits per-agent proposed repo → commits → submits commit
sha → user approves on host CLI or dashboard button → hive-c0re reads the
file at that sha from proposed, applies into applied, commits there, runs
nixos-container update. Helper-event JSON (ApprovalResolved) lands in the
manager's inbox.
Two separate git repos per agent:
/var/lib/hyperhive/agents/<name>/config/ # proposed — manager edits, hive-c0re reads only
└── agent.nix # the only file the manager can change
# (initial commit by hive-c0re on first spawn,
# never touched by hive-c0re again)
/var/lib/hyperhive/applied/<name>/ # applied — hive-c0re-only; container builds here
├── flake.nix # auto-generated; references hyperhive_flake
└── agent.nix # overwritten by approve from the proposed commit
The container's --flake ref is <applied_dir>#default. The flake extends
hyperhive.nixosConfigurations.{agent-base|manager} with ./agent.nix plus
an inline module setting programs.git.config.user (committer identity =
the agent's name) and systemd.services.<harness>.environment (HIVE_PORT,
HIVE_LABEL, HIVE_DASHBOARD_PORT).