hyperhive
Multi-Claude-Code-agent orchestration on nixos-containers. A host-side Rust
daemon (hive-c0re) spawns nspawn-isolated agent containers and brokers
messages between them. A manager agent (hm1nd) coordinates the swarm and
gates lifecycle changes on user approval via git commits, surfaced through a
vibec0re-styled HTTP dashboard with live SSE message-flow.
PLAN.md is the living design doc. Read it for the why and the phase roadmap; this file is the operator/developer reference for the how.
Architecture
host (NixOS, hive-c0re.service)
│
├── hive-c0re (Rust daemon — coordinator + dashboard + CLI)
│ ├── lifecycle — nixos-container CRUD (spawn/kill/rebuild/list)
│ ├── broker — sqlite message store + broadcast channel
│ ├── approvals — sqlite approval queue
│ ├── coordinator — shared state (broker/approvals/agent sockets)
│ ├── actions — approve/deny (shared between admin socket & dashboard)
│ ├── server — host admin socket (JSON line protocol)
│ ├── manager_server — manager-only privileged socket
│ ├── agent_server — per-sub-agent sockets
│ ├── dashboard — axum HTTP UI + SSE message-flow + approve/deny + T4LK
│ └── client — admin-socket client (powers `hive-c0re spawn|kill|…`)
│
├── /run/hyperhive/
│ ├── host.sock — admin CLI ↔ daemon
│ ├── manager.sock → hm1nd container at /run/hive/mcp.sock
│ └── agents/<name>/mcp.sock → h-<name> container at /run/hive/mcp.sock
│
├── /var/lib/hyperhive/
│ ├── broker.sqlite — messages + approvals tables
│ ├── agents/<name>/config/ — proposed repo (manager-editable, RO to hive-c0re)
│ └── applied/<name>/ — applied repo (hive-c0re-only, container builds here)
│
└── nixos-containers
├── h-<name> (sub-agents, hive-ag3nt binary)
└── hm1nd (manager, hive-m1nd binary)
Crates / file map
hive-c0re/ host daemon + CLI (one binary, subcommand-dispatched)
src/main.rs clap setup; serve / spawn / kill / rebuild / list /
pending / approve / deny
src/server.rs host admin socket (HostRequest → dispatch)
src/client.rs admin-socket client
src/manager_server.rs manager-privileged socket (ManagerRequest)
src/agent_server.rs per-sub-agent socket listener
src/broker.rs sqlite Message store + broadcast channel for SSE
src/approvals.rs sqlite Approval queue
src/coordinator.rs shared state (broker/approvals/agent_flake/sockets)
src/actions.rs approve/deny (admin socket + dashboard both call in)
src/lifecycle.rs `nixos-container` shellouts, per-agent flake generator,
systemd drop-ins, git helpers, agent_web_port hash
src/dashboard.rs axum HTTP UI: containers list, T4LK form, approvals
(diff + Approve/Deny buttons), SSE message flow
hive-ag3nt/ in-container harness crate; produces TWO binaries
src/lib.rs DEFAULT_SOCKET, DEFAULT_WEB_PORT, re-exports
src/client.rs generic JSON-line request/response over unix socket
src/web_ui.rs per-container axum HTTP page (label + placeholder)
src/bin/hive-ag3nt.rs sub-agent CLI (serve/send/recv); turn loop + web UI
src/bin/hive-m1nd.rs manager CLI (serve/send/recv/spawn/kill/
request-apply-commit); recognises HelperEvent
hive-sh4re/ wire types (HostRequest/Response, AgentRequest/Response,
ManagerRequest/Response, Message, Approval, HelperEvent)
nix/
modules/hive-c0re.nix systemd service + firewall + git path wiring
templates/agent-base.nix sub-agent nixos-container template
templates/manager.nix manager nixos-container template
tests/roundtrip.sh Phase 3 messaging round-trip
tests/approval.sh Phase 5 end-to-end approval flow
tests/dashboard.sh Phase 6+7 HTTP dashboard + SSE + orphan GC
docs/damocles-migration.md options for moving damocles onto hyperhive
Conventions
- Naming. Containers are length-bounded (`nixos-container` ≤ 11 chars). Sub-agents are `h-<name>` with `<name>` ≤ 9 chars; the manager is `hm1nd`. `MAX_AGENT_NAME` enforces the cap in `lifecycle.rs`. Per-agent web UI port = `WEB_PORT_BASE + FNV1a(name) % WEB_PORT_RANGE` (8100..8999); manager fixed at 8000; dashboard `cfg.dashboardPort` (default 7000).
- Identity = socket. No auth/tokens on the per-agent sockets. The socket path identifies the principal; perms come from "who has the bind-mount."
- Wire protocol. JSON line-delimited over unix sockets in both directions (host admin / manager / agent). `/messages/stream` is `text/event-stream`.
- Commit messages. Short, lowercase, no Co-Authored-By trailer.
- Commit before test. Stage and commit when work looks ready, then run validation (`cargo check`, `nix flake check`, real lpt2 deploy). Failures get a follow-up commit rather than an amend.
- `rebuild` is the reconcile verb. Idempotently rewrites `/etc/nixos-containers/<C>.conf` (`PRIVATE_NETWORK=0`, clears `HOST_ADDRESS`/`LOCAL_ADDRESS`, sets `EXTRA_NSPAWN_FLAGS`), regenerates `applied/<name>/flake.nix`, writes the systemd limits drop-in, then `nixos-container update` + stop + start. Anything that changes per-container state on the host should be re-applied here.
- Actions are factored. `approve`/`deny` live in `actions.rs`; the admin socket and the dashboard POST handlers both call into them, so the two surfaces never drift.
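The naming and port rules above can be sketched in a few lines of Rust. This is a minimal illustration, not the code in `lifecycle.rs`: the constant values are inferred from the 8100..8999 range and the 9-char cap, the 64-bit FNV-1a width is an assumption (the README does not say which width is used), and `container_name` is a hypothetical helper.

```rust
// Assumed values: the README names these constants but only states the
// resulting port range (8100..8999) and the 9-char name cap.
const WEB_PORT_BASE: u16 = 8100;
const WEB_PORT_RANGE: u64 = 900;
const MAX_AGENT_NAME: usize = 9;

/// 64-bit FNV-1a over the UTF-8 bytes of `name` (the width is an assumption).
fn fnv1a64(name: &str) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    name.bytes().fold(OFFSET_BASIS, |hash, byte| {
        (hash ^ u64::from(byte)).wrapping_mul(PRIME)
    })
}

/// Deterministic per-agent web UI port in 8100..=8999.
fn agent_web_port(name: &str) -> u16 {
    WEB_PORT_BASE + (fnv1a64(name) % WEB_PORT_RANGE) as u16
}

/// Hypothetical helper: container name for a sub-agent, or `None` if the
/// agent name is empty or longer than the cap.
fn container_name(name: &str) -> Option<String> {
    if name.is_empty() || name.len() > MAX_AGENT_NAME {
        return None;
    }
    Some(format!("h-{name}"))
}

fn main() {
    for name in ["alice", "bob"] {
        println!("{:?} -> port {}", container_name(name), agent_web_port(name));
    }
}
```

Because the port is a pure function of the name, two agents can hash to the same port; a sketch like this does nothing to detect that.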
Gotchas / lessons learned
- `nixos-container` doesn't expose `--bind` on the CLI. Path is via `EXTRA_NSPAWN_FLAGS` in `/etc/nixos-containers/<NAME>.conf`: the start script (`/nix/store/.../container_-start`) expands it unquoted into the `systemd-nspawn` invocation. We rewrite this line in `set_nspawn_flags()`.
- `/run/systemd/nspawn/*.nspawn` overrides are ignored by `nixos-container`'s start script (it builds the nspawn cmd line directly).
- `boot.isNspawnContainer = true`, not `boot.isContainer = true`. Renamed in nixos-25.11+.
- `nixos-container create` auto-assigns `HOST_ADDRESS`/`LOCAL_ADDRESS` in the `.conf`. The start script's `if HOST_ADDRESS set → --network-veth` branch then forces a private netns, which is silently fatal for our web UIs (the bind is invisible from the host). We force-clear those vars (and `HOST_ADDRESS6`/`LOCAL_ADDRESS6`/`HOST_BRIDGE`) plus set `PRIVATE_NETWORK=0`.
- systemd service PATH ≠ host PATH. Our service explicitly sets `path = [ pkgs.git "/run/current-system/sw" ]`. Additionally, `environment.HYPERHIVE_GIT = "${pkgs.git}/bin/git"` bakes the absolute path in (read by `lifecycle::git_command()`) so git resolution doesn't depend on PATH plumbing at all.
- `RuntimeDirectoryPreserve = "yes"` keeps `/run/hyperhive/` (and the per-agent sub-dirs) across `hive-c0re` restarts. Without it, every restart wipes bind sources and existing containers can't be started.
- `register_agent` is idempotent: it drops any prior socket task before rebinding. Required so a `hive-c0re` restart followed by `rebuild alice` recreates the agent's socket without needing a clean reinstall.
- `claude-code` is unfree. `agent-base.nix` allow-lists it specifically. The flake pins it to nixpkgs-unstable via `overlays.claude-unstable` (stable lags too far). The overlay imports unstable with its own `allowUnfreePredicate` so the access inside the overlay doesn't itself trip.
- Claude credentials are stateful and per-container. No `ANTHROPIC_API_KEY` env var path. Today's stopgap: `nixos-container root-login h-<name>` → `claude` (interactive) → log in once. The harness falls back to echo replies when `claude --print` fails. Phase 8 moves this to a per-agent persistent dir at `/var/lib/hyperhive/agents/<name>/claude/` bind-mounted into the container, with the interactive login driven from the agent's web UI. Sharing one `~/.claude` across agents is NOT viable: OAuth refresh tokens rotate, so any sibling refresh invalidates all the others.
- Echo guard. `hive-ag3nt serve` skips auto-reply when the incoming body starts with `"echo: "`. Prevents ping-pong loops when both sides fall back to echo. Real conversations between claude-backed agents will run away; bounding them is the manager's job.
- Orphan approvals. If state dirs are wiped out from under a pending approval (test scripts, manual `rm -rf`), the dashboard's next render marks them `failed` with note `"agent state dir missing"` so they fall out of `pending`. They stay in sqlite for audit.
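The `.conf` rewrite described in the networking gotchas can be sketched as a pure string transformation. This is an illustration under stated assumptions, not the body of `set_nspawn_flags()`: `rewrite_conf` and `is_managed` are hypothetical names, and the double-quoting of the flags value is a guess about the file format.

```rust
/// Keys the README says are force-cleared so the start script never takes
/// its `--network-veth` branch.
const CLEARED_KEYS: [&str; 5] = [
    "HOST_ADDRESS",
    "LOCAL_ADDRESS",
    "HOST_ADDRESS6",
    "LOCAL_ADDRESS6",
    "HOST_BRIDGE",
];

/// True for lines whose key this sketch owns and therefore rewrites.
fn is_managed(line: &str) -> bool {
    let key = line.split('=').next().unwrap_or("").trim();
    key == "PRIVATE_NETWORK" || key == "EXTRA_NSPAWN_FLAGS" || CLEARED_KEYS.contains(&key)
}

/// Hypothetical helper: returns the new contents for a
/// `/etc/nixos-containers/<NAME>.conf` file. Unmanaged lines are kept in
/// order; managed keys are dropped and re-appended with forced values.
fn rewrite_conf(conf: &str, nspawn_flags: &str) -> String {
    let mut out: Vec<String> = conf
        .lines()
        .filter(|line| !is_managed(line))
        .map(str::to_owned)
        .collect();
    out.push("PRIVATE_NETWORK=0".to_owned());
    for key in CLEARED_KEYS {
        out.push(format!("{key}="));
    }
    // Quoting is an assumption; the start script expands the value unquoted.
    out.push(format!("EXTRA_NSPAWN_FLAGS=\"{nspawn_flags}\""));
    out.join("\n") + "\n"
}

fn main() {
    let before = "PRIVATE_NETWORK=1\nHOST_ADDRESS=10.233.1.1\nLOCAL_ADDRESS=10.233.1.2\n";
    print!("{}", rewrite_conf(before, "--bind=/run/hyperhive/agents/alice:/run/hive"));
}
```

Dropping and re-appending the managed keys keeps the function idempotent, which matches the "reconcile" role the README gives to `rebuild`.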
Agent MCP surface
The harness ships an embedded MCP server (rmcp 1.7) that claude can launch
via --mcp-config. Subcommand: hive-ag3nt mcp. Tools:
- `send(to, body)`: message a peer or the operator.
- `recv()`: drain one inbox message.
Both translate to AgentRequest::Send/Recv against the agent's own
/run/hive/mcp.sock (the existing hyperhive socket). The MCP surface is
just claude's view of that socket — same authority, friendlier protocol.
Manager will get its own subcommand later with request_spawn, kill,
request_apply_commit added to the TOOLS list.
Manager (hm1nd) is hive-c0re-managed
The manager container runs through the same lifecycle as sub-agents —
no separate code path. On hive-c0re serve startup, if nixos-container list doesn't include hm1nd, hive-c0re creates it. The manager's flake
lives at /var/lib/hyperhive/applied/hm1nd/; its proposed (manager-editable)
config at /var/lib/hyperhive/agents/hm1nd/config/. Manager can edit its
own agent.nix (visible inside the container at /agents/hm1nd/config/),
commit, and submit request-apply-commit hm1nd <sha> for operator
approval — same flow as for sub-agents.
Differences from sub-agents:
- `flake.nix` extends `hyperhive.nixosConfigurations.manager` (vs `agent-base`).
- Container name is `hm1nd` (no `h-` prefix).
- Fixed web UI port (`MANAGER_PORT = 8000`).
- `set_nspawn_flags` adds an extra bind: `/var/lib/hyperhive/agents` → `/agents` (RW), so the manager can edit per-agent proposed repos.
- First-deploy spawn bypasses the approval queue (manager is required infrastructure).
- Per-agent socket is the manager socket at `/run/hyperhive/manager/`, owned by `manager_server::start`. `coordinator::ensure_runtime` returns that path for manager and the usual `/run/hyperhive/agents/<name>/` for the rest.
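The manager/sub-agent differences listed above mostly reduce to a few name-based branches. The sketch below shows that shape; the function names are hypothetical stand-ins (the real `coordinator::ensure_runtime` presumably also creates the directory), and only the paths and names are taken from this README.

```rust
use std::path::PathBuf;

/// Name of the manager container, per the README.
const MANAGER: &str = "hm1nd";

/// Container name: the manager keeps its bare name, sub-agents get `h-`.
fn container_name(agent: &str) -> String {
    if agent == MANAGER {
        MANAGER.to_owned()
    } else {
        format!("h-{agent}")
    }
}

/// Host-side runtime directory that holds the agent's socket.
fn runtime_dir(agent: &str) -> PathBuf {
    if agent == MANAGER {
        PathBuf::from("/run/hyperhive/manager")
    } else {
        PathBuf::from("/run/hyperhive/agents").join(agent)
    }
}

/// Which `hyperhive.nixosConfigurations.*` entry the per-agent flake extends.
fn base_configuration(agent: &str) -> &'static str {
    if agent == MANAGER {
        "manager"
    } else {
        "agent-base"
    }
}

fn main() {
    for agent in ["hm1nd", "alice"] {
        println!(
            "{agent}: container={} runtime={} base={}",
            container_name(agent),
            runtime_dir(agent).display(),
            base_configuration(agent)
        );
    }
}
```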
Migration note: drop any containers.hm1nd = { ... } block from your
host NixOS config. hyperhive creates and updates the manager itself now.
Auto-update on startup
hive-c0re serve runs auto_update::run in a background task right after
opening the coordinator. It enumerates managed containers and rebuilds any
whose recorded hyperhive rev differs from the current one:
- Sub-agents rebuild via `lifecycle::rebuild` (regenerates `applied/<name>/flake.nix`, sets nspawn flags, `nixos-container update --flake`).
- Manager runs `nixos-container update hm1nd` (no `--flake`). The manager's config lives in the host's NixOS module; this is belt-and-braces on top of NixOS's own container activation. Idempotent when nothing has actually changed.
"Rev" = canonical filesystem path of cfg.hyperhiveFlake (so /etc/hyperhive
resolving to a new /nix/store/...-source triggers a rebuild). Marker file:
/var/lib/hyperhive/applied/.<name>.hyperhive-rev. If the flake input has
no canonical path (e.g. a github: URL), auto-update is a no-op — rebuild
manually. The task is async and never blocks the admin socket; failures are
logged and don't take the daemon down.
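The rev comparison described above can be sketched with the standard library alone. The function names are hypothetical, and treating a missing marker file as "needs rebuild" is an assumption; the README only specifies the marker location and that the rev is the canonical path of the flake input.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Marker location from the README: `<applied>/.<name>.hyperhive-rev`.
fn marker_path(applied_root: &Path, name: &str) -> PathBuf {
    applied_root.join(format!(".{name}.hyperhive-rev"))
}

/// Canonical filesystem path of the flake input, or `None` when the input
/// does not resolve to a local path (e.g. a `github:` URL).
fn current_rev(flake: &str) -> Option<String> {
    fs::canonicalize(flake)
        .ok()
        .map(|path| path.to_string_lossy().into_owned())
}

/// True when the recorded rev differs from `current`.
/// Assumption: an unreadable or missing marker also counts as stale.
fn needs_rebuild(applied_root: &Path, name: &str, current: &str) -> bool {
    match fs::read_to_string(marker_path(applied_root, name)) {
        Ok(recorded) => recorded.trim() != current,
        Err(_) => true,
    }
}

/// Record the rev after a successful rebuild.
fn record_rev(applied_root: &Path, name: &str, current: &str) -> io::Result<()> {
    fs::write(marker_path(applied_root, name), current)
}

fn main() {
    let applied = Path::new("/var/lib/hyperhive/applied");
    match current_rev("/etc/hyperhive") {
        Some(rev) => println!("alice stale: {}", needs_rebuild(applied, "alice", &rev)),
        None => println!("no canonical path; auto-update is a no-op"),
    }
}
```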
The dashboard surfaces pending updates per agent: a clickable "needs update
↻" badge appears whenever the marker differs from current rev. The badge
POSTs /rebuild/<name>, calling the same auto_update::rebuild_agent /
rebuild_manager path so manual triggers and the startup scan can't drift.
Build / deploy / test
# inside the repo (devshell first; no global cargo)
nix develop -c cargo check
nix develop -c cargo clippy --workspace --all-targets -- -D warnings
nix develop -c cargo build
# evaluate everything (incl. rust+nix+toml fmt + clippy)
nix flake check
# build only the workspace package
nix build .#default
ls ./result/bin/    # hive-c0re, hive-ag3nt, hive-m1nd
# deploy to an existing host that imports hyperhive.nixosModules.hive-c0re
cd ~/Repos/<nixos-config-repo>
nix flake update hyperhive    # Nix < 2.19: nix flake lock --update-input hyperhive
sudo nixos-rebuild switch --flake .#<host>
sudo systemctl restart hive-c0re # if only env/options changed
# end-to-end tests (each idempotent; runs as root)
sudo bash tests/roundtrip.sh # alice ↔ bob echo round-trip
sudo bash tests/approval.sh # manager edit → request → user approve → rebuilt
sudo bash tests/dashboard.sh # HTTP UI, approve POST, SSE, orphan GC
The host config also needs hyperhive.overlays.default applied — the module's
default package = pkgs.hyperhive requires the overlay to bring the package
in. The claude-unstable overlay is applied internally to per-agent flakes
already.
Phase status
- ✅ Phase 0: repo + Cargo workspace + flake + agent-base template
- ✅ Phase 1: container lifecycle; `nixos-container update` hot-reload works under the patch stack (validated on muede-lpt2)
- ✅ Phase 2: per-agent sockets, in-memory broker, agent harness round-trips
- ✅ Phase 3: sqlite broker (durable) + claude-or-echo turn loop
- ✅ Phase 4: `hm1nd` manager binary + manager socket + declarative `containers.hm1nd`
- ✅ Phase 5: git-commit approval flow
  - 5a: sqlite approval queue (`request_apply_commit`/`pending`/`approve`/`deny`)
  - 5b: per-agent config flakes
  - 5c: manager edits `proposed`, hive-c0re writes-only `applied`; container builds from `applied`. Approve = read `agent.nix` at the approved commit from `proposed`, copy into `applied`, commit + rebuild. Manager cannot move `applied/main` on its own.
- ✅ Phase 6: per-container web UIs (`HIVE_PORT` deterministic-hash) + hive-c0re dashboard (default 7000, vibec0re aesthetic, deep-linked)
- ✅ Phase 7: polish:
  - 7a: dashboard Approve/Deny buttons + unified diff (`similar` crate)
  - 7b: broker broadcast + `/messages/stream` SSE + live message-flow panel
  - 7c: `ApprovalResolved` helper events into manager inbox
  - 7d: `MemoryMax=2G` + `CPUQuota=50%` systemd drop-in per container
  - 7e: damocles migration plan (`docs/damocles-migration.md`)
- ✅ Phase 7 follow-ups:
  - Dashboard T4LK form: operator can send messages from the browser (`POST /send`, becomes `from: "operator"` broker message)
  - Orphan-approval GC on dashboard render (stale entries auto-failed)
  - `PRIVATE_NETWORK=0` + `HOST_ADDRESS=`/`LOCAL_ADDRESS=` cleared in `set_nspawn_flags` so sub-agent web UI ports are reachable on the host
  - `HYPERHIVE_GIT` env var (absolute path) bypasses PATH ambiguity
Phase 8 — real claude in containers + login UX (in progress)
See PLAN.md → "Phase 8" for the full design. Summary:
- Per-agent persistent creds dir. Bind `/var/lib/hyperhive/agents/<name>/claude/` → `/root/.claude` (RW) in `set_nspawn_flags`. One OAuth lineage per agent; refresh rotations stay contained to that agent.
- State dirs persist by default. `destroy` keeps `/var/lib/hyperhive/agents/<name>/` unless the operator passes an explicit wipe flag. Recreating an agent of the same name reuses prior creds.
- First spawn is approval-gated. New agent names go through the same approval queue as config edits. Manager calls `RequestSpawn` (CLI: `hive-m1nd request-spawn <name>`); operator can also queue from the dashboard or `hive-c0re request-spawn <name>`. The host's direct `hive-c0re spawn <name>` still works as a privileged bypass for tests. Approve runs `lifecycle::spawn` in a background task; the dashboard polls via `<meta refresh>` and renders a spinner row while `nixos-container create` + `update` + `start` is in flight.
- "needs login" partial-run state. No valid session in `~/.claude/` → harness binds the web UI but does NOT start the turn loop. The harness polls the dir; as soon as a login lands it transitions into the turn loop without a restart. Dashboard surfaces the state per-agent via a `needs login` badge in the container list. "Valid session" today is a heuristic (any regular file inside `/root/.claude/`); we may refine once the filename layout claude writes is locked in.
- Login from the per-agent web UI. Spawn `claude auth login` with plain stdio pipes (no PTY initially), surface the OAuth URL from stdout on the page, accept the resulting code via a paste field, write it to the process stdin. Once `~/.claude/` populates, the existing needs-login polling loop flips state to Online and starts the turn loop, with no separate signaling needed. The exact command is overridable via `HYPERHIVE_LOGIN_CMD` so we can adjust without rebuilding. If pipes turn out to be insufficient (claude refuses without a TTY, raw-mode input, ANSI-only output) we redo the backend with a PTY (e.g. `portable-pty`).
Implementation order: bind-mount/dir creation → approval-gated spawn + spinner → "needs login" partial run → PTY login endpoint. The login UI has nowhere to live until the partial-run mode exists, so don't ship it earlier.
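The "valid session" heuristic from the needs-login item (any regular file inside `/root/.claude/`) is small enough to sketch directly. The names `AgentState`, `has_regular_file`, and `agent_state` are hypothetical, and searching subdirectories recursively is an assumption; the README does not say whether only the top level counts.

```rust
use std::fs;
use std::path::Path;

/// The two states the harness distinguishes before starting its turn loop.
#[derive(Debug, PartialEq)]
enum AgentState {
    NeedsLogin,
    Online,
}

/// True if `dir` contains at least one regular file at any depth.
/// A missing or unreadable directory counts as "no session".
fn has_regular_file(dir: &Path) -> bool {
    let Ok(entries) = fs::read_dir(dir) else {
        return false;
    };
    for entry in entries.flatten() {
        match entry.file_type() {
            Ok(kind) if kind.is_file() => return true,
            Ok(kind) if kind.is_dir() => {
                if has_regular_file(&entry.path()) {
                    return true;
                }
            }
            // Symlinks and unreadable entries are ignored in this sketch.
            _ => {}
        }
    }
    false
}

/// State a polling loop would observe for the given credentials dir.
fn agent_state(claude_dir: &Path) -> AgentState {
    if has_regular_file(claude_dir) {
        AgentState::Online
    } else {
        AgentState::NeedsLogin
    }
}

fn main() {
    println!("{:?}", agent_state(Path::new("/root/.claude")));
}
```

A harness could call `agent_state` on a timer and start the turn loop on the first `Online` result, which is the restart-free transition the summary describes.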
Approval flow
End-to-end: manager edits per-agent proposed repo → commits → submits commit
sha → user approves on host CLI or dashboard button → hive-c0re reads the
file at that sha from proposed, applies into applied, commits there, runs
nixos-container update. Helper-event JSON lands in the manager's inbox.
# Inside the hm1nd container (manager has /agents bind-mounted RW):
cd /agents/alice/config
$EDITOR agent.nix # e.g. environment.systemPackages = [ pkgs.htop ];
git commit -am "add htop"
SHA=$(git rev-parse HEAD)
hive-m1nd request-apply-commit alice $SHA
exit
# On the host (CLI):
sudo hive-c0re pending # shows queued approval with id N
sudo hive-c0re approve N # validates, applies, rebuilds
sudo nixos-container run h-alice -- which htop
# Or on the dashboard (browser):
http://<host>:7000/ # ◆ APPR0VE button next to the diff
Per-agent layout — two separate git repos:
/var/lib/hyperhive/agents/<name>/config/ # proposed — manager edits, hive-c0re reads only
├── .git/
└── agent.nix # the only file the manager can change
# (initial commit by hive-c0re on first spawn,
# never touched by hive-c0re again)
/var/lib/hyperhive/applied/<name>/ # applied — hive-c0re-only; container builds here
├── .git/
├── flake.nix # hive-c0re-managed; references hyperhive_flake
└── agent.nix # overwritten by approve from the proposed commit
The container's --flake ref is <applied_dir>#default. The flake's
nixosConfigurations.default extends hyperhive.nixosConfigurations.agent-base
with ./agent.nix plus an inline module that sets
environment.etc."gitconfig".text (committer identity = the agent's name) and
systemd.services.hive-ag3nt.environment.HIVE_PORT/HIVE_LABEL.
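The "read `agent.nix` at the approved commit" step of this flow maps onto a single `git show <sha>:agent.nix` call in the proposed repo. The sketch below only builds that command. It assumes the `HYPERHIVE_GIT` fallback behaves as described under Gotchas, and the hex-only sha check is an added assumption (the README says approve "validates" without saying how); `show_agent_nix` and `is_plausible_sha` are hypothetical names.

```rust
use std::env;
use std::path::Path;
use std::process::Command;

/// Git binary: the absolute path baked into HYPERHIVE_GIT, else plain `git`.
fn git_command() -> Command {
    Command::new(env::var_os("HYPERHIVE_GIT").unwrap_or_else(|| "git".into()))
}

/// Assumed validation: accept only hex object names, so a submitted "sha"
/// can never be parsed as a git option or resolve to a movable ref.
fn is_plausible_sha(sha: &str) -> bool {
    (7..=64).contains(&sha.len()) && sha.bytes().all(|b| b.is_ascii_hexdigit())
}

/// Command that prints `agent.nix` as of `sha` from the proposed repo,
/// or `None` if the sha is rejected.
fn show_agent_nix(proposed: &Path, sha: &str) -> Option<Command> {
    if !is_plausible_sha(sha) {
        return None;
    }
    let mut cmd = git_command();
    cmd.arg("-C")
        .arg(proposed)
        .arg("show")
        .arg(format!("{sha}:agent.nix"));
    Some(cmd)
}

fn main() {
    let proposed = Path::new("/var/lib/hyperhive/agents/alice/config");
    match show_agent_nix(proposed, "0123abc") {
        Some(cmd) => println!("{cmd:?}"),
        None => eprintln!("rejected sha"),
    }
}
```

Reading the file by object name rather than checking out the proposed repo means the applied copy reflects exactly the approved commit, even if the manager has committed again since submitting.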
Polish backlog
Not phased — pick when relevant:
- Operator inbox view: drain replies addressed to `operator` and show in the dashboard (today they accumulate in sqlite unread).
- Per-agent UI substance: show last N inbox messages, last turn timing, link back to dashboard.
- xterm.js terminal: embed in each per-container UI, attach to a PTY exposed by the harness.
- `destroy` verb: currently `nixos-container destroy` + manual `rm -rf`. Should be one hive-c0re verb that also purges approvals + state dirs.
- Bounded broker: cap rows per recipient or auto-vacuum delivered messages older than a threshold.
- Container crash events: watch `container@*.service` via D-Bus, push `HelperEvent::ContainerCrash` to the manager.
Inspirations
~/Repos/bitburner-agent— sibling project, drives Claude Code in a turn loop against a Bitburner CDP session. Patterns to steal as we grow: per-cycle prompt diffing (vs full state), notes compaction as a separate short-lived Claude session, MCP server registering tools from a singleTOOLSarray, dashboard with SSE + xterm.js + sqlite stats sampler, opaque "terminal event" stream that unifies tool-call / sleep / op-notice / etc.