hyperhive
Multi-Claude-Code-agent orchestration on nixos-containers. A host-side Rust daemon spawns nspawn-isolated agent containers and brokers messages between them. Eventually a manager agent (another Claude Code session in its own container) coordinates the swarm and gates lifecycle changes on user approval via git commits.
PLAN.md is the living design doc. Read it for the why and the phase roadmap; this file is the operator/developer reference for the how.
Architecture
host
├── hive-c0re (Rust daemon, NixOS service)
│   ├── lifecycle: nixos-container CRUD
│   ├── broker: sqlite message store (/var/lib/hyperhive/broker.sqlite)
│   ├── server: host admin socket (JSON line protocol)
│   └── agent_server: per-agent MCP-ish sockets
│
├── /run/hyperhive/
│   ├── host.sock: admin CLI ↔ daemon
│   └── agents/<name>/mcp.sock: bind-mounted into each container at /run/hive
│
└── nixos-containers
    ├── h-<name> (sub-agents, hive-ag3nt binary)
    └── hm1nd (manager, hive-m1nd binary, Phase 4+)
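The sockets in the diagram carry a JSON line protocol: one JSON document per line. The sketch below shows only the framing, using a connected socket pair in place of host.sock or mcp.sock. The helper names send_line and recv_line and the JSON payload shapes are illustrative assumptions; the real wire types live in hive-sh4re.

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;

/// Write one frame: the payload followed by a single newline.
/// The payload must not contain a raw newline, or framing breaks.
fn send_line(stream: &mut UnixStream, payload: &str) -> std::io::Result<()> {
    assert!(!payload.contains('\n'), "payload must be a single line");
    stream.write_all(payload.as_bytes())?;
    stream.write_all(b"\n")?;
    stream.flush()
}

/// Read one frame; returns None on EOF (the peer closed the socket).
fn recv_line(reader: &mut BufReader<UnixStream>) -> std::io::Result<Option<String>> {
    let mut line = String::new();
    if reader.read_line(&mut line)? == 0 {
        return Ok(None);
    }
    Ok(Some(line.trim_end_matches('\n').to_string()))
}

fn main() -> std::io::Result<()> {
    // A connected pair stands in for the daemon side and the client side.
    let (mut client, server) = UnixStream::pair()?;
    let mut server_reader = BufReader::new(server);

    // Hypothetical payloads; the real request types are defined in hive-sh4re.
    send_line(&mut client, r#"{"op":"send","to":"alice","body":"hi"}"#)?;
    send_line(&mut client, r#"{"op":"recv"}"#)?;
    drop(client); // closing the writer yields EOF on the reader

    while let Some(frame) = recv_line(&mut server_reader)? {
        println!("frame: {frame}");
    }
    Ok(())
}
```

Because each frame is a single line, either side can read with a buffered line reader and hand the string to a JSON parser without any length prefix.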
Crates / file map
hive-c0re/                    host daemon + CLI (one binary, subcommand-dispatched)
  main.rs                     clap setup; serve vs spawn/kill/rebuild/list
  server.rs                   host admin socket
  client.rs                   host admin socket client (for spawn/kill/rebuild/list)
  broker.rs                   sqlite-backed Message store (rusqlite)
  agent_server.rs             per-agent socket listener
  coordinator.rs              shared runtime state (broker + map<name, AgentSocket>)
  lifecycle.rs                `nixos-container` shellouts (spawn/kill/rebuild/list)
hive-ag3nt/                   in-container harness; produces TWO binaries from one crate
  src/lib.rs                  DEFAULT_SOCKET, re-exports
  src/client.rs               AgentRequest/AgentResponse over /run/hive/mcp.sock
  src/bin/hive-ag3nt.rs       sub-agent CLI (serve/send/recv)
  src/bin/hive-m1nd.rs        manager placeholder (Phase 4)
hive-sh4re/                   wire types (HostRequest/Response, AgentRequest/Response, Message)
nix/
  modules/hive-c0re.nix       systemd service wiring
  templates/agent-base.nix    nixos-container template (boot.isNspawnContainer = true)
tests/roundtrip.sh            Phase 3 end-to-end smoke test
Conventions
- Naming. Containers are length-bounded (nixos-container ≤ 11 chars). Sub-agents are h-<name> with <name> ≤ 9 chars; the manager is hm1nd. MAX_AGENT_NAME enforces the cap in lifecycle.rs.
- Identity = socket. No auth/tokens on the per-agent sockets. The socket path identifies the principal; perms come from "who has the bind-mount."
- Wire protocol. JSON line-delimited over unix sockets in both directions. See hive-sh4re for the types. (Phase 6+ may swap to real MCP stdio.)
- Commit messages. Short, lowercase, no Co-Authored-By trailer.
- Commit before test. Stage and commit when work looks ready, then run validation (cargo check, nix flake check, real lpt2 deploy). Failures get a follow-up commit rather than an amend.
- rebuild is the reconcile verb. It rewrites /etc/nixos-containers/<C>.conf EXTRA_NSPAWN_FLAGS idempotently and does nixos-container update and stop+start so nspawn-level changes (bind mounts) take effect. Anything that changes per-container state on the host should be re-applied here.
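The naming rule in the first convention comes down to simple arithmetic: the h- prefix (2 chars) plus a name of at most 9 chars stays within the 11-char container limit. A minimal sketch of such a check follows. Only MAX_AGENT_NAME is named in this document; the function container_name, the other constants, and the character-set restriction are assumptions for illustration, not the code in lifecycle.rs.

```rust
/// Cap on the <name> part, as stated in the conventions.
const MAX_AGENT_NAME: usize = 9;
/// Prefix for sub-agent containers.
const AGENT_PREFIX: &str = "h-";
/// Overall container-name limit (prefix + name).
const MAX_CONTAINER_NAME: usize = 11;

/// Map an agent name to its container name, or explain why it is rejected.
fn container_name(agent: &str) -> Result<String, String> {
    if agent.is_empty() {
        return Err("agent name must not be empty".to_string());
    }
    // Assumption: names are lowercase ASCII letters, digits, or '-'.
    let valid = agent
        .chars()
        .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-');
    if !valid {
        return Err(format!("agent name '{}' has unsupported characters", agent));
    }
    // The name is ASCII at this point, so byte length equals char count.
    if agent.len() > MAX_AGENT_NAME {
        return Err(format!(
            "agent name '{}' is {} chars; max is {}",
            agent,
            agent.len(),
            MAX_AGENT_NAME
        ));
    }
    let name = format!("{}{}", AGENT_PREFIX, agent);
    debug_assert!(name.len() <= MAX_CONTAINER_NAME);
    Ok(name)
}

fn main() {
    for agent in ["alice", "toolongname"] {
        match container_name(agent) {
            Ok(name) => println!("{agent} -> {name}"),
            Err(why) => println!("{agent} rejected: {why}"),
        }
    }
}
```

Validating before any nixos-container shellout means an over-long name fails with a clear message rather than partway through a spawn.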
Gotchas / lessons learned
- nixos-container doesn't expose --bind on the CLI. Path is via EXTRA_NSPAWN_FLAGS in /etc/nixos-containers/<NAME>.conf; the start script (/nix/store/.../container_-start) expands it unquoted into the systemd-nspawn invocation. We rewrite this line in set_nspawn_flags().
- /run/systemd/nspawn/*.nspawn overrides are ignored by nixos-container's start script (it builds the nspawn cmd line directly). Don't bother.
- boot.isNspawnContainer = true, not boot.isContainer = true. The latter was renamed in nixos-25.11+.
- systemd service PATH ≠ host PATH. Our service explicitly sets path = [ "/run/current-system/sw" ] so nixos-container (which lives in the system profile, not nixpkgs) is reachable.
- RuntimeDirectoryPreserve = "yes" keeps /run/hyperhive/ (and the agent sub-dirs) across hive-c0re restarts. Without it, every restart wipes bind sources and existing containers can't be started.
- register_agent is idempotent: it drops any prior socket task before rebinding. Required so a hive-c0re restart followed by rebuild alice recreates the agent's socket without needing a clean reinstall.
- claude-code is unfree. agent-base.nix allow-lists it specifically. The flake pins it to nixpkgs-unstable via overlays.claude-unstable (stable lags too far). The overlay imports unstable with its own allowUnfreePredicate so the access inside the overlay doesn't itself trip.
- Claude credentials are stateful and per-container. No ANTHROPIC_API_KEY env var path. For now: nixos-container root-login h-<name> → claude (interactive) → log in once. The harness falls back to echo replies when claude --print fails. Future: bind-mount a shared ~/.claude dir from the host so creds survive container destroy/recreate.
- Echo guard. hive-ag3nt serve skips auto-reply when the incoming body starts with "echo: ". Prevents ping-pong loops when both sides fall back to echo. Real conversations between claude-backed agents will run away; bounding them is the manager's job (Phase 4+).
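The first gotcha relies on rewriting the EXTRA_NSPAWN_FLAGS line so that repeated rebuilds converge on the same file. The following is a rough sketch of one way to make that rewrite idempotent. It is not the project's set_nspawn_flags(): the function name rewrite_nspawn_flags is hypothetical, the conf file is assumed to be plain KEY="value" lines, and the bind path in main is only an illustration.

```rust
/// Return `conf` with exactly one EXTRA_NSPAWN_FLAGS line set to `flags`.
/// Other lines are kept in order; duplicate EXTRA_NSPAWN_FLAGS lines collapse.
/// Assumption: `flags` contains no double quotes or newlines.
fn rewrite_nspawn_flags(conf: &str, flags: &str) -> String {
    let wanted = format!("EXTRA_NSPAWN_FLAGS=\"{}\"", flags);
    let mut out: Vec<&str> = Vec::new();
    let mut replaced = false;

    for line in conf.lines() {
        if line.trim_start().starts_with("EXTRA_NSPAWN_FLAGS=") {
            // Replace the first occurrence in place, drop any later ones.
            if !replaced {
                out.push(wanted.as_str());
                replaced = true;
            }
        } else {
            out.push(line);
        }
    }
    if !replaced {
        // The key was absent: append it.
        out.push(wanted.as_str());
    }

    let mut result = out.join("\n");
    result.push('\n');
    result
}

fn main() {
    let conf = "PRIVATE_NETWORK=0\nEXTRA_NSPAWN_FLAGS=\"\"\n";
    // Illustrative bind mount; the actual flags are chosen by hive-c0re.
    let flags = "--bind=/run/hyperhive/agents/alice:/run/hive";

    let once = rewrite_nspawn_flags(conf, flags);
    let twice = rewrite_nspawn_flags(&once, flags);
    print!("{once}");
    // Applying the rewrite again changes nothing.
    assert_eq!(once, twice);
}
```

Working on the whole file as a string keeps the function pure, so the idempotence property can be checked without touching /etc/nixos-containers.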
Build / deploy / test
# inside the repo (devshell first; no global cargo)
nix develop -c cargo check
nix develop -c cargo build
# evaluate everything (incl. fmt check)
nix flake check
# build only the workspace package
nix build .#default
# binaries land in ./result/bin/{hive-c0re,hive-ag3nt,hive-m1nd}
# deploy to an existing host that imports hyperhive.nixosModules.hive-c0re
cd ~/Repos/<nixos-config-repo>
nix flake update hyperhive   # Nix >= 2.19; on older Nix: nix flake lock --update-input hyperhive
sudo nixos-rebuild switch --flake .#<host>
# end-to-end test (lpt2 or any host with the module enabled)
sudo bash tests/roundtrip.sh
The host config also needs hyperhive.overlays.default applied — the module's
default package = pkgs.hyperhive requires the overlay to bring the package
in.
Phase status
- ✅ Phase 0: repo + Cargo workspace + flake + agent-base template
- ✅ Phase 1: container lifecycle (spawn/kill/rebuild/list); nixos-container update hot-reload works under the patch stack (validated empirically on muede-lpt2)
- ✅ Phase 2: per-agent sockets, in-memory broker, agent harness round-trips messages
- ✅ Phase 3: sqlite broker (durable across restart) + claude-or-echo turn loop
- ✅ Phase 4: hm1nd manager binary + manager socket + declarative containers.hm1nd
- ✅ Phase 5: git-commit approval flow:
  - 5a: sqlite approval queue (request_apply_commit / pending / approve / deny)
  - 5b: per-agent config flakes at /var/lib/hyperhive/agents/<name>/config/; approve <id> validates the commit + advances main + rebuilds
- 🔜 Phase 6: per-agent web UI + dashboard MVP
- 🔜 Phase 7: dashboard commit-view + polish
Approval flow (Phase 5)
End-to-end: manager edits per-agent config repo → commits → submits commit sha
for approval → user approves on host CLI → hive-c0re advances main + rebuilds.
# Inside the hm1nd container (manager has /agents bind-mounted RW):
cd /agents/alice/config
$EDITOR agent.nix # add `environment.systemPackages = [ pkgs.htop ];`
git commit -am "add htop"
SHA=$(git rev-parse HEAD)
hive-m1nd request-apply-commit alice $SHA
exit
# On the host:
sudo hive-c0re pending # shows the queued approval with id N
sudo hive-c0re approve N # validates, advances main, rebuilds h-alice
sudo nixos-container run h-alice -- which htop # /run/current-system/sw/bin/htop
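The walkthrough above implies a small state machine: a request enters as pending and is resolved exactly once, as approved or denied. The sketch below models that with an in-memory map instead of the sqlite queue the project uses. The type and field names are assumptions; only the verb names (request_apply_commit, pending, approve, deny) come from this document. Commit validation, advancing main, and the rebuild are left out.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Pending,
    Approved,
    Denied,
}

#[derive(Debug, Clone)]
struct Approval {
    agent: String,
    commit: String,
    status: Status,
}

/// In-memory stand-in for the sqlite-backed approval queue.
#[derive(Default)]
struct ApprovalQueue {
    next_id: u64,
    items: BTreeMap<u64, Approval>,
}

impl ApprovalQueue {
    /// Queue a commit for user approval and return its id.
    fn request_apply_commit(&mut self, agent: &str, commit: &str) -> u64 {
        self.next_id += 1;
        let entry = Approval {
            agent: agent.to_string(),
            commit: commit.to_string(),
            status: Status::Pending,
        };
        self.items.insert(self.next_id, entry);
        self.next_id
    }

    /// All requests that still await a decision, in id order.
    fn pending(&self) -> Vec<(u64, &Approval)> {
        self.items
            .iter()
            .filter(|(_, a)| a.status == Status::Pending)
            .map(|(id, a)| (*id, a))
            .collect()
    }

    /// Move a pending request to a final state; a second decision is an error.
    fn resolve(&mut self, id: u64, to: Status) -> Result<Approval, String> {
        let item = self
            .items
            .get_mut(&id)
            .ok_or_else(|| format!("no approval with id {id}"))?;
        if item.status != Status::Pending {
            return Err(format!("approval {id} is already {:?}", item.status));
        }
        item.status = to;
        Ok(item.clone())
    }

    fn approve(&mut self, id: u64) -> Result<Approval, String> {
        // The real approve also validates the commit, advances main, and rebuilds.
        self.resolve(id, Status::Approved)
    }

    fn deny(&mut self, id: u64) -> Result<Approval, String> {
        self.resolve(id, Status::Denied)
    }
}

fn main() {
    let mut queue = ApprovalQueue::default();
    let id = queue.request_apply_commit("alice", "0123abc");
    for (id, a) in queue.pending() {
        println!("pending {id}: {} @ {}", a.agent, a.commit);
    }
    match queue.approve(id) {
        Ok(a) => println!("approved {} @ {}", a.agent, a.commit),
        Err(e) => eprintln!("{e}"),
    }
    // A resolved request cannot be decided again.
    assert!(queue.deny(id).is_err());
}
```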
Per-agent flake layout (generated by setup_config on every spawn / rebuild):
/var/lib/hyperhive/agents/<name>/config/
├── .git/
├── flake.nix # managed by hive-c0re — rewritten when hyperhive flake URL changes
└── agent.nix # manager-editable; per-agent NixOS overrides
The flake's inputs.hyperhive.url is the same URL hive-c0re was launched with
(services.hive-c0re.hyperhiveFlake), inlined as a string. The flake's
nixosConfigurations.default extends hyperhive.nixosConfigurations.agent-base
with ./agent.nix. So adding packages is a one-line edit in agent.nix.
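To make the generated flake.nix concrete, here is a hedged sketch of how a setup step might render it from a template with the URL inlined as a string. The function render_flake, the template text, and the example URL are assumptions, not the output of setup_config. The use of extendModules to layer ./agent.nix on top of agent-base is one plausible reading of "extends" and may differ from what hive-c0re actually writes.

```rust
/// Assumed template; the real file is generated by hive-c0re.
const FLAKE_TEMPLATE: &str = r#"{
  inputs.hyperhive.url = "@HYPERHIVE_URL@";

  outputs = { self, hyperhive }: {
    nixosConfigurations.default =
      hyperhive.nixosConfigurations.agent-base.extendModules {
        modules = [ ./agent.nix ];
      };
  };
}
"#;

/// Render flake.nix with the hyperhive flake URL inlined as a Nix string.
/// URLs that would break out of the string literal are rejected.
fn render_flake(hyperhive_url: &str) -> Result<String, String> {
    if hyperhive_url.is_empty() {
        return Err("hyperhive flake URL must not be empty".to_string());
    }
    let unsafe_for_nix_string = hyperhive_url.contains('"')
        || hyperhive_url.contains('\\')
        || hyperhive_url.contains("${")
        || hyperhive_url.contains('\n');
    if unsafe_for_nix_string {
        return Err(format!("unsupported characters in URL: {:?}", hyperhive_url));
    }
    Ok(FLAKE_TEMPLATE.replace("@HYPERHIVE_URL@", hyperhive_url))
}

/// True when the file on disk differs from what the current URL would produce.
fn needs_rewrite(existing: &str, hyperhive_url: &str) -> bool {
    match render_flake(hyperhive_url) {
        Ok(wanted) => wanted != existing,
        Err(_) => false, // nothing valid to write
    }
}

fn main() {
    // Hypothetical URL; in practice this is services.hive-c0re.hyperhiveFlake.
    let url = "github:example/hyperhive";
    let flake = render_flake(url).expect("valid URL");
    print!("{flake}");
    assert!(!needs_rewrite(&flake, url));
}
```

Comparing the rendered text with the existing file gives a cheap way to rewrite flake.nix only when the URL changes, while leaving agent.nix untouched.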
See PLAN.md for the full design and the deferred-out-of-scope list.
Inspirations
- ~/Repos/bitburner-agent: sibling project, drives Claude Code in a turn loop against a Bitburner CDP session. Patterns to steal as we grow: per-cycle prompt diffing (vs full state), notes compaction as a separate short-lived Claude session, MCP server registering tools from a single TOOLS array, dashboard with SSE + xterm.js + sqlite stats sampler, opaque "terminal event" stream that unifies tool-call / sleep / op-notice / etc.