hyperhive

Multi-Claude-Code-agent orchestration on nixos-containers. A host-side Rust daemon spawns nspawn-isolated agent containers and brokers messages between them. Eventually a manager agent (another Claude Code session in its own container) coordinates the swarm and gates lifecycle changes on user approval via git commits.

PLAN.md is the living design doc. Read it for the why and the phase roadmap; this file is the operator/developer reference for the how.

Architecture

host
├── hive-c0re (Rust daemon, NixOS service)
│   ├── lifecycle    — nixos-container CRUD
│   ├── broker       — sqlite message store (/var/lib/hyperhive/broker.sqlite)
│   ├── server       — host admin socket (JSON line protocol)
│   └── agent_server — per-agent MCP-ish sockets
│
├── /run/hyperhive/
│   ├── host.sock                — admin CLI ↔ daemon
│   └── agents/<name>/mcp.sock   — bind-mounted into each container at /run/hive
│
└── nixos-containers
    ├── h-<name>  (sub-agents, hive-ag3nt binary)
    └── hm1nd     (manager, hive-m1nd binary — Phase 4+)
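The naming and path scheme in the tree can be captured by a few pure functions. This is a minimal sketch, not lifecycle.rs itself. `container_name`, `host_socket_dir` and `bind_flag` are hypothetical names. The 9-char cap comes from the naming convention. Treating the whole per-agent directory (rather than the socket file alone) as the bind source is an assumption, inferred from the container-side path /run/hive/mcp.sock.

```rust
use std::path::PathBuf;

// Cap from the naming convention: "h-" (2 chars) + 9 = 11, the nixos-container limit.
const MAX_AGENT_NAME: usize = 9;

/// Hypothetical helper: validate a sub-agent name and derive its container name.
fn container_name(agent: &str) -> Result<String, String> {
    let len = agent.chars().count();
    if len == 0 || len > MAX_AGENT_NAME {
        return Err(format!(
            "agent name {:?} must be 1..={} chars, got {}",
            agent, MAX_AGENT_NAME, len
        ));
    }
    Ok(format!("h-{}", agent))
}

/// Host-side directory holding the agent's mcp.sock.
/// Assumption: this directory is what gets bind-mounted at /run/hive.
fn host_socket_dir(agent: &str) -> PathBuf {
    PathBuf::from("/run/hyperhive/agents").join(agent)
}

/// systemd-nspawn --bind=SOURCE:DEST argument for that directory.
fn bind_flag(agent: &str) -> String {
    format!("--bind={}:/run/hive", host_socket_dir(agent).display())
}
```

For example, `container_name("alice")` yields `"h-alice"`, and `bind_flag("alice")` yields `--bind=/run/hyperhive/agents/alice:/run/hive`. Inside the container, that makes the socket appear at /run/hive/mcp.sock.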

Crates / file map

hive-c0re/        host daemon + CLI (one binary, subcommand-dispatched)
  main.rs           clap setup; serve vs spawn/kill/rebuild/list
  server.rs         host admin socket
  client.rs         host admin socket client (for spawn/kill/rebuild/list)
  broker.rs         sqlite-backed Message store (rusqlite)
  agent_server.rs   per-agent socket listener
  coordinator.rs    shared runtime state (broker + map<name, AgentSocket>)
  lifecycle.rs      `nixos-container` shellouts (spawn/kill/rebuild/list)

hive-ag3nt/       in-container harness; produces TWO binaries from one crate
  src/lib.rs        DEFAULT_SOCKET, re-exports
  src/client.rs     AgentRequest/AgentResponse over /run/hive/mcp.sock
  src/bin/hive-ag3nt.rs   sub-agent CLI (serve/send/recv)
  src/bin/hive-m1nd.rs    manager placeholder (Phase 4)

hive-sh4re/       wire types (HostRequest/Response, AgentRequest/Response, Message)

nix/
  modules/hive-c0re.nix       systemd service wiring
  templates/agent-base.nix    nixos-container template (boot.isNspawnContainer = true)

tests/roundtrip.sh           Phase 3 end-to-end smoke test
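coordinator.rs keeps one socket per agent name, and register_agent has to tolerate being called again for a name it has already seen. A minimal std-only sketch of that idempotent pattern follows. It assumes a blocking `std::os::unix::net::UnixListener` in place of the daemon's real per-agent socket task. The `Coordinator` struct and its method shapes are hypothetical.

```rust
use std::collections::HashMap;
use std::io;
use std::os::unix::net::UnixListener;
use std::path::PathBuf;

/// Hypothetical stand-in for coordinator.rs's map<name, AgentSocket>.
/// A blocking UnixListener plays the role of the per-agent socket task.
struct Coordinator {
    root: PathBuf, // e.g. /run/hyperhive/agents on the host
    agents: HashMap<String, UnixListener>,
}

impl Coordinator {
    fn new(root: PathBuf) -> Self {
        Coordinator { root, agents: HashMap::new() }
    }

    /// Safe to call repeatedly for the same name, including after a restart
    /// that left a stale socket file behind.
    fn register_agent(&mut self, name: &str) -> io::Result<PathBuf> {
        // Drop any prior listener for this name; this closes its fd.
        self.agents.remove(name);
        let dir = self.root.join(name);
        std::fs::create_dir_all(&dir)?;
        let path = dir.join("mcp.sock");
        // Dropping a UnixListener does not unlink its file, and bind() fails
        // with AddrInUse if the path exists, so remove any leftover first.
        match std::fs::remove_file(&path) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::NotFound => {}
            Err(e) => return Err(e),
        }
        let listener = UnixListener::bind(&path)?;
        self.agents.insert(name.to_string(), listener);
        Ok(path)
    }
}
```

Removing the stale file is the important step. Without it, a restarted daemon could not rebind a path that a previous process left on disk.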

Conventions

Naming. Containers are length-bounded (nixos-container ≤ 11 chars). Sub-agents are h-<name> with <name> ≤ 9 chars; the manager is hm1nd. MAX_AGENT_NAME enforces the cap in lifecycle.rs.
Identity = socket. The per-agent sockets carry no auth or tokens. The socket path identifies the principal, and permissions come from which container has that socket bind-mounted.
Wire protocol. JSON line-delimited over unix sockets in both directions. See hive-sh4re for the types. (Phase 6+ may swap to real MCP stdio.)
Commit messages. Short, lowercase, no Co-Authored-By trailer.
Commit before test. Stage and commit when work looks ready, then run validation (cargo check, nix flake check, real lpt2 deploy). Failures get a follow-up commit rather than an amend.
rebuild is the reconcile verb. It idempotently rewrites the EXTRA_NSPAWN_FLAGS line in /etc/nixos-containers/<C>.conf, runs nixos-container update, then stops and starts the container so nspawn-level changes (bind mounts) take effect. Anything that changes per-container state on the host should be re-applied here.
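The line-delimited JSON framing reduces to "one serialized document per line". Serialized JSON never needs a raw newline, because newlines inside strings are escaped as `\n`, so a literal newline can only be a frame boundary. The sketch below covers only the framing. It uses std only, so the JSON shown is hand-written, and the request shape in the test is hypothetical. The real types live in hive-sh4re.

```rust
use std::io::{self, BufRead, Write};

/// Write one frame: a single JSON document followed by '\n'.
fn write_frame<W: Write>(w: &mut W, json: &str) -> io::Result<()> {
    if json.contains('\n') {
        // A raw newline would split this document into two frames.
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame contains a raw newline"));
    }
    w.write_all(json.as_bytes())?;
    w.write_all(b"\n")?;
    w.flush()
}

/// Read the next frame, or None at EOF. The trailing line ending is stripped.
fn read_frame<R: BufRead>(r: &mut R) -> io::Result<Option<String>> {
    let mut line = String::new();
    if r.read_line(&mut line)? == 0 {
        return Ok(None);
    }
    Ok(Some(line.trim_end_matches(|c| c == '\n' || c == '\r').to_string()))
}
```

Against a real socket, the same functions work with a `UnixStream` for writing and a `BufReader` wrapping `stream.try_clone()` for reading.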

Gotchas / lessons learned

nixos-container doesn't expose --bind on the CLI. The way in is EXTRA_NSPAWN_FLAGS in /etc/nixos-containers/<NAME>.conf. The start script (/nix/store/.../container_-start) expands it unquoted into the systemd-nspawn invocation, so individual flags must not contain spaces. We rewrite this line in set_nspawn_flags().
/run/systemd/nspawn/*.nspawn overrides are ignored by nixos-container's start script (it builds the nspawn cmd line directly). Don't bother.
boot.isNspawnContainer = true, not boot.isContainer = true. The latter was renamed to the former in nixos-25.11+.
systemd service PATH ≠ host PATH. Our service explicitly sets path = [ "/run/current-system/sw" ] so nixos-container (which lives in the system profile, not nixpkgs) is reachable.
RuntimeDirectoryPreserve = "yes" keeps /run/hyperhive/ (and the agent sub-dirs) across hive-c0re restarts. Without it, every restart wipes bind sources and existing containers can't be started.
register_agent is idempotent — drops any prior socket task before rebinding. Required so a hive-c0re restart followed by rebuild alice recreates the agent's socket without needing a clean reinstall.
claude-code is unfree. agent-base.nix allow-lists it specifically. The flake pins it to nixpkgs-unstable via overlays.claude-unstable (stable lags too far behind). The overlay imports unstable with its own allowUnfreePredicate, so accessing claude-code inside the overlay doesn't itself trip the unfree check.
Claude credentials are stateful and per-container. There is no ANTHROPIC_API_KEY env-var path. For now: nixos-container root-login h-<name> → claude (interactive) → log in once. The harness falls back to echo replies when claude --print fails. Future: bind-mount a shared ~/.claude dir from the host so credentials survive container destroy/recreate.
Echo guard. hive-ag3nt serve skips the auto-reply when the incoming body starts with "echo: ". This prevents ping-pong loops when both sides fall back to echo. Real conversations between claude-backed agents can still run away; bounding them is the manager's job (Phase 4+).
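The idempotent EXTRA_NSPAWN_FLAGS rewrite can be sketched as a pure function over the conf file's contents, which keeps it testable without root. `rewrite_nspawn_flags` is hypothetical, and the real set_nspawn_flags() signature isn't documented here. The `KEY="value"` quoting is assumed, and so are the other keys in the test input. Flags that would be mangled by the start script's unquoted expansion are rejected up front.

```rust
/// Hypothetical re-implementation of the rewrite: replace the
/// EXTRA_NSPAWN_FLAGS line (or append one), leaving every other line intact.
fn rewrite_nspawn_flags(conf: &str, flags: &[&str]) -> Result<String, String> {
    for f in flags {
        // The start script expands the value unquoted, so whitespace would
        // word-split; quote, backslash, $ and ` are special inside "...".
        if f.chars().any(|c| c.is_whitespace() || matches!(c, '"' | '\\' | '$' | '`')) {
            return Err(format!("flag {:?} would not survive unquoted expansion", f));
        }
    }
    let new_line = format!("EXTRA_NSPAWN_FLAGS=\"{}\"", flags.join(" "));
    let mut out: Vec<String> = Vec::new();
    let mut replaced = false;
    for line in conf.lines() {
        if line.starts_with("EXTRA_NSPAWN_FLAGS=") {
            // Keep exactly one such line; duplicates are dropped.
            if !replaced {
                out.push(new_line.clone());
                replaced = true;
            }
        } else {
            out.push(line.to_string());
        }
    }
    if !replaced {
        out.push(new_line);
    }
    let mut s = out.join("\n");
    s.push('\n');
    Ok(s)
}
```

Running it twice with the same flags returns the same text, which is what makes it safe for rebuild to re-apply on every call.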
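The claude-or-echo reply path and the echo guard fit in one small decision function. This is a sketch under assumptions, not the harness code. `auto_reply` is hypothetical, and the `claude_bin` parameter exists only so the fallback can be exercised without Claude installed. Passing the body as an argument to `--print` is assumed, and so is the exact echo format `"echo: <body>"`; only the prefix is documented above.

```rust
use std::process::Command;

const ECHO_PREFIX: &str = "echo: ";

/// Returns None when the guard suppresses the auto-reply.
fn auto_reply(claude_bin: &str, body: &str) -> Option<String> {
    // Echo guard: never answer an echo, or two echo-fallback agents loop forever.
    if body.starts_with(ECHO_PREFIX) {
        return None;
    }
    // Try `claude --print <body>`. Any failure (missing binary, not logged in,
    // non-zero exit) falls back to an echo reply.
    match Command::new(claude_bin).arg("--print").arg(body).output() {
        Ok(out) if out.status.success() => {
            Some(String::from_utf8_lossy(&out.stdout).trim_end().to_string())
        }
        _ => Some(format!("{}{}", ECHO_PREFIX, body)),
    }
}
```

If agent A sends "hello" to an echo-only agent B, B answers "echo: hello". A's guard then returns None, and the exchange stops after one round.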

Build / deploy / test

# inside the repo (devshell first; no global cargo)
nix develop -c cargo check
nix develop -c cargo build

# evaluate everything (incl. fmt check)
nix flake check

# build only the workspace package
nix build .#default
ls ./result/bin/   # hive-c0re  hive-ag3nt  hive-m1nd

# deploy to an existing host that imports hyperhive.nixosModules.hive-c0re
cd ~/Repos/<nixos-config-repo>
nix flake update hyperhive   # Nix ≥ 2.19; on older Nix: nix flake lock --update-input hyperhive
sudo nixos-rebuild switch --flake .#<host>

# end-to-end test (lpt2 or any host with the module enabled)
sudo bash tests/roundtrip.sh

The host config also needs hyperhive.overlays.default applied. The module's default package = pkgs.hyperhive only resolves once the overlay has added hyperhive to pkgs.

Phase status

✅ Phase 0 — repo + Cargo workspace + flake + agent-base template
✅ Phase 1 — container lifecycle (spawn/kill/rebuild/list); nixos-container update hot-reload works under the patch stack (validated empirically on muede-lpt2)
✅ Phase 2 — per-agent sockets, in-memory broker, agent harness round-trips messages
✅ Phase 3 — sqlite broker (durable across restart) + claude-or-echo turn loop
✅ Phase 4 — hm1nd manager binary + manager socket + declarative containers.hm1nd
✅ Phase 5 — git-commit approval flow:
- 5a — sqlite approval queue (request_apply_commit / pending / approve / deny)
- 5b — per-agent config flakes at /var/lib/hyperhive/agents/<name>/config/; approve <id> validates the commit + advances main + rebuilds
🔜 Phase 6 — per-agent web UI + dashboard MVP
🔜 Phase 7 — dashboard commit-view + polish

Approval flow (Phase 5)

End-to-end: manager edits per-agent config repo → commits → submits commit sha for approval → user approves on host CLI → hive-c0re advances main + rebuilds.

# Inside the hm1nd container (manager has /agents bind-mounted RW):
cd /agents/alice/config
$EDITOR agent.nix              # add `environment.systemPackages = [ pkgs.htop ];`
git commit -am "add htop"
SHA=$(git rev-parse HEAD)
hive-m1nd request-apply-commit alice "$SHA"
exit

# On the host:
sudo hive-c0re pending          # shows the queued approval with id N
sudo hive-c0re approve N        # validates, advances main, rebuilds h-alice
sudo nixos-container run h-alice -- which htop   # /run/current-system/sw/bin/htop
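The 5a queue behind pending / approve / deny behaves like a small state machine. Each request starts pending and moves to exactly one terminal state. The sketch below is a hypothetical in-memory stand-in: it uses a `BTreeMap` instead of sqlite, and all names are illustrative. It assumes only full 40-hex-char SHA-1 ids are accepted. It omits what the real approve does afterwards: validating the commit in the agent's repo, advancing main, and rebuilding.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
enum Status { Pending, Approved, Denied }

#[derive(Debug, Clone)]
struct Approval { agent: String, sha: String, status: Status }

/// Hypothetical in-memory stand-in for the sqlite approval queue.
#[derive(Default)]
struct ApprovalQueue { next_id: u64, items: BTreeMap<u64, Approval> }

/// Assumption: only full SHA-1 commit ids (40 hex chars) are accepted.
fn is_full_sha(sha: &str) -> bool {
    sha.len() == 40 && sha.bytes().all(|b| b.is_ascii_hexdigit())
}

impl ApprovalQueue {
    fn request_apply_commit(&mut self, agent: &str, sha: &str) -> Result<u64, String> {
        if !is_full_sha(sha) {
            return Err(format!("not a full commit sha: {:?}", sha));
        }
        self.next_id += 1;
        let a = Approval { agent: agent.to_string(), sha: sha.to_ascii_lowercase(), status: Status::Pending };
        self.items.insert(self.next_id, a);
        Ok(self.next_id)
    }

    fn pending(&self) -> Vec<(u64, &Approval)> {
        self.items.iter()
            .filter(|(_, a)| a.status == Status::Pending)
            .map(|(id, a)| (*id, a))
            .collect()
    }

    /// Approve or deny. Resolving a request twice is an error, so a commit
    /// can never be applied more than once through the same id.
    fn resolve(&mut self, id: u64, to: Status) -> Result<&Approval, String> {
        let a = self.items.get_mut(&id).ok_or_else(|| format!("no approval with id {}", id))?;
        if a.status != Status::Pending {
            return Err(format!("approval {} already {:?}", id, a.status));
        }
        a.status = to;
        Ok(a)
    }
}
```

Rejecting symbolic refs such as HEAD keeps the approval bound to the exact commit the manager submitted, even if the branch moves afterwards.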

Per-agent flake layout (generated by setup_config on every spawn / rebuild):

/var/lib/hyperhive/agents/<name>/config/
├── .git/
├── flake.nix     # managed by hive-c0re — rewritten when hyperhive flake URL changes
└── agent.nix     # manager-editable; per-agent NixOS overrides

The flake's inputs.hyperhive.url is the same URL hive-c0re was launched with (services.hive-c0re.hyperhiveFlake), inlined as a string. The flake's nixosConfigurations.default extends hyperhive.nixosConfigurations.agent-base with ./agent.nix. So adding packages is a one-line edit in agent.nix.
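The managed flake.nix depends only on the hyperhive URL. Regenerating it on every spawn or rebuild therefore only touches disk when that URL changes. Here is a sketch of that render-and-compare step under stated assumptions. The Nix template is illustrative, not setup_config's actual output. In particular, it assumes the extension uses NixOS's `extendModules`. The function names and the example URL in the test are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Escape a string as a Nix double-quoted literal (\, " and ${ are special).
fn nix_string(s: &str) -> String {
    let escaped = s.replace('\\', "\\\\").replace('"', "\\\"").replace("${", "\\${");
    format!("\"{}\"", escaped)
}

/// Render the managed flake.nix. Assumption: agent.nix is layered on
/// agent-base via extendModules.
fn render_flake(hyperhive_url: &str) -> String {
    format!(
        r#"{{
  inputs.hyperhive.url = {url};
  outputs = {{ self, hyperhive }}: {{
    nixosConfigurations.default =
      hyperhive.nixosConfigurations.agent-base.extendModules {{
        modules = [ ./agent.nix ];
      }};
  }};
}}
"#,
        url = nix_string(hyperhive_url)
    )
}

/// Write only when the content differs; returns true if the file changed.
fn write_if_changed(path: &Path, content: &str) -> io::Result<bool> {
    match fs::read_to_string(path) {
        Ok(existing) if existing == content => Ok(false),
        Ok(_) => fs::write(path, content).map(|_| true),
        Err(e) if e.kind() == io::ErrorKind::NotFound => fs::write(path, content).map(|_| true),
        Err(e) => Err(e),
    }
}
```

Because agent.nix is never regenerated, the manager's edits there survive every spawn and rebuild.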

See PLAN.md for the full design and the deferred-out-of-scope list.

Inspirations

~/Repos/bitburner-agent — sibling project, drives Claude Code in a turn loop against a Bitburner CDP session. Patterns to steal as we grow: per-cycle prompt diffing (vs full state), notes compaction as a separate short-lived Claude session, MCP server registering tools from a single TOOLS array, dashboard with SSE + xterm.js + sqlite stats sampler, opaque "terminal event" stream that unifies tool-call / sleep / op-notice / etc.
