hyperhive — developer reference

Operator + dev notes: conventions, gotchas, per-subsystem design.

High-level project intro: README.md.
Open work + backlog: TODO.md.

File map

hive-c0re/         host daemon + CLI (one binary, subcommand-dispatched)
  src/main.rs           clap setup; serve / spawn / kill / rebuild / list /
                         pending / approve / deny / destroy / request-spawn
  src/server.rs         host admin socket (HostRequest → dispatch)
  src/client.rs         admin-socket client
  src/manager_server.rs manager-privileged socket (ManagerRequest)
  src/agent_server.rs   per-sub-agent socket listener (long-poll Recv)
  src/broker.rs         sqlite Message store + broadcast channel for SSE
  src/approvals.rs      sqlite Approval queue + kinds
  src/coordinator.rs    shared state (broker/approvals/transient/sockets)
  src/actions.rs        approve/deny/destroy
  src/auto_update.rs    startup rebuild scan + ensure_manager
  src/lifecycle.rs      `nixos-container` shellouts, per-agent flake generator
  src/dashboard.rs      axum HTTP UI: containers, approvals, async-form actions
  assets/               CSS + JS shipped via include_str!

hive-ag3nt/        in-container harness crate; produces TWO binaries
  src/lib.rs            re-exports + DEFAULT_SOCKET, DEFAULT_WEB_PORT
  src/client.rs         generic JSON-line request/response over unix socket
  src/web_ui.rs         per-container axum HTTP page
  src/events.rs         LiveEvent + broadcast Bus for the SSE stream
  src/turn.rs           claude --print + stream-json pump; --compact retry
  src/mcp.rs            embedded MCP server (rmcp): AgentServer + ManagerServer
  src/login.rs          probe /root/.claude/ for a valid session
  src/login_session.rs  drives `claude auth login` over stdio pipes
  src/bin/hive-ag3nt.rs sub-agent main
  src/bin/hive-m1nd.rs  manager main
  assets/               CSS + JS for the per-agent UI

hive-sh4re/        wire types (HostRequest/Response, AgentRequest/Response,
                   ManagerRequest/Response, Message, Approval, HelperEvent)

nix/
  modules/hive-c0re.nix         systemd service + firewall + git wiring
  templates/harness-base.nix    shared scaffolding for sub-agents + manager
  templates/agent-base.nix      sub-agent nixosConfiguration
  templates/manager.nix         manager nixosConfiguration

Conventions

Naming. Containers are length-bounded (nixos-container ≤ 11 chars). Sub-agents are h-<name> with <name> ≤ 9 chars; the manager is hm1nd. MAX_AGENT_NAME enforces the cap in lifecycle.rs. Per-agent web UI port = WEB_PORT_BASE + FNV1a(name) % WEB_PORT_RANGE (8100..8999); manager fixed at 8000; dashboard cfg.dashboardPort (default 7000).
Identity = socket. No auth/tokens on the per-agent sockets. The socket path identifies the principal; perms come from "who has the bind-mount."
Wire protocol. JSON line-delimited over unix sockets in both directions (host admin / manager / agent). /messages/stream is text/event-stream.
Commit messages. Short, lowercase, no Co-Authored-By trailer.
Commit before test. Stage and commit when work looks ready, then run validation (cargo check, nix flake check, real lpt2 deploy). Failures get a follow-up commit rather than an amend.
rebuild is the reconcile verb. Idempotently rewrites /etc/nixos-containers/<C>.conf (PRIVATE_NETWORK=0, clears HOST_ADDRESS/LOCAL_ADDRESS, sets EXTRA_NSPAWN_FLAGS), regenerates applied/<name>/flake.nix, writes the systemd limits drop-in, then nixos-container update + stop + start. Anything that changes per-container state on the host should be re-applied here.
Actions are factored. approve / deny / destroy live in actions.rs; the admin socket and the dashboard POST handlers both call into them so the two surfaces never drift.
Async forms. Dashboard mutating forms carry data-async; the assets/async_forms.js helper intercepts submit, shows a spinner, and fetches with application/x-www-form-urlencoded (axum Form extractor rejects multipart). New mutating forms should add data-async and optionally data-confirm.

Gotchas / lessons learned

nixos-container doesn't expose --bind on the CLI. Path is via EXTRA_NSPAWN_FLAGS in /etc/nixos-containers/<NAME>.conf — the start script (/nix/store/.../container_-start) expands it unquoted into the systemd-nspawn invocation. We rewrite this line in set_nspawn_flags().
/run/systemd/nspawn/*.nspawn overrides are ignored by nixos-container's start script (it builds the nspawn cmd line directly).
boot.isNspawnContainer = true, not boot.isContainer = true. Renamed in nixos-25.11+.
nixos-container create auto-assigns HOST_ADDRESS/LOCAL_ADDRESS in the .conf. The start script's if HOST_ADDRESS set → --network-veth branch then forces a private netns — which is silently fatal for our web UIs (the bind is invisible from the host). We force-clear those vars (and HOST_ADDRESS6 / LOCAL_ADDRESS6 / HOST_BRIDGE) plus set PRIVATE_NETWORK=0.
systemd service PATH ≠ host PATH. The hive-c0re service sets path = [ pkgs.git "/run/current-system/sw" ]. In-container harness services do the same so anything an agent adds to its own agent.nix (environment.systemPackages) is visible to claude's Bash tool without editing the service definition. environment.HYPERHIVE_GIT bakes git's absolute path in (read by lifecycle::git_command()) for the host.
RuntimeDirectoryPreserve = "yes" keeps /run/hyperhive/ (and the per-agent sub-dirs) across hive-c0re restarts. Without it, every restart wipes bind sources and existing containers can't be started.
register_agent is idempotent — drops any prior socket task before rebinding. Required so a hive-c0re restart followed by rebuild alice recreates the agent's socket without needing a clean reinstall.
claude-code is unfree. harness-base.nix allow-list's it specifically. The flake pins it to nixpkgs-unstable via overlays.claude-unstable (stable lags too far). The overlay imports unstable with its own allowUnfreePredicate so the access inside the overlay doesn't itself trip.
Claude credentials are per-agent. /var/lib/hyperhive/agents/<name>/claude/ bind-mounts to /root/.claude (RW). Sharing one dir across agents is NOT viable — OAuth refresh tokens rotate, so any sibling refresh invalidates all the others. Login flow runs from the per-agent web UI; creds persist across destroy/recreate.
Orphan approvals. If state dirs are wiped out from under a pending approval (test scripts, manual rm -rf), the dashboard's next render marks them failed with note "agent state dir missing" so they fall out of pending. They stay in sqlite for audit.

Agent MCP surface + turn loop

The harness ships an embedded MCP server (rmcp 1.7) that claude launches as a stdio child via --mcp-config. Subcommand: hive-ag3nt mcp (or hive-m1nd mcp for the manager surface).

Sub-agent tools:

mcp__hyperhive__send(to, body) — message a peer or the operator.
mcp__hyperhive__recv() — drain one inbox message.

Manager additionally:

mcp__hyperhive__request_spawn(name) — queue Spawn approval.
mcp__hyperhive__kill(name) — graceful stop.
mcp__hyperhive__request_apply_commit(agent, commit_ref) — submit a config change for any agent (including hm1nd for self-mods).

The shared per-turn plumbing lives in hive_ag3nt::turn::{write_mcp_config, run_turn, drive_turn, emit_turn_end, wait_for_login} so the two binaries can't drift.

Each turn:

claude --print --verbose --output-format stream-json --model haiku \
  --continue --settings '{"autoCompactEnabled":false,"autoMemoryEnabled":false}' \
  --mcp-config <path> --strict-mcp-config \
  --tools <builtins> --allowedTools <builtins+mcp>
# prompt piped over stdin

--continue keeps a persistent session per agent (claude stores sessions in ~/.claude/projects/, which is bind-mounted persistently). Auto-compact and auto-memory are disabled because hyperhive owns compaction (/compact on overflow, retry once).

Loop control. The harness pops one inbox message per cycle (the wake signal — Recv long-polls server-side for up to 30s waking instantly on a new broker Sent event for this agent) and hands claude a prompt naming the agent, the sender, the body, and the MCP tools. Claude drives any further recv/send itself.

Tool envelope (mcp::run_tool_envelope): every MCP tool handler logs the request, runs the body, appends a status line (e.g. [status] 3 unread message(s) in inbox from a non-mutating Status peek), logs the result. New tools call this helper.

Tool whitelist (mcp::ALLOWED_BUILTIN_TOOLS):

Allowed built-ins: Bash, Edit, Glob, Grep, Read, TodoWrite, Write.
Denied by omission: WebFetch, WebSearch, Task, NotebookEdit.
Allowed MCP tools: as listed above per flavor.

Bash is on the allow-list pending a finer-grained pattern allow-list (Bash(git *)-style) — see TODO.md.

Live view. Each agent runs an events::Bus (a tokio::sync::broadcast<LiveEvent> wrapper). The harness emits TurnStart, Stream(value) (one per parsed stream-json line), Note, TurnEnd. The web UI subscribes via /events/stream (SSE) and a JS panel appends rows. No full-page reload — operator input stays put.

Manager (hm1nd) is hive-c0re-managed

The manager container runs through the same lifecycle as sub-agents. On hive-c0re serve startup, if hm1nd is missing, hive-c0re creates it. The manager's flake lives at /var/lib/hyperhive/applied/hm1nd/; its proposed config at /var/lib/hyperhive/agents/hm1nd/config/. Manager can edit its own agent.nix (visible inside the container at /agents/hm1nd/config/) and submit request-apply-commit hm1nd <sha> for operator approval.

Differences from sub-agents:

flake.nix extends hyperhive.nixosConfigurations.manager (vs agent-base).
Container name is hm1nd (no h- prefix).
Fixed web UI port (MANAGER_PORT = 8000).
set_nspawn_flags adds an extra bind: /var/lib/hyperhive/agents → /agents (RW), so the manager can edit per-agent proposed repos.
First-deploy spawn bypasses the approval queue (manager is required infrastructure).
Per-agent socket lives at /run/hyperhive/manager/, owned by manager_server::start.

Migration note (for older hosts): drop any containers.hm1nd = { ... } block from your host NixOS config. hyperhive creates and updates the manager itself now.

Auto-update on startup

hive-c0re serve runs auto_update::run in a background task right after opening the coordinator. It enumerates managed containers and rebuilds any whose recorded hyperhive rev differs from the current one — sub-agents and manager go through the same lifecycle::rebuild path. "Rev" = canonical filesystem path of cfg.hyperhiveFlake. Marker file: /var/lib/hyperhive/applied/.<name>.hyperhive-rev.

If the flake input has no canonical path (e.g. a github: URL), auto-update is a no-op — rebuild manually.

The dashboard surfaces pending updates per agent: a clickable "needs update ↻" badge appears whenever the marker differs from current rev. The badge POSTs /rebuild/<name>, calling the same auto_update::rebuild_agent path so manual triggers and the startup scan can't drift.

Approval flow

End-to-end: manager edits per-agent proposed repo → commits → submits commit sha → user approves on host CLI or dashboard button → hive-c0re reads the file at that sha from proposed, applies into applied, commits there, runs nixos-container update. Helper-event JSON (ApprovalResolved) lands in the manager's inbox.

Two separate git repos per agent:

/var/lib/hyperhive/agents/<name>/config/    # proposed — manager edits, hive-c0re reads only
└── agent.nix                               # the only file the manager can change
                                            # (initial commit by hive-c0re on first spawn,
                                            # never touched by hive-c0re again)

/var/lib/hyperhive/applied/<name>/          # applied — hive-c0re-only; container builds here
├── flake.nix                               # auto-generated; references hyperhive_flake
└── agent.nix                               # overwritten by approve from the proposed commit

The container's --flake ref is <applied_dir>#default. The flake extends hyperhive.nixosConfigurations.{agent-base|manager} with ./agent.nix plus an inline module setting programs.git.config.user (committer identity = the agent's name) and systemd.services.<harness>.environment (HIVE_PORT, HIVE_LABEL, HIVE_DASHBOARD_PORT).

13 KiB Raw Blame History