hyperhive/README.md
müde 62d1a74929 docs sync + revert auto-unfree removal
revert the earlier 'operator must set allowUnfree' move:
per-agent containers evaluate their own nixpkgs and the operator's
host-level allowUnfree doesn't propagate in. restoring the scoped
allowUnfreePredicate inside both the claude-unstable overlay and
harness-base.nix; documented in README + gotchas as 'nothing to
set on the operator side'.

docs:
- claude.md file map adds crash_watch.rs, kick_agent on coordinator,
  /api/model + journald viewer + bind-with-retry references.
- scratchpad rewritten to reflect the recent run.
- web-ui.md: notification row + browser notifications section,
  state row (badge + model chip + last-turn chip + cancel button),
  per-agent inbox, /model slash, /cancel-question + journald
  endpoints, focus-preservation on refresh.
- turn-loop.md: --model is read from Bus::model() per turn (runtime
  override via /model); recv(wait_seconds) up to 180s with the
  rationale; ask_operator gains ttl_seconds; new TurnState section;
  kick_agent inbox-on-startup hint.
- approvals.md: ttl/cancel resolution paths for operator questions.
- persistence.md: /state/hyperhive-model file.
- gotchas.md: web UI port collision policy (rename, don't probe);
  bind retry + SO_REUSEADDR shape; auto-unfree restored.
- todo.md: cleaned up empty sections and stale entries; /model
  shipped, dropped from the list.
2026-05-15 21:26:13 +02:00

4.8 KiB

hyperhive

Multi-Claude-Code-agent orchestration on nixos-containers.

A host-side Rust daemon (hive-c0re) spawns nspawn-isolated agent containers and brokers messages between them. A manager agent (hm1nd) coordinates the swarm and gates lifecycle changes on user approval via git commits, surfaced through a vibec0re-styled HTTP dashboard.

host (NixOS, runs hive-c0re.service)
│
├── operator
│   ├── browser → :7000               hive-c0re dashboard (containers, approvals)
│   ├── browser → :8000 / :8100-8999  per-agent web UIs (live SSE, send, login)
│   └── CLI     → /run/hyperhive/host.sock         JSON-line admin protocol
│
├── hive-c0re  (Rust daemon)
│   ├── lifecycle    nixos-container CRUD + per-agent flake generation
│   ├── broker       sqlite messages + tokio broadcast (powers SSE + wake-ups)
│   ├── approvals    sqlite queue, two kinds: ApplyCommit (config) + Spawn
│   ├── auto_update  rebuilds any container whose recorded flake rev is stale
│   ├── dashboard    axum HTTP + async-form actions + SSE message flow
│   └── sockets      /run/hyperhive/{host,manager,agents/<n>}/mcp.sock
│
└── nixos-containers  (each bind-mounts its socket dir → /run/hive,
   │                   credentials dir → /root/.claude,
   │                   durable notes dir → /state;
   │                   manager additionally gets /agents RW)
   │
   ├── hm1nd      hive-m1nd serve : claude turn loop +
   │              MCP (send / recv / request_spawn / kill / start /
   │                   restart / request_apply_commit / ask_operator)
   │              + web UI on :8000
   │
   └── h-<name>   hive-ag3nt serve : claude turn loop +
                  MCP (send / recv) + web UI on a hashed :8100-8999

Each turn: harness pops one inbox message (Recv long-polls server-side and wakes on a broker Sent event) → builds a wake prompt → spawns claude --print --continue --output-format stream-json --mcp-config … → streams JSON events into the per-agent SSE bus + a sqlite history db → claude drives any further recv/send itself via the embedded MCP server.

Operator surface per agent: terminal-themed live tail with a textarea prompt; slash commands /help /clear /cancel /compact; granular state badge (idle / thinking / offline) with age timer; cancel-turn button while thinking; sticky-bottom auto-scroll with "↓ N new" pill; event history backfilled on page load.

Config changes flow the other way: manager edits /agents/<name>/config/agent.nix (bind-mounted from the host's proposed repo) → commits → submits the sha as an approval → operator clicks ◆ APPR0VE on the dashboard → hive-c0re copies the file into the applied repo and nixos-container updates the agent. For decisions the manager needs human signal on, ask_operator(question, options?, multi?) queues a free-text/checkbox/radio form on the dashboard; the answer arrives later as a HelperEvent::OperatorAnswered in the manager's inbox.

Host config

Minimal flake.nix for a host that runs hive-c0re:

{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-25.11";
    hyperhive.url = "git+https://git.berlin.ccc.de/vinzenz/hyperhive";
  };

  outputs = { nixpkgs, hyperhive, ... }: {
    nixosConfigurations.my-host = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        hyperhive.nixosModules.hive-c0re
        ({ ... }: {
          services.hive-c0re.enable = true;
          # ... rest of your host config (hardware, networking, users, …)
          system.stateVersion = "25.11";
        })
      ];
    };
  };
}

hive-c0re will then:

  • open its admin socket at /run/hyperhive/host.sock + dashboard on :7000,
  • auto-create the manager container (hm1nd) if missing,
  • auto-rebuild any managed container whose hyperhive rev is stale.

claude-code is unfree; hyperhive whitelists it for itself (scoped: only claude-code, nothing else) inside the claude-unstable overlay and harness-base.nix. Per-agent containers evaluate their own nixpkgs instance so the operator's host-level allowUnfree doesn't propagate in — the predicate has to live inline. Nothing to set on the operator side.

Build / deploy

# inside the repo (devshell first; no global cargo)
nix develop -c cargo check
nix develop -c cargo clippy --workspace --all-targets -- -D warnings

# evaluate everything (rust+nix+toml fmt + clippy)
nix flake check

# deploy to a host that imports `hyperhive.nixosModules.hive-c0re`
cd ~/Repos/<nixos-config-repo>
nix flake update --update-input hyperhive
sudo nixos-rebuild switch --flake .#<host>

No overlays on the host's pkgs — the module pulls hive-c0re's package straight from hyperhive.packages.<system>.default. Just import the module and the service is wired up.