hyperhive/README.md
müde 62d1a74929 docs sync + revert auto-unfree removal
revert the earlier 'operator must set allowUnfree' move:
per-agent containers evaluate their own nixpkgs and the operator's
host-level allowUnfree doesn't propagate in. restoring the scoped
allowUnfreePredicate inside both the claude-unstable overlay and
harness-base.nix; documented in README + gotchas as 'nothing to
set on the operator side'.

docs:
- claude.md file map adds crash_watch.rs, kick_agent on coordinator,
  /api/model + journald viewer + bind-with-retry references.
- scratchpad rewritten to reflect the recent run.
- web-ui.md: notification row + browser notifications section,
  state row (badge + model chip + last-turn chip + cancel button),
  per-agent inbox, /model slash, /cancel-question + journald
  endpoints, focus-preservation on refresh.
- turn-loop.md: --model is read from Bus::model() per turn (runtime
  override via /model); recv(wait_seconds) up to 180s with the
  rationale; ask_operator gains ttl_seconds; new TurnState section;
  kick_agent inbox-on-startup hint.
- approvals.md: ttl/cancel resolution paths for operator questions.
- persistence.md: /state/hyperhive-model file.
- gotchas.md: web UI port collision policy (rename, don't probe);
  bind retry + SO_REUSEADDR shape; auto-unfree restored.
- todo.md: cleaned up empty sections and stale entries; /model
  shipped, dropped from the list.
2026-05-15 21:26:13 +02:00

119 lines
4.8 KiB
Markdown

# hyperhive
Multi-Claude-Code-agent orchestration on **nixos-containers**.
A host-side Rust daemon (`hive-c0re`) spawns nspawn-isolated agent
containers and brokers messages between them. A manager agent (`hm1nd`)
coordinates the swarm and gates lifecycle changes on user approval via git
commits, surfaced through a vibec0re-styled HTTP dashboard.
```
host (NixOS, runs hive-c0re.service)
├── operator
│ ├── browser → :7000 hive-c0re dashboard (containers, approvals)
│ ├── browser → :8000 / :8100-8999 per-agent web UIs (live SSE, send, login)
│ └── CLI → /run/hyperhive/host.sock JSON-line admin protocol
├── hive-c0re (Rust daemon)
│ ├── lifecycle nixos-container CRUD + per-agent flake generation
│ ├── broker sqlite messages + tokio broadcast (powers SSE + wake-ups)
│ ├── approvals sqlite queue, two kinds: ApplyCommit (config) + Spawn
│ ├── auto_update rebuilds any container whose recorded flake rev is stale
│ ├── dashboard axum HTTP + async-form actions + SSE message flow
│ └── sockets /run/hyperhive/{host,manager,agents/<n>}/mcp.sock
└── nixos-containers (each bind-mounts its socket dir → /run/hive,
│ credentials dir → /root/.claude,
│ durable notes dir → /state;
│ manager additionally gets /agents RW)
├── hm1nd hive-m1nd serve : claude turn loop +
│ MCP (send / recv / request_spawn / kill / start /
│ restart / request_apply_commit / ask_operator)
│ + web UI on :8000
└── h-<name> hive-ag3nt serve : claude turn loop +
MCP (send / recv) + web UI on a hashed :8100-8999
```
Each turn: harness pops one inbox message (Recv long-polls server-side and
wakes on a broker Sent event) → builds a wake prompt → spawns
`claude --print --continue --output-format stream-json --mcp-config …`
streams JSON events into the per-agent SSE bus + a sqlite history db →
claude drives any further `recv`/`send` itself via the embedded MCP server.
Operator surface per agent: terminal-themed live tail with a textarea
prompt; slash commands `/help` `/clear` `/cancel` `/compact`; granular
state badge (idle / thinking / offline) with age timer; cancel-turn
button while thinking; sticky-bottom auto-scroll with "↓ N new" pill;
event history backfilled on page load.
Config changes flow the other way: manager edits `/agents/<name>/config/agent.nix`
(bind-mounted from the host's proposed repo) → commits → submits the sha as
an approval → operator clicks ◆ APPR0VE on the dashboard → hive-c0re copies
the file into the applied repo and `nixos-container update`s the agent.
For decisions the manager needs human signal on, `ask_operator(question,
options?, multi?)` queues a free-text/checkbox/radio form on the
dashboard; the answer arrives later as a `HelperEvent::OperatorAnswered`
in the manager's inbox.
## Host config
Minimal `flake.nix` for a host that runs hive-c0re:
```nix
{
inputs = {
nixpkgs.url = "github:NixOS/nixpkgs/nixos-25.11";
hyperhive.url = "git+https://git.berlin.ccc.de/vinzenz/hyperhive";
};
outputs = { nixpkgs, hyperhive, ... }: {
nixosConfigurations.my-host = nixpkgs.lib.nixosSystem {
system = "x86_64-linux";
modules = [
hyperhive.nixosModules.hive-c0re
({ ... }: {
services.hive-c0re.enable = true;
# ... rest of your host config (hardware, networking, users, …)
system.stateVersion = "25.11";
})
];
};
};
}
```
hive-c0re will then:
- open its admin socket at `/run/hyperhive/host.sock` + dashboard on
`:7000`,
- auto-create the manager container (`hm1nd`) if missing,
- auto-rebuild any managed container whose hyperhive rev is stale.
`claude-code` is unfree; hyperhive whitelists it for itself
(scoped: only `claude-code`, nothing else) inside the
`claude-unstable` overlay and `harness-base.nix`. Per-agent
containers evaluate their own nixpkgs instance so the operator's
host-level `allowUnfree` doesn't propagate in — the predicate has
to live inline. Nothing to set on the operator side.
## Build / deploy
```sh
# inside the repo (devshell first; no global cargo)
nix develop -c cargo check
nix develop -c cargo clippy --workspace --all-targets -- -D warnings
# evaluate everything (rust+nix+toml fmt + clippy)
nix flake check
# deploy to a host that imports `hyperhive.nixosModules.hive-c0re`
cd ~/Repos/<nixos-config-repo>
nix flake update --update-input hyperhive
sudo nixos-rebuild switch --flake .#<host>
```
No overlays on the host's `pkgs` — the module pulls hive-c0re's package
straight from `hyperhive.packages.<system>.default`. Just import the
module and the service is wired up.