müde 8b9f7d21b7 model persisted to /state; stop auto-allowing claude-code unfree

model persistence: /model <name> now writes to /state/hyperhive-model
(in-container), Bus::new reads it on init. operator override survives
harness restart and container rebuild; gone on --purge like every
other piece of agent state. path overridable via HYPERHIVE_MODEL_FILE
for tests. failure to persist is a warn, not fatal — runtime override
still applies, just won't survive a restart.

unfree opt-in: drop the auto-allowUnfreePredicate from
harness-base.nix and the claude-unstable overlay. operator now has to
set nixpkgs.config.allowUnfree (or a predicate listing claude-code)
in their own host config. silent unfree bypass was sketchy; this is
honest. readme + gotchas updated to spell out the snippet.

todo: drops model-persistence + container-crash + journald (all
shipped); adds per-agent send allow-list (constrain who an agent can
message).

2026-05-15 21:05:40 +02:00

3.6 KiB


Gotchas

NixOS + nspawn quirks and lessons we hit the hard way. If something here looks unmotivated in the code, there's usually a story underneath.

`nixos-container` doesn't expose `--bind` on the CLI

The nixos-container CLI has no --bind option. Bind mounts go through EXTRA_NSPAWN_FLAGS in /etc/nixos-containers/<NAME>.conf instead: the start script (/nix/store/.../container_-start) expands that variable unquoted into the systemd-nspawn invocation, so every whitespace-separated word becomes its own argument. lifecycle::set_nspawn_flags() rewrites this line.
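As a rough illustration of what rewriting that line involves, here is a minimal sketch. It is not the body of lifecycle::set_nspawn_flags(); the function name `set_extra_nspawn_flags` is hypothetical, and it assumes the .conf is plain `KEY=value` lines and that a double-quoted value is acceptable to whatever reads the file.

```rust
/// Hypothetical helper (not the project's code): return `conf` with its
/// EXTRA_NSPAWN_FLAGS line replaced, or appended if it was missing.
/// Assumes one `KEY=value` assignment per line.
fn set_extra_nspawn_flags(conf: &str, flags: &[&str]) -> String {
    // The start script expands the value unquoted, so flags are joined
    // with single spaces and must not contain whitespace themselves.
    let new_line = format!("EXTRA_NSPAWN_FLAGS=\"{}\"", flags.join(" "));
    let mut replaced = false;
    let mut out: Vec<String> = Vec::new();

    for line in conf.lines() {
        if line.starts_with("EXTRA_NSPAWN_FLAGS=") {
            // Keep exactly one assignment; silently drop any duplicates.
            if !replaced {
                out.push(new_line.clone());
                replaced = true;
            }
        } else {
            out.push(line.to_string());
        }
    }
    if !replaced {
        out.push(new_line);
    }

    let mut text = out.join("\n");
    text.push('\n');
    text
}
```

Calling it with `&["--bind=/a:/b", "--bind-ro=/c:/d"]` leaves every other line of the file untouched and yields a single EXTRA_NSPAWN_FLAGS line carrying both flags.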

`/run/systemd/nspawn/*.nspawn` overrides are ignored

nixos-container's start script builds the nspawn command line directly. Dropping a .nspawn file under /run/systemd/nspawn/ looks like the obvious extension point and does nothing. Use EXTRA_NSPAWN_FLAGS (above).

`boot.isNspawnContainer = true`

Use this, not boot.isContainer = true. The option was renamed as of nixos-25.11.

`nixos-container create` auto-assigns `HOST_ADDRESS` / `LOCAL_ADDRESS`

…in the .conf. When HOST_ADDRESS is set, the start script takes its --network-veth branch, which forces a private netns. That is silently fatal for our web UIs: the port they bind is invisible from the host. We force-clear HOST_ADDRESS / LOCAL_ADDRESS / HOST_ADDRESS6 / LOCAL_ADDRESS6 / HOST_BRIDGE and set PRIVATE_NETWORK=0.
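The force-clear step can be sketched as a pure text transformation. This is an illustration only: `force_host_network` is a hypothetical name, the file is assumed to be plain `KEY=value` lines, and clearing is modelled as writing an empty value (the real code may delete the lines instead).

```rust
/// Keys that `nixos-container create` may populate and that we want empty.
const CLEARED_KEYS: [&str; 5] = [
    "HOST_ADDRESS",
    "LOCAL_ADDRESS",
    "HOST_ADDRESS6",
    "LOCAL_ADDRESS6",
    "HOST_BRIDGE",
];

/// Hypothetical helper (not the project's code): blank the address and
/// bridge keys and pin PRIVATE_NETWORK to 0, leaving other lines alone.
fn force_host_network(conf: &str) -> String {
    let mut out: Vec<String> = Vec::new();
    let mut saw_private_network = false;

    for line in conf.lines() {
        // Everything before the first '=' is the key.
        let key = line.split('=').next().unwrap_or("").trim();
        if CLEARED_KEYS.contains(&key) {
            // Assumption: an empty value counts as "not set" downstream.
            out.push(format!("{}=", key));
        } else if key == "PRIVATE_NETWORK" {
            out.push("PRIVATE_NETWORK=0".to_string());
            saw_private_network = true;
        } else {
            out.push(line.to_string());
        }
    }
    if !saw_private_network {
        out.push("PRIVATE_NETWORK=0".to_string());
    }

    out.join("\n") + "\n"
}
```

The transformation is idempotent, so running it on every start is harmless even when the file is already in the desired shape.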

systemd service PATH ≠ host PATH

A systemd unit does not inherit the login shell's PATH, so the hive-c0re service sets path = [ pkgs.git "/run/current-system/sw" ]. In-container harness services do the same, so anything an agent adds to its own agent.nix (environment.systemPackages) is visible to claude's Bash tool without editing the service definition. On the host, environment.HYPERHIVE_GIT bakes in git's absolute path, which lifecycle::git_command() reads.
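One way to consume a baked-in path like HYPERHIVE_GIT is sketched below. The names `git_program` and `git_command_sketch` are hypothetical, and the fallback to a bare `git` looked up on PATH is an assumption; the real lifecycle::git_command() may behave differently when the variable is missing.

```rust
use std::ffi::OsString;
use std::process::Command;

/// Pick the git binary: the baked-in absolute path if one was provided,
/// otherwise plain "git" (assumed fallback, resolved via PATH).
fn git_program(env_value: Option<OsString>) -> OsString {
    match env_value {
        Some(path) if !path.is_empty() => path,
        _ => OsString::from("git"),
    }
}

/// Hypothetical stand-in for lifecycle::git_command(): build a Command
/// that does not depend on the service's PATH when HYPERHIVE_GIT is set.
fn git_command_sketch() -> Command {
    Command::new(git_program(std::env::var_os("HYPERHIVE_GIT")))
}
```

Splitting the decision into `git_program` keeps it testable without touching the process environment.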

`RuntimeDirectoryPreserve = "yes"`

…keeps /run/hyperhive/ (and the per-agent sub-dirs) across hive-c0re restarts. Without it, every restart wipes bind sources and existing containers can't be started.

`register_agent` is idempotent

Drops any prior socket task before rebinding. Required so a hive-c0re restart followed by rebuild alice recreates the agent's socket without needing a clean reinstall.
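The drop-then-rebind pattern can be sketched with a plain map of task handles. Everything here is a stand-in: `SocketTask` and `Registry` are hypothetical types, and an atomic flag replaces the real socket task so the example stays free of any async runtime.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Stand-in for a running socket task. Dropping it flips `stopped`,
/// which is how this sketch models "the old listener goes away".
struct SocketTask {
    stopped: Arc<AtomicBool>,
}

impl Drop for SocketTask {
    fn drop(&mut self) {
        self.stopped.store(true, Ordering::SeqCst);
    }
}

#[derive(Default)]
struct Registry {
    tasks: HashMap<String, SocketTask>,
}

impl Registry {
    /// Registering the same name twice replaces the old task rather than
    /// failing. Returns the new task's stop flag for observation.
    fn register_agent(&mut self, name: &str) -> Arc<AtomicBool> {
        // Drop any prior task first so its socket is released before
        // the replacement binds.
        drop(self.tasks.remove(name));

        let stopped = Arc::new(AtomicBool::new(false));
        self.tasks.insert(
            name.to_string(),
            SocketTask { stopped: Arc::clone(&stopped) },
        );
        stopped
    }
}
```

Calling `register_agent("alice")` twice leaves exactly one live entry for alice, with the first task's flag set to stopped.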

`claude-code` is unfree

The flake pins it to nixpkgs-unstable via overlays.claude-unstable (stable lags too far). The overlay imports unstable inheriting the user's nixpkgs.config, so the operator must opt in by setting allowUnfree = true (or an allowUnfreePredicate that whitelists claude-code) on their host config. hyperhive deliberately does NOT auto-allow — silent unfree bypass would be sketchy, and the error message is clear enough that the operator can fix it once and forget about it. Same on the per-agent containers (they inherit through the same nixpkgs).

Claude credentials are per-agent

/var/lib/hyperhive/agents/<name>/claude/ bind-mounts to /root/.claude (RW). Sharing one dir across agents is NOT viable — OAuth refresh tokens rotate, so any sibling refresh invalidates all the others. Login flow runs from the per-agent web UI; creds persist across destroy/recreate (--purge wipes them).

Persistent notes dir per agent

/var/lib/hyperhive/agents/<name>/state/ bind-mounts to /state (RW). System prompts tell agents to keep durable knowledge here (/state/notes.md, anything else under /state/). The harness also writes its events log here (/state/hyperhive-events.sqlite). Survives destroy/recreate alongside the claude dir.
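For illustration, the two per-agent mounts (credentials and state) could be expressed as systemd-nspawn `--bind=SRC:DST` flags like this. The function name `agent_bind_flags` is hypothetical, and binding straight from the /var/lib path is an assumption of the sketch; the flags would end up in EXTRA_NSPAWN_FLAGS.

```rust
use std::path::Path;

/// Hypothetical helper (not the project's code): build the read-write
/// bind flags for one agent's credentials and state directories.
fn agent_bind_flags(agents_root: &Path, name: &str) -> Vec<String> {
    let agent_dir = agents_root.join(name);
    vec![
        // Per-agent Claude credentials, never shared between agents.
        format!("--bind={}:/root/.claude", agent_dir.join("claude").display()),
        // Durable notes and the harness events log.
        format!("--bind={}:/state", agent_dir.join("state").display()),
    ]
}
```

Because both sources live under the agent's own directory, two agents can never end up pointing at the same credentials dir by construction.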

Orphan approvals

If state dirs are wiped out from under pending approvals (test scripts, manual rm -rf), the dashboard's next render marks those approvals failed with the note "agent state dir missing", so they fall out of pending. They stay in sqlite for audit.
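The sweep can be sketched as follows, with an in-memory slice standing in for the sqlite rows. The `Approval` and `Status` types and the `fail_orphans` name are hypothetical; only the note text and the `<name>/state` layout come from the description above.

```rust
use std::path::Path;

#[derive(Debug, PartialEq)]
enum Status {
    Pending,
    Failed,
}

/// Simplified stand-in for one approval row.
struct Approval {
    agent: String,
    status: Status,
    note: Option<String>,
}

/// Mark pending approvals as failed when the agent's state dir is gone.
/// Rows are updated in place, never deleted, so the audit trail survives.
/// Returns how many approvals were changed.
fn fail_orphans(approvals: &mut [Approval], agents_root: &Path) -> usize {
    let mut changed = 0;
    for approval in approvals.iter_mut() {
        let state_dir = agents_root.join(&approval.agent).join("state");
        if approval.status == Status::Pending && !state_dir.is_dir() {
            approval.status = Status::Failed;
            approval.note = Some("agent state dir missing".to_string());
            changed += 1;
        }
    }
    changed
}
```

Running the check at render time rather than on a timer means the dashboard never shows an approval that can no longer be acted on.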
