model persistence: /model <name> now writes to /state/hyperhive-model (in-container), Bus::new reads it on init. operator override survives harness restart and container rebuild; gone on --purge like every other piece of agent state. path overridable via HYPERHIVE_MODEL_FILE for tests. failure to persist is a warn, not fatal; runtime override still applies, just won't survive a restart.

unfree opt-in: drop the auto-allowUnfreePredicate from harness-base.nix and the claude-unstable overlay. operator now has to set nixpkgs.config.allowUnfree (or a predicate listing claude-code) in their own host config. silent unfree bypass was sketchy; this is honest. readme + gotchas updated to spell out the snippet.

todo: drops model-persistence + container-crash + journald (all shipped); adds per-agent send allow-list (constrain who an agent can message).

# Gotchas

NixOS + nspawn quirks and lessons we hit the hard way. If something
here looks unmotivated in the code, there's usually a story underneath.

## `nixos-container` doesn't expose `--bind` on the CLI

The CLI doesn't accept `--bind`. The way in is `EXTRA_NSPAWN_FLAGS` in
`/etc/nixos-containers/<NAME>.conf`: the start script
(`/nix/store/.../container_-start`) expands it unquoted into the
`systemd-nspawn` invocation, so its value is word-split into separate
flags. `lifecycle::set_nspawn_flags()` rewrites this line.

## `/run/systemd/nspawn/*.nspawn` overrides are ignored

`nixos-container`'s start script builds the nspawn command line
directly. Dropping a `.nspawn` file under `/run/systemd/nspawn/`
looks like the obvious extension point but does nothing. Use
`EXTRA_NSPAWN_FLAGS` (above).

## `boot.isNspawnContainer = true`

Not `boot.isContainer = true`. The option was renamed as of
nixos-25.11.

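A minimal sketch of where the option goes, assuming the container's
config is an ordinary NixOS module. Only the option name comes from the
note above; the module wrapper is illustrative.

```nix
# Container-side NixOS module (illustrative wrapper, not hyperhive's file).
{ ... }:
{
  # nixos-25.11 and later; older trees spell this boot.isContainer.
  boot.isNspawnContainer = true;
}
```
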
## `nixos-container create` auto-assigns `HOST_ADDRESS` / `LOCAL_ADDRESS`

…in the `.conf`. The start script's `if HOST_ADDRESS set →
--network-veth` branch then forces a private netns, which is silently
fatal for our web UIs (the port they bind is invisible from the host).
We force-clear `HOST_ADDRESS` / `LOCAL_ADDRESS` / `HOST_ADDRESS6` /
`LOCAL_ADDRESS6` / `HOST_BRIDGE` and set `PRIVATE_NETWORK=0`.

## systemd service PATH ≠ host PATH

The hive-c0re service sets `path = [ pkgs.git "/run/current-system/sw" ]`.
In-container harness services do the same, so anything an agent adds
to its own `agent.nix` (`environment.systemPackages`) is visible to
claude's Bash tool without editing the service definition.
On the host, `environment.HYPERHIVE_GIT` bakes in git's absolute path
(read by `lifecycle::git_command()`).

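Roughly what this looks like on the host, as a sketch rather than the
real module. The `path` value is quoted from above; the exact
`HYPERHIVE_GIT` value is an assumption, and everything else a real unit
needs (`ExecStart`, `wantedBy`, ...) is left out on purpose.

```nix
# Host-side NixOS module sketch; only the PATH-related settings are shown.
{ pkgs, ... }:
{
  systemd.services.hive-c0re = {
    # Each entry contributes its bin/ (and sbin/) directory to the unit's
    # PATH. The string entry exposes whatever environment.systemPackages
    # installs into the current system profile.
    path = [ pkgs.git "/run/current-system/sw" ];

    # Assumed value: git's absolute store path, so the service does not
    # depend on PATH lookup. Read by lifecycle::git_command().
    environment.HYPERHIVE_GIT = "${pkgs.git}/bin/git";
  };
}
```
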
## `RuntimeDirectoryPreserve = "yes"`

…keeps `/run/hyperhive/` (and the per-agent sub-dirs) across
hive-c0re restarts. Without it, every restart wipes the bind-mount
sources, and existing containers can no longer start.

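A sketch of the setting in a NixOS unit definition. It assumes
`/run/hyperhive/` is created through `RuntimeDirectory`, which the path
suggests but the note above does not state.

```nix
# Host-side NixOS module sketch; other serviceConfig keys omitted.
{ ... }:
{
  systemd.services.hive-c0re.serviceConfig = {
    # Assumption: systemd creates /run/hyperhive for the service.
    RuntimeDirectory = "hyperhive";
    # Keep the directory and its contents when the service stops or
    # restarts, instead of removing it.
    RuntimeDirectoryPreserve = "yes";
  };
}
```
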
## `register_agent` is idempotent

Drops any prior socket task before rebinding. Required so a
hive-c0re restart followed by `rebuild alice` recreates the agent's
socket without needing a clean reinstall.

## `claude-code` is unfree

The flake pins it to **nixpkgs-unstable** via
`overlays.claude-unstable` (stable lags too far behind). The overlay
imports unstable with the user's `nixpkgs.config` inherited, so the
operator must opt in by setting `nixpkgs.config.allowUnfree = true`
(or a `nixpkgs.config.allowUnfreePredicate` that allows `claude-code`)
in their host config. hyperhive deliberately does NOT auto-allow: a
silent unfree bypass would be sketchy, and the error message is clear
enough that the operator can fix it once and forget about it. The same
applies to the per-agent containers (they inherit through the same
nixpkgs).

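For reference, the opt-in in a host config might look like the sketch
below. Both forms are standard nixpkgs options, so pick one. The module
wrapper is illustrative and not taken from hyperhive.

```nix
# Host-side NixOS module sketch.
{ lib, ... }:
{
  # Option A: allow every unfree package on this host.
  # nixpkgs.config.allowUnfree = true;

  # Option B: allow only claude-code and keep everything else blocked.
  nixpkgs.config.allowUnfreePredicate =
    pkg: builtins.elem (lib.getName pkg) [ "claude-code" ];
}
```

Option B keeps the opt-in narrow: any other unfree package still fails
evaluation until it is added to the list.
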
## Claude credentials are per-agent

`/var/lib/hyperhive/agents/<name>/claude/` bind-mounts to
`/root/.claude` (RW). Sharing one dir across agents is NOT viable:
OAuth refresh tokens rotate, so any sibling refresh invalidates all
the others. The login flow runs from the per-agent web UI; creds
persist across `destroy`/recreate (`--purge` wipes them).

## Persistent notes dir per agent

`/var/lib/hyperhive/agents/<name>/state/` bind-mounts to `/state`
(RW). System prompts tell agents to keep durable knowledge here
(`/state/notes.md`, anything else under `/state/`). The harness also
writes its events log here (`/state/hyperhive-events.sqlite`).
The directory survives `destroy`/recreate alongside the claude dir.

## Orphan approvals

If an agent's state dir is wiped out from under pending approvals
(test scripts, manual `rm -rf`), the dashboard's next render marks
those approvals `failed` with the note `"agent state dir missing"` so
they fall out of `pending`. They stay in sqlite for audit.