hyperhive/docs/persistence.md
müde 62d1a74929 docs sync + revert auto-unfree removal
2026-05-15 21:26:13 +02:00

Persistence + retention

Where state lives, what survives what, and how it's bounded.

Two sqlite databases

/var/lib/hyperhive/broker.sqlite (host)

Three tables, all in one file:

  • messages — every inter-agent / operator-bound message. sender / recipient / body / sent_at / delivered_at.
  • approvals — the queue. agent / kind (apply_commit | spawn) / commit_ref / requested_at / status / resolved_at / note.
  • operator_questions — the ask_operator queue. asker / question / options_json / multi / asked_at / answered_at / answer.
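
As a rough picture of the three tables, here is a minimal sketch using Python's stdlib sqlite3. Column names come from the list above; the id columns, the column types, the NOT NULL / CHECK constraints, and the open_broker / table_names helpers are all assumptions made for illustration, not the real hive-c0re schema.

```python
import sqlite3

# Hypothetical DDL: column names follow the doc, types and constraints are assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id           INTEGER PRIMARY KEY,
    sender       TEXT NOT NULL,
    recipient    TEXT NOT NULL,
    body         TEXT NOT NULL,
    sent_at      INTEGER NOT NULL,   -- assumed: unix seconds
    delivered_at INTEGER             -- NULL while still in flight
);
CREATE TABLE IF NOT EXISTS approvals (
    id           INTEGER PRIMARY KEY,
    agent        TEXT NOT NULL,
    kind         TEXT NOT NULL CHECK (kind IN ('apply_commit', 'spawn')),
    commit_ref   TEXT,
    requested_at INTEGER NOT NULL,
    status       TEXT NOT NULL,
    resolved_at  INTEGER,
    note         TEXT
);
CREATE TABLE IF NOT EXISTS operator_questions (
    id           INTEGER PRIMARY KEY,
    asker        TEXT NOT NULL,
    question     TEXT NOT NULL,
    options_json TEXT NOT NULL,
    multi        INTEGER NOT NULL,   -- assumed: 0/1 flag
    asked_at     INTEGER NOT NULL,
    answered_at  INTEGER,
    answer       TEXT
);
"""


def open_broker(path=":memory:"):
    """Open (or create) a broker-shaped database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def table_names(conn):
    """Return the user tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]
```

The nullable delivered_at / resolved_at / answered_at columns are what let "still in flight" and "still pending" be expressed as plain NULL checks.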

Retention:

  • Broker::vacuum_delivered runs hourly via a tokio task in hive-c0re::main. Drops delivered rows older than 30 days. Undelivered rows are always kept (still in flight).
  • Approvals and questions are kept indefinitely because both are audit trails. Approvals failed by actions::destroy and answered questions stay visible to anything that queries by id.
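
The message retention rule can be sketched as a single DELETE. This is an illustration in Python, not the Rust Broker::vacuum_delivered itself. It assumes timestamps are unix seconds and that "older than 30 days" is measured on delivered_at; the source does not say which column the cutoff applies to.

```python
import sqlite3
import time

RETENTION_SECONDS = 30 * 24 * 60 * 60  # 30 days, per the retention rule above


def vacuum_delivered(conn, now=None):
    """Delete delivered messages older than the retention window.

    Rows with delivered_at IS NULL are never touched: they are still in flight.
    Returns the number of rows removed.
    """
    now = int(time.time()) if now is None else now
    cutoff = now - RETENTION_SECONDS
    cur = conn.execute(
        "DELETE FROM messages "
        "WHERE delivered_at IS NOT NULL AND delivered_at < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount
```

Approvals and operator questions have no equivalent sweep, matching the "kept indefinitely" rule.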

/state/hyperhive-events.sqlite (per agent)

Lives inside each container's bind-mounted /state/ dir (host path: /var/lib/hyperhive/agents/<name>/state/hyperhive-events.sqlite). One table:

  • events(id, ts, kind, payload_json) — every LiveEvent the harness emits during turn loop execution.

The harness writes; the host vacuums. hive-c0re::events_vacuum runs hourly and sweeps every existing agent state dir, applying the same two-stage delete to each file: drop rows older than 7 days, then trim to the 2000 most-recent. Centralising retention on the host means a misbehaving harness can't disable its own vacuum and agents don't need any cleanup wiring of their own.
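
The two-stage delete could look roughly like the following Python sketch. It assumes ts is stored as unix seconds and that "most-recent" means ordered by ts with id as a tie-breaker; the function name and parameters are illustrative only.

```python
import sqlite3

MAX_AGE_SECONDS = 7 * 24 * 60 * 60  # stage 1: age limit of 7 days
MAX_ROWS = 2000                     # stage 2: row cap


def vacuum_events(conn, now, max_age=MAX_AGE_SECONDS, max_rows=MAX_ROWS):
    """Apply the two-stage retention to one agent's events table."""
    # Stage 1: drop everything older than the age limit.
    conn.execute("DELETE FROM events WHERE ts < ?", (now - max_age,))
    # Stage 2: of what is left, keep only the max_rows most recent rows.
    conn.execute(
        "DELETE FROM events WHERE id NOT IN ("
        "  SELECT id FROM events ORDER BY ts DESC, id DESC LIMIT ?"
        ")",
        (max_rows,),
    )
    conn.commit()
```

Running the age cut first means the row cap only ever has to choose among events that are already within the 7-day window.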

Path overridable via HYPERHIVE_EVENTS_DB (for dev / no-/state setups). On open failure the Bus falls back to no-store mode rather than crashing the harness — events still broadcast over SSE, just nothing persisted.
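
A minimal sketch of the override-plus-fallback behaviour, in Python rather than the harness's own language. The default path and the HYPERHIVE_EVENTS_DB variable come from this page; open_event_store, persist_event, and the column types are hypothetical.

```python
import os
import sqlite3

DEFAULT_EVENTS_DB = "/state/hyperhive-events.sqlite"


def open_event_store(env=os.environ):
    """Open the events db, or return None (no-store mode) if it cannot be opened."""
    path = env.get("HYPERHIVE_EVENTS_DB", DEFAULT_EVENTS_DB)
    try:
        conn = sqlite3.connect(path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "id INTEGER PRIMARY KEY, ts INTEGER NOT NULL, "
            "kind TEXT NOT NULL, payload_json TEXT NOT NULL)"
        )
        conn.commit()
        return conn
    except sqlite3.Error:
        # Open failure is not fatal: the caller keeps broadcasting, nothing is stored.
        return None


def persist_event(store, ts, kind, payload_json):
    """Persist one event if a store is available. Returns True when a row was written."""
    if store is None:
        return False
    store.execute(
        "INSERT INTO events (ts, kind, payload_json) VALUES (?, ?, ?)",
        (ts, kind, payload_json),
    )
    store.commit()
    return True
```

Callers treat a None store as "broadcast only", so the emit path never needs its own error handling for a missing /state.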

/state/hyperhive-model (per agent)

Single-line text file holding the claude model name currently selected for this agent (default haiku when absent). Written by Bus::set_model whenever the operator flips it via /model <name> in the web terminal. Read once at harness boot in Bus::new. Path overridable via HYPERHIVE_MODEL_FILE. Survives destroy/recreate, gone on --purge.
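
The read and write sides of that file can be sketched as below. This is an illustrative Python version of what Bus::new and Bus::set_model are described as doing. The helper names are hypothetical, and treating an empty file like a missing one is an assumption.

```python
import os
from pathlib import Path

DEFAULT_MODEL = "haiku"
DEFAULT_MODEL_FILE = "/state/hyperhive-model"


def model_file(env=os.environ):
    """Resolve the model file path, honouring the HYPERHIVE_MODEL_FILE override."""
    return Path(env.get("HYPERHIVE_MODEL_FILE", DEFAULT_MODEL_FILE))


def load_model(path):
    """Read the selected model name, falling back to the default when absent."""
    try:
        name = path.read_text().strip()
    except OSError:
        return DEFAULT_MODEL
    # Assumption: an empty file is treated the same as a missing one.
    return name or DEFAULT_MODEL


def save_model(path, name):
    """Persist the selected model as a single line of text."""
    path.write_text(name.strip() + "\n")
```

Because the file sits under /state/, it follows the same destroy / purge lifecycle as the rest of the agent's state dir.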

State dirs (per agent)

Under /var/lib/hyperhive/agents/<name>/:

  • config/ — the proposed nix repo (manager-editable).
  • claude/ — claude OAuth credentials, bind-mounted RW to /root/.claude inside the container.
  • state/ — durable notes, the hyperhive-events.sqlite db, and the hyperhive-model file, bind-mounted to /state inside the container.

Under /var/lib/hyperhive/applied/<name>/ — the hive-c0re-only applied repo (flake.nix + agent.nix) that the container actually builds from.

Destroy vs purge

  • DESTR0Y (default) — stops + removes the nspawn container, drops the systemd drop-in, fails any pending approvals. State dirs stay put; the agent appears in the dashboard's K3PT ST4T3 section as a tombstone with ⊕ R3V1V3 and PURG3 actions. R3V1V3 queues a Spawn approval that reuses the kept state on approve (no re-login).
  • PURG3 (opt-in via the dashboard button or hive-c0re destroy --purge <name>) — DESTR0Y plus wipes /var/lib/hyperhive/{agents,applied}/<name>/. Config history, claude creds, /state/ notes, and the events db are all gone. No undo.

The manager is non-destroyable from both paths (declarative container; would fight with the host's NixOS config).
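
The on-disk difference between the two paths can be sketched as follows. This Python sketch covers only the state-dir side (not stopping the container or failing approvals). The remove_state helper is hypothetical, and so is the assumption that the manager is registered under the name "manager".

```python
import shutil
from pathlib import Path

MANAGER = "manager"  # assumed agent name for the manager
DEFAULT_ROOT = Path("/var/lib/hyperhive")


def remove_state(name, purge, root=DEFAULT_ROOT):
    """Return the directories wiped for this agent.

    A plain destroy keeps all state (returns an empty list); a purge removes
    both the agents/<name> and applied/<name> trees.
    """
    if name == MANAGER:
        raise ValueError("the manager cannot be destroyed or purged")
    if not purge:
        return []
    wiped = []
    for sub in ("agents", "applied"):
        target = root / sub / name
        if target.exists():
            shutil.rmtree(target)
            wiped.append(target)
    return wiped
```

Keeping destroy a no-op on disk is what makes the later revive cheap: the Spawn approval only has to point at directories that never moved.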

Run-time dirs

/run/hyperhive/ is tmpfs-backed (systemd RuntimeDirectory=) but preserved across hive-c0re restarts via RuntimeDirectoryPreserve=yes. Without that, every restart wipes bind sources and existing containers can't be started.

  • /run/hyperhive/host.sock — admin socket (host-side CLI).
  • /run/hyperhive/manager/mcp.sock — manager-privileged socket.
  • /run/hyperhive/agents/<name>/mcp.sock — per-sub-agent socket (bind-mounted into the container as /run/hive/mcp.sock).

On startup, Coordinator::register_agent drops any prior socket task before rebinding — idempotent so a hive-c0re restart followed by rebuild alice recreates the agent's socket without a clean reinstall.
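
One way to picture an idempotent rebind is the Python sketch below. Coordinator::register_agent is described as dropping a prior socket task; unlinking a stale socket file before bind() is an assumption added here to show why a second call does not fail with "address already in use".

```python
import os
import socket


def bind_agent_socket(path):
    """Bind a listening unix socket at path, replacing any stale socket file."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    try:
        # A leftover socket file from a previous run would make bind() fail.
        os.unlink(path)
    except FileNotFoundError:
        pass
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    sock.listen()
    return sock
```

Calling it twice for the same path succeeds both times; the second listener simply takes over the filesystem entry.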