Persistence + retention
Where state lives, what survives what, and how it's bounded.
Two sqlite databases
/var/lib/hyperhive/broker.sqlite (host)
Three tables, all in one file:
- messages: every inter-agent / operator-bound message. sender / recipient / body / sent_at / delivered_at.
- approvals: the queue. agent / kind (apply_commit | spawn) / commit_ref / requested_at / status / resolved_at / note.
- operator_questions: the ask / answer queue (despite the file name, stores both operator-targeted + agent-to-agent questions since the ask rename). asker / question / options_json / multi / asked_at / deadline_at (ttl) / answered_at / answer / target. target IS NULL = operator path (dashboard); target = '<agent>' = peer Q&A (HelperEvent::QuestionAsked pushed into target's inbox, answered via Answer request). Migrated via ALTER TABLE ADD COLUMN against pragma_table_info.
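The column migration mentioned for operator_questions can be sketched as follows. This is a minimal illustration in Python's sqlite3 module, not hive-c0re's own code: the helper name ensure_column and the trimmed table definition are hypothetical. It shows why checking pragma_table_info first makes the ALTER TABLE safe to run on every start.

```python
import sqlite3


def ensure_column(conn, table, column, decl):
    """Add `column` to `table` unless it already exists. Returns True if added."""
    # pragma_table_info is the table-valued form of PRAGMA table_info.
    exists = conn.execute(
        "SELECT 1 FROM pragma_table_info(?) WHERE name = ?", (table, column)
    ).fetchone()
    if exists:
        return False
    # Identifiers cannot be bound as parameters; pass trusted names only.
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
# Trimmed, hypothetical pre-migration schema (the real table has more columns).
conn.execute(
    "CREATE TABLE operator_questions (id INTEGER PRIMARY KEY, asker TEXT, question TEXT)"
)
conn.execute("INSERT INTO operator_questions (asker, question) VALUES ('alice', 'ok?')")

first = ensure_column(conn, "operator_questions", "target", "TEXT")   # adds the column
second = ensure_column(conn, "operator_questions", "target", "TEXT")  # no-op on re-run
old_row_target = conn.execute("SELECT target FROM operator_questions").fetchone()[0]
```

Rows that existed before the column was added read back as NULL, which lines up with target IS NULL meaning the operator path.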
Retention:
- Broker::vacuum_delivered runs hourly via a tokio task in hive-c0re::main. It drops delivered rows older than 30 days. Undelivered rows are always kept (still in flight).
- Approvals and questions are kept indefinitely; both are audit trails. Approvals failed by actions::destroy and answered questions stay visible to anything that queries by id.
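As a rough sketch of the messages retention rule, assuming sent_at and delivered_at are stored as Unix seconds and that age is measured from delivered_at (neither is stated above), the sweep reduces to a single DELETE. The function below is a Python stand-in and not the actual Broker::vacuum_delivered.

```python
import sqlite3
import time

RETENTION_SECS = 30 * 24 * 60 * 60  # 30 days


def vacuum_delivered(conn, now=None):
    """Delete delivered messages past retention; return the number of rows removed."""
    now = int(time.time()) if now is None else now
    cutoff = now - RETENTION_SECS
    # delivered_at IS NOT NULL keeps in-flight rows regardless of their age.
    cur = conn.execute(
        "DELETE FROM messages WHERE delivered_at IS NOT NULL AND delivered_at < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, sender TEXT, recipient TEXT,"
    " body TEXT, sent_at INTEGER, delivered_at INTEGER)"
)
now = 1_700_000_000  # fixed clock so the demo is deterministic
day = 24 * 60 * 60
conn.executemany(
    "INSERT INTO messages (sender, recipient, body, sent_at, delivered_at)"
    " VALUES (?, ?, ?, ?, ?)",
    [
        ("alice", "bob", "old, delivered", now - 40 * day, now - 40 * day),
        ("alice", "bob", "recent, delivered", now - 2 * day, now - 2 * day),
        ("alice", "bob", "old, still in flight", now - 40 * day, None),
    ],
)
removed = vacuum_delivered(conn, now=now)
remaining = [row[0] for row in conn.execute("SELECT body FROM messages ORDER BY id")]
```

Only the old delivered row is removed; the 40-day-old undelivered row survives because it has no delivered_at.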
/state/hyperhive-events.sqlite (per agent)
Lives inside each container's bind-mounted /state/ dir (host
path: /var/lib/hyperhive/agents/<name>/state/hyperhive-events.sqlite).
One table:
- events(id, ts, kind, payload_json): every LiveEvent the harness emits during turn loop execution.
The harness writes; the host vacuums. hive-c0re::events_vacuum
runs hourly and sweeps every existing agent state dir, deleting
rows older than 7 days. Age-only — no row cap — so a chatty turn
doesn't lose history sooner than a quiet one; disk pressure on a
sustained burst is the cheaper problem to have. Centralising
retention on the host means a misbehaving harness can't disable
its own vacuum and agents don't need any cleanup wiring of their
own.
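The host-side sweep could look roughly like the sketch below. It is a Python illustration under two assumptions: ts is stored as Unix seconds, and a db that fails to open is skipped so the rest of the sweep still runs. The names sweep_events and make_agent are hypothetical and do not come from hive-c0re::events_vacuum.

```python
import sqlite3
import tempfile
import time
from pathlib import Path

EVENTS_RETENTION_SECS = 7 * 24 * 60 * 60  # 7 days, age-only (no row cap)


def sweep_events(agents_root, now=None):
    """Delete expired events in every agent's db; return {agent: rows deleted}."""
    now = int(time.time()) if now is None else now
    cutoff = now - EVENTS_RETENTION_SECS
    deleted = {}
    for db in sorted(Path(agents_root).glob("*/state/hyperhive-events.sqlite")):
        agent = db.parent.parent.name
        try:
            conn = sqlite3.connect(db)
            try:
                cur = conn.execute("DELETE FROM events WHERE ts < ?", (cutoff,))
                conn.commit()
                deleted[agent] = cur.rowcount
            finally:
                conn.close()
        except sqlite3.Error:
            # One unreadable db should not stop the sweep of the others.
            continue
    return deleted


def make_agent(root, name, timestamps):
    """Demo helper: create an agent state dir holding events at the given times."""
    state = Path(root) / name / "state"
    state.mkdir(parents=True)
    conn = sqlite3.connect(state / "hyperhive-events.sqlite")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, ts INTEGER, kind TEXT, payload_json TEXT)"
    )
    conn.executemany(
        "INSERT INTO events (ts, kind, payload_json) VALUES (?, 'note', '{}')",
        [(t,) for t in timestamps],
    )
    conn.commit()
    conn.close()


now = 1_700_000_000
day = 24 * 60 * 60
with tempfile.TemporaryDirectory() as root:
    make_agent(root, "alice", [now - 10 * day, now - 8 * day, now - 1 * day])
    make_agent(root, "bob", [now - 1 * day])
    result = sweep_events(root, now=now)
```

Because the cutoff depends only on age, the chatty agent and the quiet one keep exactly the same 7-day window.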
Path overridable via HYPERHIVE_EVENTS_DB (for dev / no-/state
setups). On open failure the Bus falls back to no-store mode
rather than crashing the harness — events still broadcast over SSE,
just nothing persisted.
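The override and the no-store fallback can be pictured with the sketch below. It is written in Python for illustration only: open_event_store and this toy Bus class are hypothetical stand-ins, and a plain subscriber list takes the place of the SSE broadcast.

```python
import json
import os
import sqlite3

DEFAULT_EVENTS_DB = "/state/hyperhive-events.sqlite"


def open_event_store(environ=os.environ):
    """Return an open connection, or None (no-store mode) if opening fails."""
    path = environ.get("HYPERHIVE_EVENTS_DB", DEFAULT_EVENTS_DB)
    try:
        conn = sqlite3.connect(path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS events"
            " (id INTEGER PRIMARY KEY, ts INTEGER, kind TEXT, payload_json TEXT)"
        )
        return conn
    except sqlite3.Error:
        return None


class Bus:
    """Toy event bus: always broadcasts, persists only when a store is open."""

    def __init__(self, store):
        self.store = store
        self.subscribers = []

    def emit(self, ts, kind, payload):
        for deliver in self.subscribers:
            deliver(kind, payload)
        if self.store is not None:
            self.store.execute(
                "INSERT INTO events (ts, kind, payload_json) VALUES (?, ?, ?)",
                (ts, kind, json.dumps(payload)),
            )
            self.store.commit()


seen = []
# The parent directory does not exist, so opening fails and the bus runs store-less.
bus = Bus(open_event_store({"HYPERHIVE_EVENTS_DB": "/nonexistent-dir/events.sqlite"}))
bus.subscribers.append(lambda kind, payload: seen.append(kind))
bus.emit(1_700_000_000, "turn_start", {"n": 1})
```

Treating the store as optional keeps the failure local: subscribers still see every event, and only the history is lost.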
/state/hyperhive-turn-stats.sqlite (per agent)
Per-turn analytics sink. One row per claude turn captures
identity (model, wake_from, result_kind), timing
(started_at, ended_at, duration_ms), cost (input / output /
cache_read / cache_creation token counts), behaviour
(tool_call_count + tool_call_breakdown_json), and post-turn
snapshot metrics (open_threads_count,
open_reminders_count — fetched via the same socket the harness
already uses for GetOpenThreads + CountPendingReminders).
Bin-loop helpers build_row + record land each row at
turn_end. Writes are best-effort: a sqlite hiccup is logged and
the turn loop continues.
No host-side vacuum yet — tracked in TODO.md under Telemetry
(target retention ~90 days, age-only sweep like events_vacuum).
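A best-effort write of the kind described for build_row + record might be sketched like this. The Python below is illustrative only: the table name turn_stats, the reduced column set, the sample values, and the millisecond timestamps are all assumptions, and the real helpers capture far more fields.

```python
import logging
import sqlite3

log = logging.getLogger("turn_stats")


def build_row(model, wake_from, result_kind, started_at, ended_at):
    """Assemble one row; timestamps are assumed to be Unix milliseconds."""
    return {
        "model": model,
        "wake_from": wake_from,
        "result_kind": result_kind,
        "started_at": started_at,
        "ended_at": ended_at,
        "duration_ms": ended_at - started_at,
    }


def record(conn, row):
    """Insert one turn row; on a sqlite error, log it and return False."""
    try:
        conn.execute(
            "INSERT INTO turn_stats"
            " (model, wake_from, result_kind, started_at, ended_at, duration_ms)"
            " VALUES (:model, :wake_from, :result_kind, :started_at, :ended_at, :duration_ms)",
            row,
        )
        conn.commit()
        return True
    except sqlite3.Error as err:
        log.warning("turn stats write failed: %s", err)
        return False


good = sqlite3.connect(":memory:")
good.execute(
    "CREATE TABLE turn_stats (model TEXT, wake_from TEXT, result_kind TEXT,"
    " started_at INTEGER, ended_at INTEGER, duration_ms INTEGER)"
)
broken = sqlite3.connect(":memory:")  # no table: every insert fails

row = build_row("haiku", "inbox", "ok", 1_700_000_000_000, 1_700_000_004_500)
ok = record(good, row)
failed = record(broken, row)  # logs a warning, does not raise
```

Returning a flag instead of raising is what lets the caller carry on with the next turn.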
/state/hyperhive-model (per agent)
Single-line text file holding the claude model name currently
selected for this agent (default haiku when absent). Written by
Bus::set_model whenever the operator flips it via /model <name> in the web terminal. Read once at harness boot in
Bus::new. Path overridable via HYPERHIVE_MODEL_FILE.
Survives destroy/recreate, gone on --purge.
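The read and write sides of the model file are simple enough to sketch. This Python version is an illustration, not Bus::new or Bus::set_model; the function names are hypothetical, and treating an empty file like a missing one is an assumption.

```python
import os
import tempfile
from pathlib import Path

DEFAULT_MODEL = "haiku"
DEFAULT_MODEL_FILE = "/state/hyperhive-model"


def model_file(environ=os.environ):
    """Resolve the model file path, honouring the HYPERHIVE_MODEL_FILE override."""
    return Path(environ.get("HYPERHIVE_MODEL_FILE", DEFAULT_MODEL_FILE))


def read_model(path):
    """Return the stored model name, or the default when the file is absent."""
    try:
        name = path.read_text().strip()
    except FileNotFoundError:
        return DEFAULT_MODEL
    # Assumption: an empty file is treated the same as a missing one.
    return name or DEFAULT_MODEL


def write_model(path, name):
    """Persist the selected model as a single line."""
    path.write_text(name + "\n")


with tempfile.TemporaryDirectory() as state_dir:
    path = model_file({"HYPERHIVE_MODEL_FILE": os.path.join(state_dir, "hyperhive-model")})
    before = read_model(path)   # file absent, so the default applies
    write_model(path, "sonnet")
    after = read_model(path)
```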
State dirs (per agent)
Under /var/lib/hyperhive/agents/<name>/:
- config/: the proposed nix repo (manager-editable). Bind-mounted read-only to /agents/<name>/config inside the sub-agent's own container so the agent can inspect what defines it and request precise changes from the manager; RW into the manager via the /agents tree bind.
- claude/: claude OAuth credentials, bind-mounted RW to /root/.claude inside the container.
- state/: durable notes, the events.sqlite db, and the turn-stats sqlite db. Bind-mounted to /agents/<name>/state inside the container (the manager still uses the legacy /state mount point; same host path either way).
Under /var/lib/hyperhive/applied/<name>/ — the hive-c0re-only
applied repo. Tracks flake.nix (module-only boilerplate; never
edited after first spawn) + agent.nix (the actual config; the
manager's edits land here via the approval flow) + any other
files the manager committed. .git/ carries the proposal /
approved / building / deployed / failed / denied tag history.
Under /var/lib/hyperhive/meta/ — the swarm-wide deploy flake.
Single repo for the whole host; flake.nix declares one input
per agent + one nixosConfigurations.<n> output per agent;
flake.lock is the canonical "what's deployed where." The git
log is the deploy audit trail (one commit per successful
deploy or hyperhive bump). Manager has this RO-mounted at
/meta/.
Marker file /var/lib/hyperhive/.meta-migration-done is
written by the startup migration after every container has
been repointed at meta#<n>. Removing it forces a re-run on
next hive-c0re start (idempotent — only the actual repoint
step would re-fire).
Destroy vs purge
- DESTR0Y (default): stops + removes the nspawn container, drops the systemd drop-in, fails any pending approvals. State dirs stay put; the agent appears in the dashboard's K3PT ST4T3 section as a tombstone with ⊕ R3V1V3 and PURG3 actions. R3V1V3 queues a Spawn approval that reuses the kept state on approve (no re-login).
- PURG3 (opt-in via the dashboard button or hive-c0re destroy --purge <name>): DESTR0Y plus wipes /var/lib/hyperhive/{agents,applied}/<name>/. Config history, claude creds, /state/ notes, and the events db are all gone. No undo.
The manager is non-destroyable from both paths (declarative container; would fight with the host's NixOS config).
Run-time dirs
/run/hyperhive/ is tmpfs-backed (systemd RuntimeDirectory=) but
preserved across hive-c0re restarts via RuntimeDirectoryPreserve=yes.
Without that, every restart wipes bind sources and existing
containers can't be started.
- /run/hyperhive/host.sock: admin socket (host-side CLI).
- /run/hyperhive/manager/mcp.sock: manager-privileged socket.
- /run/hyperhive/agents/<name>/mcp.sock: per-sub-agent socket (bind-mounted into the container as /run/hive/mcp.sock).
On startup, Coordinator::register_agent drops any prior socket
task before rebinding — idempotent so a hive-c0re restart followed
by rebuild alice recreates the agent's socket without a clean
reinstall.
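The drop-then-rebind pattern can be illustrated with asyncio on a Unix host. This is a hypothetical Python analogue and not the actual Coordinator::register_agent: the class below only shares its name for readability, and the handler simply closes each connection.

```python
import asyncio
import os
import tempfile


class Coordinator:
    def __init__(self):
        self._servers = {}  # agent name -> running asyncio server

    async def _handle(self, reader, writer):
        # Placeholder handler: a real one would speak the socket protocol.
        writer.close()
        await writer.wait_closed()

    async def register_agent(self, name, sock_path):
        """Bind the agent's socket, replacing any server already registered."""
        prior = self._servers.pop(name, None)
        if prior is not None:
            prior.close()
            await prior.wait_closed()
        # start_unix_server also clears a stale socket file left at the path.
        self._servers[name] = await asyncio.start_unix_server(self._handle, path=sock_path)

    async def shutdown(self):
        for server in self._servers.values():
            server.close()
            await server.wait_closed()
        self._servers.clear()


async def demo():
    with tempfile.TemporaryDirectory() as run_dir:
        sock = os.path.join(run_dir, "mcp.sock")
        coord = Coordinator()
        await coord.register_agent("alice", sock)
        await coord.register_agent("alice", sock)  # second call replaces the first
        count, exists = len(coord._servers), os.path.exists(sock)
        await coord.shutdown()
        return count, exists


count, exists = asyncio.run(demo())
```

Registering the same agent twice leaves exactly one live server on the path, which is the property that makes a restart followed by a rebuild safe.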