Approvals + manager + helper events
The approval queue is hyperhive's pivot: nothing that changes the
shape of an agent (its config, whether it exists) happens without an
operator click. The manager (hm1nd) is the policy gate in front of
that queue; helper events are how it stays informed about what
happens after a decision lands.
End-to-end approval flow
- Manager edits files under /agents/<name>/config/ (any tracked path,
  but agent.nix is the contract entry point) and commits with its own
  git identity.
- Manager submits the commit sha via
  request_apply_commit(agent, commit_ref).
- hive-c0re immediately fetches that commit from the proposed repo
  into the applied repo and tags it proposal/<id>. The approval row
  stores both the manager-supplied sha and the canonical
  hive-c0re-vouched sha. From here on the proposed repo is irrelevant
  for this approval: the manager can amend, force-push, or rm -rf the
  proposed repo and the queued approval still points at an immutable
  git object inside applied.
- Operator sees the diff on the dashboard, clicks ◆ APPR0VE (or
  hive-c0re approve <id> on the CLI).
- hive-c0re moves the working tree to proposal/<id> and runs the
  build under a sequence of tags (see below). On success, applied/main
  fast-forwards to the proposal commit. On failure, main stays put and
  the working tree resets back to the previous deployed commit.
- HelperEvent::ApprovalResolved (and Rebuilt for the ApplyCommit kind)
  land in the manager's inbox, carrying both the canonical sha and the
  terminal tag.
Spawn approvals follow the same shape but skip the commit-diff
step — the operator just sees the name. On approve, hive-c0re
creates the container in a background task while the dashboard
shows a spinner.
Meta flake
The hive-c0re-owned repo at /var/lib/hyperhive/meta/
declares one flake input per agent (agent-<n>.url = "git+file:///var/lib/hyperhive/applied/<n>") and one
nixosConfigurations.<n> output per agent. Each output wraps
inputs.agent-<n>.nixosModules.default with the identity +
HIVE_PORT / HIVE_LABEL / HIVE_DASHBOARD_PORT injection
module that setup_applied used to generate inline.
Containers run against --flake /var/lib/hyperhive/meta#<n>.
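As a rough illustration of the naming scheme above, the sketch below renders the per-agent input lines and the flake ref a container is built against. Both function names are hypothetical and only mirror the URL and ref shapes quoted in this section; the real render lives in hive-c0re and also emits the nixosConfigurations outputs.

```rust
/// Hypothetical helper: render one `agent-<n>.url` input line per agent,
/// using the git+file URL shape quoted in this section.
fn render_agent_inputs(agents: &[&str]) -> String {
    let mut out = String::new();
    for name in agents {
        out.push_str(&format!(
            "agent-{0}.url = \"git+file:///var/lib/hyperhive/applied/{0}\";\n",
            name
        ));
    }
    out
}

/// Hypothetical helper: the flake ref a container is updated against.
fn container_flake_ref(name: &str) -> String {
    format!("/var/lib/hyperhive/meta#{}", name)
}
```

Deriving both strings from the same agent name keeps the input key, the applied repo path, and the output attribute in step.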
Per-deploy lock flow (two-phase, owned by
actions::run_apply_commit → meta::{prepare,finalize,abort}_deploy):
- meta::prepare_deploy(name) runs
  nix flake lock --update-input agent-<n> without committing. Working
  tree of meta now points the input at applied/<n>/main (which
  run_apply_commit already fast-forwarded to proposal/<id>).
- lifecycle::rebuild_no_meta runs
  nixos-container update <c> --flake meta#<name>. Nix evaluates
  against the staged lock.
- On success: meta::finalize_deploy(name, sha, "deployed/<id>") stages
  flake.lock and commits with deploy <n> deployed/<id> <sha12>. Meta's
  git log gains one entry per successful deploy.
- On failure: meta::abort_deploy() runs git restore flake.lock so the
  meta history shows only successes; the failure stays as an annotated
  failed/<id> tag in applied/<n>.
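The control flow of the two-phase deploy can be sketched as below. This is a minimal model, not the hive-c0re code: the MetaRepo trait, the RecordingMeta test double, the u64 id, and the choice to call abort when prepare itself fails are all assumptions made for illustration. The build step is passed in as a closure standing in for the nixos-container update call.

```rust
/// Hypothetical stand-in for the three meta operations named above.
trait MetaRepo {
    fn prepare_deploy(&mut self, name: &str) -> Result<(), String>;
    fn finalize_deploy(&mut self, name: &str, sha: &str, tag: &str) -> Result<(), String>;
    fn abort_deploy(&mut self) -> Result<(), String>;
}

/// Test double that records which calls were made, in order.
#[derive(Default)]
struct RecordingMeta {
    calls: Vec<String>,
}

impl MetaRepo for RecordingMeta {
    fn prepare_deploy(&mut self, name: &str) -> Result<(), String> {
        self.calls.push(format!("prepare {}", name));
        Ok(())
    }
    fn finalize_deploy(&mut self, name: &str, sha: &str, tag: &str) -> Result<(), String> {
        self.calls.push(format!("finalize {} {} {}", name, sha, tag));
        Ok(())
    }
    fn abort_deploy(&mut self) -> Result<(), String> {
        self.calls.push("abort".to_string());
        Ok(())
    }
}

/// Stage the lock, build against it, then commit on success or restore on failure.
fn two_phase_deploy<M, B>(meta: &mut M, name: &str, sha: &str, id: u64, build: B) -> Result<(), String>
where
    M: MetaRepo,
    B: FnOnce() -> Result<(), String>,
{
    if let Err(e) = meta.prepare_deploy(name) {
        // Assumption: restore the lock on any failure, including a failed prepare.
        let _ = meta.abort_deploy();
        return Err(e);
    }
    if let Err(e) = build() {
        // Build failed: the staged lock is discarded, so meta history gains no entry.
        let _ = meta.abort_deploy();
        return Err(e);
    }
    // Build succeeded: commit the staged lock under the deployed/<id> tag name.
    meta.finalize_deploy(name, sha, &format!("deployed/{}", id))
}
```

Keeping the commit strictly after the build is what lets the meta log contain successful deploys only.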
Single-phase variants exist for paths without
rollback semantics: meta::lock_update_for_rebuild(name) for
the manual ↻ R3BU1LD button (commits if the lock changed)
and meta::lock_update_hyperhive() for the
auto-update flake-rev bump (one shot before per-agent
rebuilds, commits if the lock changed).
meta::sync_agents(hyperhive_flake, dashboard_port, &agents)
is the idempotent reconciler called by spawn, destroy,
rebuild, and the startup migration. Renders flake.nix
from the agent list; if it differs from disk, runs
nix flake lock + commits as regenerate meta flake (or
seed meta from N agent(s) on the very first call).
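The reconciler's idempotence comes down to a compare-then-write step, sketched here against an in-memory stand-in. MetaState and reconcile_flake are hypothetical names, and treating "no flake.nix yet" as the very first call is an assumption; the real function also runs nix flake lock before committing.

```rust
/// In-memory stand-in for the meta repo: the flake.nix on disk (if any)
/// plus the commit subjects recorded so far.
#[derive(Default)]
struct MetaState {
    flake_nix: Option<String>,
    commits: Vec<String>,
}

/// Hypothetical reconciler step. Returns true when it wrote and committed,
/// false when the rendered flake already matches what is on disk.
fn reconcile_flake(state: &mut MetaState, rendered: String, agent_count: usize) -> bool {
    if state.flake_nix.as_deref() == Some(rendered.as_str()) {
        return false; // already up to date: nothing to lock, nothing to commit
    }
    // Assumption: a missing flake.nix means this is the very first call.
    let subject = if state.flake_nix.is_none() {
        format!("seed meta from {} agent(s)", agent_count)
    } else {
        "regenerate meta flake".to_string()
    };
    state.flake_nix = Some(rendered);
    // The real flow would run `nix flake lock` here before committing.
    state.commits.push(subject);
    true
}
```

Because an unchanged render is a no-op, spawn, destroy, rebuild, and the startup migration can all call the reconciler without coordinating.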
The manager has /meta RO-bound inside its container:
git -C /meta log --oneline is the swarm-wide deploy log,
cat /meta/flake.lock | jq '.nodes["agent-<n>"].locked'
resolves which sha each agent is pinned at right now.
Dashboard surfaces the same info as a deployed:<sha12> chip
per container row.
Two repos per agent
/var/lib/hyperhive/agents/<name>/config/ proposed — manager RW
└── <anything> # any files the manager
# wants in the commit.
# agent.nix is the
# convention entry
# point; flake.nix is
# tracked boilerplate
# (manager doesn't edit
# it).
/var/lib/hyperhive/applied/<name>/ applied — core-only
├── .git/ # tag-rich history
├── flake.nix # tracked, fixed
│ # boilerplate exporting
│ # nixosModules.default
├── agent.nix # working tree of main
└── <other manager files> # also tracked
/var/lib/hyperhive/meta/ swarm-wide flake — core
├── .git/ # one commit per successful
│ # deploy
├── flake.nix # generated from agent set
└── flake.lock # pins each agent's sha
Why two physical repos: the manager's /agents/<n>/config/ is
RW — a buggy or hostile agent can git clean -fdx its own
proposed tree. The applied repo is never bind-mounted (except
the read-only .git exposure described below) so a destructive
move inside the container cannot reach it.
The container's --flake ref is /var/lib/hyperhive/meta#<name>
(see "Meta flake" above). The agent's own applied/<n>/flake.nix
is a fixed boilerplate that exports nixosModules.default = import ./agent.nix; the meta flake imports that module and
wraps it with identity + HIVE_PORT / HIVE_LABEL /
HIVE_DASHBOARD_PORT.
Tag state machine
Every approval id walks through a fixed set of tags on the underlying commit inside the applied repo:
| Tag | When | Annotated? |
|---|---|---|
| proposal/<id> | request_apply_commit, after fetch | no |
| approved/<id> | operator approve | no |
| building/<id> | rebuild started | no |
| deployed/<id> | rebuild succeeded; main ff's here | no |
| failed/<id> | rebuild failed | yes (body = error) |
| denied/<id> | operator deny | yes (body = operator note) |
applied/main is always the latest deployed/*. denied/ and
failed/ are terminal; the manager submits a new commit + new
approval id to retry. Because tags are first-class git objects,
rejected and failed trees stay browsable forever — git log --tags in the applied repo is the audit trail.
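One way to model the table is a small enum with the allowed successors per tag. This is a sketch: the Tag type is hypothetical, the u64 id is an assumption, and the transition set (deny is possible only from proposal; building ends in deployed or failed) is inferred from the "When" column rather than taken from the code.

```rust
/// The six tag kinds from the table above.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tag {
    Proposal,
    Approved,
    Building,
    Deployed,
    Failed,
    Denied,
}

impl Tag {
    fn prefix(self) -> &'static str {
        match self {
            Tag::Proposal => "proposal",
            Tag::Approved => "approved",
            Tag::Building => "building",
            Tag::Deployed => "deployed",
            Tag::Failed => "failed",
            Tag::Denied => "denied",
        }
    }

    /// Full tag name, e.g. "failed/12".
    fn name(self, id: u64) -> String {
        format!("{}/{}", self.prefix(), id)
    }

    /// Only failed/ and denied/ carry a message body.
    fn is_annotated(self) -> bool {
        matches!(self, Tag::Failed | Tag::Denied)
    }

    /// Tags that may follow this one for the same approval id (inferred).
    fn successors(self) -> &'static [Tag] {
        match self {
            Tag::Proposal => &[Tag::Approved, Tag::Denied],
            Tag::Approved => &[Tag::Building],
            Tag::Building => &[Tag::Deployed, Tag::Failed],
            Tag::Deployed | Tag::Failed | Tag::Denied => &[],
        }
    }
}
```

An empty successor list marks the end states; a retry starts over at proposal/ under a new approval id.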
Manager view of applied + meta
The manager container gets three host-side bind mounts via
set_nspawn_flags:
- /var/lib/hyperhive/agents/ → /agents/ (RW): proposed repos. Manager
  edits + commits per-agent config here.
- /var/lib/hyperhive/applied/ → /applied/ (RO): every agent's
  authoritative applied repo, including .git.
- /var/lib/hyperhive/meta/ → /meta/ (RO): the swarm-wide deploy flake.
Each proposed repo (/agents/<n>/config/) is pre-configured
with applied as a git remote pointing at
/applied/<n>/.git. Useful incantations from inside the
manager:
git -C /agents/<n>/config fetch applied
git -C /agents/<n>/config log applied/main --oneline
git -C /agents/<n>/config show applied/refs/tags/deployed/<id>
git -C /agents/<n>/config show applied/refs/tags/failed/<id> # body = build error
git -C /agents/<n>/config show applied/refs/tags/denied/<id> # body = operator note
git -C /agents/<n>/config rebase applied/main # base in-flight work on what's deployed
git -C /meta log --oneline # swarm-wide deploy history
cat /meta/flake.lock | jq '.nodes | with_entries(select(.key | startswith("agent-")))'
The RO binds block push at the kernel level, so the manager can only fetch / read — git plumbing inside the container cannot corrupt either authoritative repo.
Migration from the pre-tag / pre-meta schemes
Both overhauls (tag-driven flow + meta flake) ship in-place migrations that run on every hive-c0re startup. Idempotent; each phase is a no-op once already applied. Behaviour:
- Tag-driven phase: assumes the operator ran the one-shot
  git tag deployed/0 main script (see commit history / earlier docs
  revisions) once per agent. Tagging is non-destructive: it doesn't
  touch live containers, state dirs, or claude creds.
- Meta-flake phase: rewrites each applied/<n>/flake.nix to the
  module-only boilerplate, wires the applied remote in each proposed
  repo, bootstraps the meta repo from the current agent list, and
  nixos-container updates every container at meta#<n>. The expensive
  last step is guarded by /var/lib/hyperhive/.meta-migration-done so
  it only runs once across hive-c0re restarts. Set
  HIVE_SKIP_META_MIGRATION=1 on the service to defer.
No state loss in either migration. claude creds, /state/ notes, the events DB, proposed history, and applied history all survive. The manager keeps its session; sub-agents stay logged in.
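The guard on the expensive container-update step can be pictured as a marker-file check plus the skip variable. The function below is a hypothetical sketch; in particular, treating only the exact value "1" as a skip request is an assumption.

```rust
use std::path::Path;

/// Hypothetical guard: run the expensive migration step only when the
/// marker file is absent and the operator has not asked to defer.
fn should_run_meta_migration(marker: &Path, skip_env: Option<&str>) -> bool {
    // Assumption: only the exact value "1" defers the migration.
    if skip_env == Some("1") {
        return false;
    }
    // The marker is written after the first successful run.
    !marker.exists()
}
```

A caller would pass the marker path under /var/lib/hyperhive/ and something like std::env::var("HIVE_SKIP_META_MIGRATION").ok().as_deref() for the second argument.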
Manager (hm1nd) is hive-c0re-managed
The manager container runs through the same lifecycle as
sub-agents. On hive-c0re serve startup, if hm1nd is missing,
hive-c0re creates it. The manager's flake lives at
/var/lib/hyperhive/applied/hm1nd/; its proposed config at
/var/lib/hyperhive/agents/hm1nd/config/. Manager can edit its own
agent.nix (visible inside the container at /agents/hm1nd/config/)
and submit request_apply_commit("hm1nd", <sha>) for operator
approval.
Differences from sub-agents:
- flake.nix extends hyperhive.nixosConfigurations.manager (vs
  agent-base).
- Container name is hm1nd (no h- prefix).
- Fixed web UI port (MANAGER_PORT = 8000).
- set_nspawn_flags adds three extra binds:
  /var/lib/hyperhive/agents → /agents (RW) so the manager can edit
  per-agent proposed repos, /var/lib/hyperhive/applied → /applied (RO)
  so the manager can git fetch deployed/failed/denied tags from any
  agent's authoritative applied repo, and /var/lib/hyperhive/meta →
  /meta (RO) for the swarm-wide deploy flake (see "Manager view of
  applied + meta" above).
- First-deploy spawn bypasses the approval queue (manager is required
  infrastructure).
- Per-agent socket lives at /run/hyperhive/manager/, owned by
  manager_server::start.
Migration note (for older hosts): drop any containers.hm1nd = { ... } block from your host NixOS config. hyperhive creates and
updates the manager itself.
Manager policy
From hive-ag3nt/prompts/manager.md: the manager does NOT
rubber-stamp sub-agent config requests. It verifies (role match,
package legitimacy, cheaper alternative, blast radius) before
committing and calling request_apply_commit.
For ambiguous cases or anything that needs human signal, the
manager calls ask_operator(question, options?, multi?, ttl_seconds?) — queues the question on the dashboard and returns
the id immediately. The operator's answer arrives later as
HelperEvent::OperatorAnswered in the manager inbox. Storage is
hive-c0re::operator_questions (sqlite); the answer flow is:
POST /answer-question/{id}
→ OperatorQuestions::answer
→ notify_manager(OperatorAnswered { id, question, answer })
Two more paths resolve a pending question with a sentinel answer:
- POST /cancel-question/{id} (✗ CANC3L button on the dashboard)
  resolves with [cancelled]. The manager sees a terminal state and can
  fall back.
- ttl_seconds deadline: a tokio watchdog spawned at submit time fires
  answer(id, "[expired]") once the ttl runs out. Already-resolved
  races no-op. The dashboard surfaces a ⏳ MM:SS chip on each pending
  question with a deadline.
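The "already-resolved races no-op" rule amounts to first-answer-wins. The sketch below models it with an in-memory map; the Questions type and its methods are hypothetical, and the real store is sqlite with the expiry driven by a tokio watchdog rather than a direct call.

```rust
use std::collections::HashMap;

/// In-memory stand-in for the question store.
#[derive(Default)]
struct Questions {
    /// id -> None while pending, Some(answer) once resolved.
    rows: HashMap<u64, Option<String>>,
    next_id: u64,
}

impl Questions {
    /// Queue a new pending question and return its id.
    fn submit(&mut self) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.rows.insert(id, None);
        id
    }

    /// Resolve a pending question. Returns true if this call resolved it,
    /// false if the id is unknown or the question was already resolved.
    fn answer(&mut self, id: u64, text: &str) -> bool {
        if let Some(slot) = self.rows.get_mut(&id) {
            if slot.is_none() {
                *slot = Some(text.to_string());
                return true;
            }
        }
        false
    }

    /// Dashboard cancel path: resolves with the [cancelled] sentinel.
    fn cancel(&mut self, id: u64) -> bool {
        self.answer(id, "[cancelled]")
    }

    /// Watchdog path: resolves with the [expired] sentinel.
    fn expire(&mut self, id: u64) -> bool {
        self.answer(id, "[expired]")
    }
}
```

Routing cancel and expiry through the same answer path means the manager receives exactly one resolution per question, whichever path fires first.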
Helper events to the manager
Coordinator::notify_manager(&HelperEvent) enqueues an inbox
message from sender system with the event JSON in the body. The
manager harness no longer short-circuits these — they drive a
regular claude turn so the manager can react. Variants
(hive_sh4re::HelperEvent):
- ApprovalResolved { id, agent, commit_ref, status, note }: fired by
  actions::approve + actions::deny whenever an approval transitions to
  its terminal state.
- Spawned { agent, ok, note }: actions::approve (Spawn-kind) + admin
  HostRequest::Spawn.
- Rebuilt { agent, ok, note }: auto_update::rebuild_agent (covers
  startup scan + manual /rebuild from dashboard) + actions::approve
  (ApplyCommit).
- Killed { agent }: admin HostRequest::Kill + dashboard /kill +
  manager Kill MCP tool.
- Destroyed { agent }: actions::destroy.
- ContainerCrash { agent, note }: crash_watch, when a
  previously-running container went away with no operator-initiated
  transient state (Stopping / Restarting / Destroying / Rebuilding).
  Manager can start it again or escalate.
- NeedsLogin { agent }: sub-agent has no claude session yet. Manager
  can't act directly (interactive OAuth); typically flags the
  operator.
- LoggedIn { agent }: sub-agent just completed login. Manager often
  greets the agent on this event.
- NeedsUpdate { agent }: sub-agent's recorded flake rev is stale.
  Manager calls update(name) to rebuild; idempotent, no approval
  required.
- OperatorAnswered { id, question, answer }: dashboard
  /answer-question/{id} after the operator submits the answer form.
To add a new event: new HelperEvent variant + call sites + update
prompts/manager.md so the manager knows the new shape.
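For orientation, the variants listed above could be modelled roughly as follows. The variant and field names come from this section; the field types (u64, String, bool) and both helper methods are assumptions, and the real type in hive_sh4re may differ.

```rust
/// Sketch of the event shapes listed above. Field types are assumed.
#[allow(dead_code)]
#[derive(Debug)]
enum HelperEvent {
    ApprovalResolved { id: u64, agent: String, commit_ref: String, status: String, note: String },
    Spawned { agent: String, ok: bool, note: String },
    Rebuilt { agent: String, ok: bool, note: String },
    Killed { agent: String },
    Destroyed { agent: String },
    ContainerCrash { agent: String, note: String },
    NeedsLogin { agent: String },
    LoggedIn { agent: String },
    NeedsUpdate { agent: String },
    OperatorAnswered { id: u64, question: String, answer: String },
}

impl HelperEvent {
    /// The agent an event concerns, if it carries one.
    fn agent(&self) -> Option<&str> {
        match self {
            HelperEvent::ApprovalResolved { agent, .. }
            | HelperEvent::Spawned { agent, .. }
            | HelperEvent::Rebuilt { agent, .. }
            | HelperEvent::Killed { agent, .. }
            | HelperEvent::Destroyed { agent, .. }
            | HelperEvent::ContainerCrash { agent, .. }
            | HelperEvent::NeedsLogin { agent, .. }
            | HelperEvent::LoggedIn { agent, .. }
            | HelperEvent::NeedsUpdate { agent, .. } => Some(agent.as_str()),
            HelperEvent::OperatorAnswered { .. } => None,
        }
    }

    /// The typical manager reaction described in the list, where one is given.
    fn manager_hint(&self) -> Option<&'static str> {
        match self {
            HelperEvent::ContainerCrash { .. } => Some("start the container again or escalate"),
            HelperEvent::NeedsLogin { .. } => Some("flag the operator (interactive OAuth)"),
            HelperEvent::LoggedIn { .. } => Some("greet the agent"),
            HelperEvent::NeedsUpdate { .. } => Some("call update(name)"),
            _ => None,
        }
    }
}
```

An exhaustive match like the one in agent() is also a cheap reminder at compile time when a new variant is added.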
Auto-update on startup
hive-c0re serve runs auto_update::run in a background task right
after opening the coordinator. It enumerates managed containers and
rebuilds any whose recorded hyperhive rev differs from the current
one — sub-agents and manager go through the same lifecycle::rebuild
path.
"Rev" = canonical filesystem path of cfg.hyperhiveFlake. Marker
file: /var/lib/hyperhive/applied/.<name>.hyperhive-rev. If the
flake input has no canonical path (e.g. a github: URL),
auto-update is a no-op — rebuild manually.
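The staleness decision described above reduces to a small comparison, sketched here. Both function names are hypothetical, and treating a missing marker file as stale is an assumption this section does not confirm.

```rust
/// Path of the per-agent rev marker, following the pattern quoted above.
fn marker_path(name: &str) -> String {
    format!("/var/lib/hyperhive/applied/.{}.hyperhive-rev", name)
}

/// Decide whether a container needs a rebuild.
/// `recorded` is the marker content (None if the marker is missing);
/// `current` is the canonical path of the hyperhive flake (None if it has none).
fn needs_rebuild(recorded: Option<&str>, current: Option<&str>) -> bool {
    match current {
        // No canonical path (e.g. a github: URL): auto-update is a no-op.
        None => false,
        // Assumption: a missing marker counts as stale.
        Some(cur) => recorded != Some(cur),
    }
}
```

The same predicate could back both the startup scan and the dashboard badge, which keeps the two views of "stale" identical.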
The dashboard surfaces pending updates per agent: a clickable
"needs update ↻" badge appears whenever the marker differs from
current rev. The badge POSTs /rebuild/<name>, calling the same
auto_update::rebuild_agent path so manual triggers and the
startup scan can't drift. When at least one container is stale, a
top-level ↻ UPD4TE 4LL button appears that loops over every
stale container.