final docs + cleanup sync for meta-flake era

claude.md flips 'in flight' → 'just landed' for the meta overhaul + extends the file map with meta.rs and migrate.rs. docs/approvals.md replaces the in-flight callout with a proper 'Meta flake' section (two-phase deploy walkthrough, sync_agents semantics, single-phase variants), updates the two-repo box diagram to include the /var/lib/hyperhive/meta/ tree and tracks flake.nix in applied, rewrites the container --flake reference to meta#<name>, replaces the 'Manager view of applied' section with a unified '/agents + /applied + /meta' inventory listing every useful git incantation, and explains the in-place no-state-loss migration that now runs on hive-c0re startup. docs/persistence.md grows entries for the meta repo + the .meta-migration-done marker. readme box diagram picks up the /meta RO bind; approval-flow paragraph rewritten end to end to describe the meta lock dance. lifecycle::flake_base deleted — the meta render hardcodes the manager vs agent-base choice as nix expression.
2026-05-16 00:40:06 +02:00 · 2026-05-16 00:40:06 +02:00 · 14aa7c7acc
commit 14aa7c7acc
parent 2f6ecc4dc0
5 changed files with 213 additions and 132 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -31,10 +31,18 @@ hive-c0re/         host daemon + CLI (one binary, subcommand-dispatched)
  src/coordinator.rs    shared state (broker/approvals/questions/transient/
                         sockets) + tombstone enumeration + kick_agent
  src/actions.rs        approve/deny/destroy (transient-aware)
-  src/auto_update.rs    startup rebuild scan + ensure_manager
-  src/lifecycle.rs      `nixos-container` shellouts, per-agent flake generator
+  src/auto_update.rs    startup rebuild scan + ensure_manager +
+                         meta::lock_update_hyperhive bump
+  src/lifecycle.rs      `nixos-container` shellouts; per-agent applied
+                         + proposed git repo seeding; tag plumbing
+  src/meta.rs           single hive-c0re-owned flake at /var/lib/
+                         hyperhive/meta/ — sync_agents, two-phase
+                         prepare/finalize/abort, lock_update_*
+  src/migrate.rs        startup auto-migration from pre-meta layout
+                         (idempotent, marker-guarded phase 4)
  src/dashboard.rs      axum HTTP: static shell + /api/state JSON + actions
                         + journald viewer + bind-with-retry (SO_REUSEADDR)
+                         + deployed_sha chip per container
  assets/               index.html, dashboard.css, app.js (include_str!)

 hive-ag3nt/        in-container harness crate; produces TWO binaries
@@ -114,51 +122,40 @@ read them à la carte.
 In-flight or recent context that hasn't earned a section yet.
 Prune freely.

- **In flight:** meta-flake overhaul. Each agent's applied
-  repo becomes a tiny module-only flake (`nixosModules.default
-  = import ./agent.nix`); `agent.nix` is just a NixOS module
-  function `{ config, pkgs, lib, ... }: { ... }` — no
-  extendModules, no hyperhive input visible to the manager.
-  A single hive-c0re-owned repo at `/var/lib/hyperhive/meta/`
-  declares one input per agent (pointing at that agent's
-  applied repo via `git+file://`) and one
-  `nixosConfigurations.<n>` output per agent, wrapping
-  `inputs.agent-<n>.nixosModules.default` with the identity
-  + `HIVE_PORT` / `HIVE_LABEL` / `HIVE_DASHBOARD_PORT`
-  injection that today's per-agent `setup_applied` does
-  inline. Containers run against `meta#<n>` instead of
-  `applied/<n>#default`. Every approval that lands does
-  `nix flake lock --update-input agent-<n>` in meta and
-  commits the lock — meta's git log is the system-wide
-  deploy audit trail; per-agent tags stay as before for
-  inside-baseball state.
- **Companion change:** the manager's `/agents/<n>/config/`
-  (proposed) gets `applied` pre-configured as a git remote
-  pointing at `/applied/<n>/.git` (the RO bind already
-  there). `git fetch applied` / `git show
-  applied/refs/tags/deployed/<id>` / `git rebase
-  applied/main` etc. all just work from inside the
-  manager. The manager additionally gets `/meta` RO-bound,
-  so `git -C /meta log --oneline` and
-  `cat /meta/flake.lock` answer "what's actually deployed
-  across the swarm right now."
- **Auto-migration on startup:** new phase before
-  `auto_update::run` rewrites each existing
-  `applied/<n>/flake.nix` to the module-only shape +
-  relocates `deployed/0`, adds the `applied` remote to each
-  proposed repo, bootstraps the meta repo from the agent
-  list if missing, and `nixos-container update`s every
-  container to point at `meta#<n>` (no fs wipe, no
-  re-login). Idempotent; `HIVE_SKIP_META_MIGRATION=1`
-  defers it.
- **Just landed (prior overhaul still in place):** tag-driven
-  config-apply. Two-repo split (proposed = manager RW,
-  applied = core-only); `request_apply_commit` fetches the
-  manager's commit into applied and pins it as
-  `proposal/<id>`; approve / deny / build walk through tags
-  on the same commit; `applied/main` only fast-forwards on
-  `deployed/`. `failed/` + `denied/` are annotated. See
-  `docs/approvals.md` for the state machine.
+- **Just landed:** meta-flake overhaul. Each agent's applied
+  repo is a tiny module-only flake (`nixosModules.default =
+  import ./agent.nix`); `agent.nix` is a plain NixOS module
+  function — no extendModules, no hyperhive input visible to
+  the manager. A single hive-c0re-owned repo at
+  `/var/lib/hyperhive/meta/` declares one input per agent
+  (pointing at that agent's applied repo via `git+file://`)
+  and one `nixosConfigurations.<n>` output per agent,
+  wrapping `inputs.agent-<n>.nixosModules.default` with the
+  identity + `HIVE_PORT` / `HIVE_LABEL` /
+  `HIVE_DASHBOARD_PORT` injection. Containers run against
+  `meta#<n>`. Every approval runs `nix flake lock
+  --update-input agent-<n>` (two-phase: prepare on the
+  build path, finalize/abort on the result) — meta's git
+  log is the system-wide deploy audit trail; failures and
+  denials live as annotated tags in applied. The manager
+  has `/applied` and `/meta` RO-bound and the `applied`
+  remote pre-wired in every proposed repo so `git fetch
+  applied`, `git show applied/refs/tags/deployed/<id>`,
+  `git -C /meta log --oneline`, `cat /meta/flake.lock`
+  all just work. Migration runs idempotently on
+  hive-c0re startup (`HIVE_SKIP_META_MIGRATION=1` skips it):
+  rewrites pre-meta applied flakes to module-only, wires
+  the `applied` remote into each proposed repo, seeds meta,
+  and repoints every container at `meta#<n>` (guarded by a
+  marker so the expensive phase only runs once).
+- **Just landed (prior overhaul still underneath):**
+  tag-driven config-apply. Two-repo split (proposed =
+  manager RW, applied = core-only); `request_apply_commit`
+  fetches the manager's commit into applied and pins it as
+  `proposal/<id>`; approve / deny / build walk through
+  tags on the same commit; `applied/main` only
+  fast-forwards on `deployed/`. `failed/` + `denied/` are
+  annotated. See `docs/approvals.md`.
 - **Recent (since last compaction):** inline +/- diffs on
  Write/Edit, send full body via collapsed details, operator
  cancel + ttl on questions, deny-with-reason, dashboard
--- a/README.md
+++ b/README.md
@@ -26,8 +26,9 @@ host (NixOS, runs hive-c0re.service)
 └── nixos-containers  (each bind-mounts its socket dir → /run/hive,
   │                   credentials dir → /root/.claude,
   │                   durable notes dir → /state;
-   │                   manager additionally gets /agents RW
-   │                   + /applied RO for the deployed-tag mirror)
+   │                   manager additionally gets /agents RW,
+   │                   /applied RO (deployed-tag mirror),
+   │                   /meta RO (swarm-wide deploy flake))
   │
   ├── hm1nd      hive-m1nd serve : claude turn loop +
   │              MCP (send / recv / request_spawn / kill / start /
@@ -54,21 +55,30 @@ load; collapsible inbox + collapsible journald viewer + collapsible
 `agent.nix` viewer per agent on the dashboard.

 Config changes flow the other way: manager edits files under
-`/agents/<name>/config/` (`agent.nix` is the entry point, but arbitrary
-sibling files in the commit are preserved) → commits → submits the sha
-via `request_apply_commit`. Hive-c0re immediately fetches that commit
-from the proposed repo into the applied repo and pins it as
-`proposal/<id>` — from this moment the proposal is immutable from the
-manager's side. Operator clicks ◆ APPR0VE on the dashboard → hive-c0re
-moves the working tree to the proposal, runs `nixos-container update`,
-and either fast-forwards `applied/main` (tagging `deployed/<id>`) or
-annotates `failed/<id>` with the build error and rolls back to the
-previous deployed tree. Denials leave a `denied/<id>` annotated tag
-carrying the operator's note. The manager sees everything that
-shipped (or didn't) via a read-only `/applied/<n>/.git` mirror inside
-its container; `git show applied/deployed/<id>` etc. is the audit
-trail. See [`docs/approvals.md`](docs/approvals.md) for the full tag
-state machine.
+`/agents/<name>/config/` (`agent.nix`, a plain NixOS module function
+`{ config, pkgs, lib, ... }: { ... }`, is the entry point; arbitrary
+sibling files in the commit are preserved) → commits → submits the sha
+via `request_apply_commit`. Hive-c0re immediately fetches that commit
+from the proposed repo into the applied repo and pins it as
+`proposal/<id>`; from then on it is immutable from the manager's side.
+Operator clicks ◆ APPR0VE → hive-c0re fast-forwards `applied/<n>/main`
+to the proposal, runs `nix flake lock --update-input agent-<n>` against
+the host-wide meta flake at `/var/lib/hyperhive/meta/`, builds via
+`nixos-container update <c> --flake meta#<name>`, and either commits
+the lock + tags `deployed/<id>` on success, or `git restore`s the lock,
+annotates `failed/<id>` with the build error, and rolls back
+`applied/<n>/main` on failure. Denials leave a `denied/<id>` annotated
+tag carrying the operator's note.
+
+Meta's git log is the swarm-wide deploy audit trail (one commit per
+successful deploy). Per-agent applied repos carry the tag-rich state
+machine for inside-baseball decisions. The manager sees both — proposed
+repos ship with an `applied` remote pre-wired, and `/meta/` is RO-bound
+inside the container — so `git fetch applied`,
+`git show applied/refs/tags/deployed/<id>`, `git -C /meta log`,
+`cat /meta/flake.lock` all just work without constructing paths by
+hand. See [`docs/approvals.md`](docs/approvals.md) for the full state
+machine + lock-flow walkthrough.
 For decisions the manager needs human signal on, `ask_operator(question,
 options?, multi?)` queues a free-text/checkbox/radio form on the
 dashboard; the answer arrives later as a `HelperEvent::OperatorAnswered`
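A minimal Rust sketch of how one approval id resolves into tags in `applied/<n>`, following the README approval paragraph above. The `Outcome` enum and all function names are hypothetical, not hive-c0re's API; only the tag prefixes, which tags carry annotations (`failed/` holds the build error, `denied/` the operator's note), and the rule that only a deploy leaves `main` at the proposal come from the docs. Treating `deployed/<id>` as a lightweight tag is an assumption.

```rust
/// Hypothetical model of how one approval ends in `applied/<n>`.
#[derive(Debug, Clone, PartialEq)]
enum Outcome {
    /// Build succeeded: meta lock committed, `main` fast-forwarded.
    Deployed,
    /// Build failed: the annotated tag body carries the build error.
    Failed { build_error: String },
    /// Operator denied: the annotated tag body carries the note.
    Denied { note: String },
}

/// The immutable pin created as soon as the manager submits a sha.
fn proposal_tag(id: u64) -> String {
    format!("proposal/{id}")
}

/// Terminal tag name plus its annotation message, if annotated.
/// `deployed/<id>` is assumed lightweight here (no message).
fn terminal_tag(id: u64, outcome: &Outcome) -> (String, Option<String>) {
    match outcome {
        Outcome::Deployed => (format!("deployed/{id}"), None),
        Outcome::Failed { build_error } => (format!("failed/{id}"), Some(build_error.clone())),
        Outcome::Denied { note } => (format!("denied/{id}"), Some(note.clone())),
    }
}

/// Only a successful deploy leaves `applied/<n>/main` at the proposal;
/// a failure rolls it back and a denial never moves it.
fn main_ends_at_proposal(outcome: &Outcome) -> bool {
    matches!(outcome, Outcome::Deployed)
}
```

Because every outcome is a tag on the same commit as `proposal/<id>`, rejected and failed trees stay reachable without any branch bookkeeping.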
--- a/docs/approvals.md
+++ b/docs/approvals.md
@@ -37,26 +37,58 @@ step — the operator just sees the name. On approve, hive-c0re
 creates the container in a background task while the dashboard
 shows a spinner.

-## Meta flake (in flight)
+## Meta flake

-> The next overhaul (currently being implemented) introduces a
-> single hive-c0re-owned meta repo at
-> `/var/lib/hyperhive/meta/` that consumes every agent's
-> applied repo as a flake input and owns the wrapper
-> nixosConfiguration. Each agent's `applied/<n>/flake.nix`
-> shrinks to `nixosModules.default = import ./agent.nix` —
-> `agent.nix` becomes a plain NixOS module function (no
-> extendModules / hyperhive input). Containers will run
-> against `--flake /var/lib/hyperhive/meta#<n>`. Every
-> approval that builds does
-> `nix flake lock --update-input agent-<n>` in meta and
-> commits the lock; meta's git log is the system-wide deploy
-> trail. Manager additionally gets `/applied/<n>/.git`
-> pre-registered as the `applied` remote inside its proposed
-> repo, and `/meta` RO-bound for browsing the deploy log.
-> Auto-migrates on startup. Sections below describe the
-> current (still-deployed) tag-driven shape that the meta
-> flake builds on top of.
+The hive-c0re-owned repo at `/var/lib/hyperhive/meta/`
+declares one flake input per agent (`agent-<n>.url =
+"git+file:///var/lib/hyperhive/applied/<n>"`) and one
+`nixosConfigurations.<n>` output per agent. Each output wraps
+`inputs.agent-<n>.nixosModules.default` with the identity +
+`HIVE_PORT` / `HIVE_LABEL` / `HIVE_DASHBOARD_PORT` injection
+module that `setup_applied` used to generate inline.
+Containers run against `--flake /var/lib/hyperhive/meta#<n>`.
+
+Per-deploy lock flow (two-phase, owned by
+`actions::run_apply_commit` →
+`meta::{prepare,finalize,abort}_deploy`):
+
+1. `meta::prepare_deploy(name)` runs
+   `nix flake lock --update-input agent-<n>` without
+   committing. Working tree of meta now points the input at
+   `applied/<n>/main` (which `run_apply_commit` already
+   fast-forwarded to `proposal/<id>`).
+2. `lifecycle::rebuild_no_meta` runs
+   `nixos-container update <c> --flake meta#<name>`. Nix
+   evaluates against the staged lock.
+3. On success, `meta::finalize_deploy(name, sha,
+   "deployed/<id>")` stages `flake.lock` and commits with
+   `deploy <n> deployed/<id> <sha12>`. Meta's git log gains
+   one entry per successful deploy.
+4. On failure, `meta::abort_deploy()` runs
+   `git restore flake.lock` so the meta history shows only
+   successes; the failure stays as an annotated `failed/<id>`
+   tag in `applied/<n>`.
+
+Single-phase variants exist for paths without
+rollback semantics: `meta::lock_update_for_rebuild(name)` for
+the manual `↻ R3BU1LD` button (commits if the lock changed)
+and `meta::lock_update_hyperhive()` for the
+auto-update flake-rev bump (one shot before per-agent
+rebuilds, commits if the lock changed).
+
+`meta::sync_agents(hyperhive_flake, dashboard_port, &agents)`
+is the idempotent reconciler called by `spawn`, `destroy`,
+`rebuild`, and the startup migration. Renders `flake.nix`
+from the agent list; if it differs from disk, runs
+`nix flake lock` + commits as `regenerate meta flake` (or
+`seed meta from N agent(s)` on the very first call).
+
+The manager has `/meta` RO-bound inside its container:
+`git -C /meta log --oneline` is the swarm-wide deploy log,
+`cat /meta/flake.lock | jq '.nodes["agent-<n>"].locked'`
+resolves which sha each agent is pinned at right now.
+Dashboard surfaces the same info as a `deployed:<sha12>` chip
+per container row.

 ## Two repos per agent

@@ -67,17 +99,23 @@ shows a spinner.
                                            # agent.nix is the
                                            # convention entry
                                            # point; flake.nix is
-                                            # generated and not
-                                            # tracked here.
+                                            # tracked boilerplate
+                                            # (manager doesn't edit
+                                            # it).

 /var/lib/hyperhive/applied/<name>/          applied — core-only
 ├── .git/                                   # tag-rich history
-├── .gitignore                              # ignores flake.nix
-├── flake.nix                               # hive-c0re-generated,
-│                                           # untracked, rewritten
-│                                           # on spawn/rebuild only
+├── flake.nix                               # tracked, fixed
+│                                           # boilerplate exporting
+│                                           # nixosModules.default
 ├── agent.nix                               # working tree of main
 └── <other manager files>                   # also tracked
+
+/var/lib/hyperhive/meta/                    swarm-wide flake — core
+├── .git/                                   # one commit per successful
+│                                           # deploy
+├── flake.nix                               # generated from agent set
+└── flake.lock                              # pins each agent's sha
 ```

 Why two physical repos: the manager's `/agents/<n>/config/` is
@@ -86,13 +124,12 @@ proposed tree. The applied repo is never bind-mounted (except
 the read-only `.git` exposure described below) so a destructive
 move inside the container cannot reach it.

-The container's `--flake` ref is `<applied_dir>#default`. The
-generated `flake.nix` extends
-`hyperhive.nixosConfigurations.{agent-base|manager}` with
-`./agent.nix` plus an inline module setting
-`programs.git.config.user` (committer identity = the agent's name)
-and `systemd.services.<harness>.environment` (`HIVE_PORT`,
-`HIVE_LABEL`, `HIVE_DASHBOARD_PORT`).
+The container's `--flake` ref is `/var/lib/hyperhive/meta#<name>`
+(see "Meta flake" above). The agent's own `applied/<n>/flake.nix`
+is a fixed boilerplate that exports `nixosModules.default =
+import ./agent.nix`; the meta flake imports that module and
+wraps it with identity + `HIVE_PORT` / `HIVE_LABEL` /
+`HIVE_DASHBOARD_PORT`.

 ### Tag state machine

@@ -114,29 +151,63 @@ approval id to retry. Because tags are first-class git objects,
 rejected and failed trees stay browsable forever — `git log
 --tags` in the applied repo is the audit trail.

-### Manager view of applied
+### Manager view of applied + meta

-`/applied/` is a **read-only bind-mount** of
-`/var/lib/hyperhive/applied/` (the entire tree) inside the
-manager container. The manager fetches tags into its proposed
-clone with `git fetch /applied/<n>/.git
-'refs/tags/*:refs/tags/applied/*'` and `git show` any
-deployed / failed / denied tree to see what actually shipped,
-what error blocked the last build, or what note the operator
-left on a denial. The RO bind means git plumbing inside the
-manager cannot corrupt the applied repos — and a single mount
-covers every agent (existing + future) without rebuilding the
-manager on each spawn.
+The manager container gets three host-side bind mounts via
+`set_nspawn_flags`:

-## Migration from the pre-tag scheme
+- `/var/lib/hyperhive/agents/` → `/agents/` (RW) — proposed
+  repos. Manager edits + commits per-agent config here.
+- `/var/lib/hyperhive/applied/` → `/applied/` (RO) — every
+  agent's authoritative applied repo, including `.git`.
+- `/var/lib/hyperhive/meta/` → `/meta/` (RO) — the swarm-wide
+  deploy flake.

-There is no in-place migration. Each existing agent must be
-purged and re-spawned: `hive-c0re destroy --purge <name>` (or
-PURG3 on the dashboard), then `request_spawn` and the operator
-approves the fresh agent. The new agent starts with `deployed/0`
-seeded by hive-c0re; the manager's first config edit becomes
-`proposal/1` and walks the tag scheme from there. Pre-overhaul
-tombstones lose their config history.
+Each proposed repo (`/agents/<n>/config/`) is pre-configured
+with `applied` as a git remote pointing at
+`/applied/<n>/.git`. Useful incantations from inside the
+manager:
+
+```sh
+git -C /agents/<n>/config fetch applied
+git -C /agents/<n>/config log applied/main --oneline
+git -C /agents/<n>/config show applied/refs/tags/deployed/<id>
+git -C /agents/<n>/config show applied/refs/tags/failed/<id>   # body = build error
+git -C /agents/<n>/config show applied/refs/tags/denied/<id>   # body = operator note
+git -C /agents/<n>/config rebase applied/main                 # base in-flight work on what's deployed
+
+git -C /meta log --oneline                                    # swarm-wide deploy history
+cat /meta/flake.lock | jq '.nodes | with_entries(select(.key | startswith("agent-")))'
+```
+
+The RO binds block push at the kernel level, so the manager
+can only fetch / read — git plumbing inside the container
+cannot corrupt either authoritative repo.
+
+## Migration from the pre-tag / pre-meta schemes
+
+Both overhauls (tag-driven flow + meta flake) ship in-place
+migrations that run on every hive-c0re startup. Idempotent;
+each phase is a no-op once already applied. Behaviour:
+
+- Tag-driven phase: assumes the operator ran the one-shot
+  `git tag deployed/0 main` script (see commit history /
+  earlier docs revisions) once per agent. Tagging is
+  non-destructive: it doesn't touch live containers, state
+  dirs, or claude creds.
+- Meta-flake phase: rewrites each `applied/<n>/flake.nix` to
+  the module-only boilerplate, wires the `applied` remote in
+  each proposed repo, bootstraps the meta repo from the
+  current agent list, and `nixos-container update`s every
+  container to point at `meta#<n>`. The expensive last step is
+  guarded by `/var/lib/hyperhive/.meta-migration-done` so
+  it only runs once across hive-c0re restarts. Set
+  `HIVE_SKIP_META_MIGRATION=1` on the service to defer.
+
+No state loss in either migration: claude creds, `/state/`
+notes, the events DB, proposed history, and applied history
+all survive. The manager keeps its session; sub-agents stay
+logged in.

 ## Manager (`hm1nd`) is hive-c0re-managed

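The two-phase lock flow from the `docs/approvals.md` hunk above, sketched in Rust with the nix and git shellouts hidden behind a hypothetical `MetaOps` trait so the control flow can be exercised without either tool. The trait, its method names, the `Recorder` test double, and aborting on a failed `prepare` (not only on a failed build) are assumptions; the step order and the `deploy <n> deployed/<id> <sha12>` commit shape come from the docs.

```rust
/// Side effects of one deploy; names are hypothetical.
trait MetaOps {
    /// Phase 1: `nix flake lock --update-input agent-<n>`, no commit.
    fn prepare(&mut self, name: &str) -> Result<(), String>;
    /// Build against the staged lock (`nixos-container update <c> --flake meta#<n>`).
    fn build(&mut self, name: &str) -> Result<(), String>;
    /// Phase 2 on success: stage `flake.lock` and commit with `message`.
    fn finalize(&mut self, message: &str) -> Result<(), String>;
    /// Phase 2 on failure: `git restore flake.lock`.
    fn abort(&mut self) -> Result<(), String>;
}

/// Commit subject in the documented `deploy <n> deployed/<id> <sha12>` shape.
fn deploy_message(name: &str, tag: &str, sha: &str) -> String {
    let sha12: String = sha.chars().take(12).collect();
    format!("deploy {name} {tag} {sha12}")
}

/// Ok(`deployed/<id>`) on success; Err(error) is what the caller would
/// record as the annotated `failed/<id>` tag in `applied/<n>`.
fn run_deploy<M: MetaOps>(ops: &mut M, name: &str, id: u64, sha: &str) -> Result<String, String> {
    // Assumption: a failed lock update is handled like a failed build.
    let staged = ops.prepare(name).and_then(|()| ops.build(name));
    match staged {
        Ok(()) => {
            let tag = format!("deployed/{id}");
            ops.finalize(&deploy_message(name, &tag, sha))?;
            Ok(tag)
        }
        Err(err) => {
            // Restore the lock so meta's history records successes only.
            ops.abort()?;
            Err(err)
        }
    }
}

/// Test double: logs each call and can be told to fail the build.
struct Recorder {
    fail_build: bool,
    log: Vec<String>,
}

impl MetaOps for Recorder {
    fn prepare(&mut self, name: &str) -> Result<(), String> {
        self.log.push(format!("prepare {name}"));
        Ok(())
    }
    fn build(&mut self, name: &str) -> Result<(), String> {
        self.log.push(format!("build {name}"));
        if self.fail_build { Err("build failed".to_string()) } else { Ok(()) }
    }
    fn finalize(&mut self, message: &str) -> Result<(), String> {
        self.log.push(format!("commit {message}"));
        Ok(())
    }
    fn abort(&mut self) -> Result<(), String> {
        self.log.push("restore flake.lock".to_string());
        Ok(())
    }
}
```

Every failure path passes through `abort`, so a staged lock never outlives a failed attempt; that is what keeps meta's log success-only.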
--- a/docs/persistence.md
+++ b/docs/persistence.md
@@ -67,8 +67,25 @@ Under `/var/lib/hyperhive/agents/<name>/`:
  to `/state` inside the container.

 Under `/var/lib/hyperhive/applied/<name>/` — the hive-c0re-only
-applied repo (`flake.nix` + `agent.nix`) that the container
-actually builds from.
+applied repo. Tracks `flake.nix` (module-only boilerplate; never
+edited after first spawn) + `agent.nix` (the actual config; the
+manager's edits land here via the approval flow) + any other
+files the manager committed. `.git/` carries the proposal /
+approved / building / deployed / failed / denied tag history.
+
+Under `/var/lib/hyperhive/meta/` — the swarm-wide deploy flake.
+Single repo for the whole host; `flake.nix` declares one input
+per agent + one `nixosConfigurations.<n>` output per agent;
+`flake.lock` is the canonical "what's deployed where." The git
+log is the deploy audit trail (one commit per successful
+deploy or hyperhive bump). Manager has this RO-mounted at
+`/meta/`.
+
+Marker file `/var/lib/hyperhive/.meta-migration-done` is
+written by the startup migration after every container has
+been repointed at `meta#<n>`. Removing it forces a re-run on
+next hive-c0re start (idempotent — only the actual repoint
+step would re-fire).

 ## Destroy vs purge

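A sketch of the `.meta-migration-done` guard described in the persistence hunk above, assuming the cheap migration phases are idempotent and run on every start while only the container repoint step is gated. `migrate`, `repoint_all`, and `skip_requested` are hypothetical names, and treating only the exact value `1` as the opt-out signal is an assumption.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical reading of the opt-out: only `HIVE_SKIP_META_MIGRATION=1`.
fn skip_requested() -> bool {
    std::env::var("HIVE_SKIP_META_MIGRATION").as_deref() == Ok("1")
}

/// Returns Ok(true) if the guarded repoint step ran on this call.
/// `state_dir` stands in for `/var/lib/hyperhive`; `repoint_all` stands
/// in for the per-container `nixos-container update`s.
fn migrate<F>(state_dir: &Path, skip: bool, mut repoint_all: F) -> io::Result<bool>
where
    F: FnMut() -> io::Result<()>,
{
    if skip {
        return Ok(false);
    }
    // Cheap, idempotent phases would run here on every start:
    // rewrite applied flakes, wire the `applied` remote, seed meta.
    let marker = state_dir.join(".meta-migration-done");
    if marker.exists() {
        return Ok(false);
    }
    repoint_all()?;
    // Reached only after every container was repointed.
    fs::write(&marker, b"")?;
    Ok(true)
}
```

Writing the marker only after `repoint_all` succeeds means an error or crash mid-repoint re-runs the step on the next start, which is safe because the step is idempotent.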
--- a/hive-c0re/src/lifecycle.rs
+++ b/hive-c0re/src/lifecycle.rs
@@ -80,20 +80,6 @@ pub fn is_manager(name: &str) -> bool {
    name == MANAGER_NAME
 }

-/// The nixosConfiguration in the hyperhive flake the agent's
-/// wrapper extends. Manager → `manager`; everyone else →
-/// `agent-base`. Used by the meta-flake generator to know which
-/// base to extend per agent.
-#[must_use]
-#[allow(dead_code)] // wired up by the meta module in a follow-up commit
-pub fn flake_base(name: &str) -> &'static str {
-    if is_manager(name) {
-        "manager"
-    } else {
-        "agent-base"
-    }
-}
-
 fn validate(name: &str) -> Result<()> {
    if name.is_empty() {
        bail!("agent name must not be empty");
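With `flake_base` deleted, the meta render owns the per-agent shape. A trimmed Rust sketch of the `sync_agents` decision: render the per-agent inputs deterministically, then lock and commit only when the result differs from disk. The real render also emits the hyperhive input and one `nixosConfigurations.<n>` per agent (including the manager vs agent-base choice), which is omitted here. The function and type names are hypothetical, sorting plus dedup is a choice made for this sketch, and reading "very first call" as "no `flake.nix` on disk yet" is an assumption.

```rust
/// Trimmed, hypothetical render of the meta flake's `inputs` block:
/// one `agent-<n>` input per agent, pointing at its applied repo.
fn render_inputs(agents: &[&str]) -> String {
    let mut names: Vec<&str> = agents.to_vec();
    // Deterministic order: the same agent set renders byte-identical text,
    // so the differs-from-disk check never fires spuriously.
    names.sort_unstable();
    names.dedup();
    let mut out = String::from("  inputs = {\n");
    for n in names {
        out.push_str(&format!(
            "    agent-{n}.url = \"git+file:///var/lib/hyperhive/applied/{n}\";\n"
        ));
    }
    out.push_str("  };\n");
    out
}

#[derive(Debug, PartialEq)]
enum SyncAction {
    /// Rendered text matches disk: no lock, no commit.
    Unchanged,
    /// Write, `nix flake lock`, then commit with this message.
    Commit(String),
}

/// `on_disk` is None when meta has no `flake.nix` yet (first call).
fn sync_action(rendered: &str, on_disk: Option<&str>, agent_count: usize) -> SyncAction {
    match on_disk {
        None => SyncAction::Commit(format!("seed meta from {agent_count} agent(s)")),
        Some(current) if current == rendered => SyncAction::Unchanged,
        Some(_) => SyncAction::Commit("regenerate meta flake".to_string()),
    }
}
```

Because an unchanged render is a no-op, `spawn`, `destroy`, `rebuild`, and the startup migration can all call the reconciler unconditionally.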