final docs + cleanup sync for meta-flake era

claude.md flips 'in flight' → 'just landed' for the meta
overhaul + extends the file map with meta.rs and migrate.rs.
docs/approvals.md replaces the in-flight callout with a
proper 'Meta flake' section (two-phase deploy walkthrough,
sync_agents semantics, single-phase variants), updates the
two-repo box diagram to include the /var/lib/hyperhive/meta/
tree and tracks flake.nix in applied, rewrites the
container --flake reference to meta#<name>, replaces the
'Manager view of applied' section with a unified
'/agents + /applied + /meta' inventory listing every useful
git incantation, and explains the in-place no-state-loss
migration that now runs on hive-c0re startup.
docs/persistence.md grows entries for the meta repo + the
.meta-migration-done marker. readme box diagram picks up the
/meta RO bind; approval-flow paragraph rewritten end to end
to describe the meta lock dance.

lifecycle::flake_base deleted: the meta render now hardcodes
the manager vs agent-base choice as a nix expression.
müde 2026-05-16 00:40:06 +02:00
parent 2f6ecc4dc0
commit 14aa7c7acc
5 changed files with 213 additions and 132 deletions


@@ -31,10 +31,18 @@ hive-c0re/ host daemon + CLI (one binary, subcommand-dispatched)
 src/coordinator.rs shared state (broker/approvals/questions/transient/
 sockets) + tombstone enumeration + kick_agent
 src/actions.rs approve/deny/destroy (transient-aware)
-src/auto_update.rs startup rebuild scan + ensure_manager
-src/lifecycle.rs `nixos-container` shellouts, per-agent flake generator
+src/auto_update.rs startup rebuild scan + ensure_manager +
+meta::lock_update_hyperhive bump
+src/lifecycle.rs `nixos-container` shellouts; per-agent applied
++ proposed git repo seeding; tag plumbing
+src/meta.rs single hive-c0re-owned flake at /var/lib/
+hyperhive/meta/ — sync_agents, two-phase
+prepare/finalize/abort, lock_update_*
+src/migrate.rs startup auto-migration from pre-meta layout
+(idempotent, marker-guarded phase 4)
 src/dashboard.rs axum HTTP: static shell + /api/state JSON + actions
 + journald viewer + bind-with-retry (SO_REUSEADDR)
++ deployed_sha chip per container
 assets/ index.html, dashboard.css, app.js (include_str!)
 hive-ag3nt/ in-container harness crate; produces TWO binaries
@@ -114,51 +122,40 @@ read them à la carte.
 In-flight or recent context that hasn't earned a section yet.
 Prune freely.

-- **In flight:** meta-flake overhaul. Each agent's applied
-repo becomes a tiny module-only flake (`nixosModules.default
-= import ./agent.nix`); `agent.nix` is just a NixOS module
-function `{ config, pkgs, lib, ... }: { ... }` — no
-extendModules, no hyperhive input visible to the manager.
-A single hive-c0re-owned repo at `/var/lib/hyperhive/meta/`
-declares one input per agent (pointing at that agent's
-applied repo via `git+file://`) and one
-`nixosConfigurations.<n>` output per agent, wrapping
-`inputs.agent-<n>.nixosModules.default` with the identity
-+ `HIVE_PORT` / `HIVE_LABEL` / `HIVE_DASHBOARD_PORT`
-injection that today's per-agent `setup_applied` does
-inline. Containers run against `meta#<n>` instead of
-`applied/<n>#default`. Every approval that lands does
-`nix flake lock --update-input agent-<n>` in meta and
-commits the lock — meta's git log is the system-wide
-deploy audit trail; per-agent tags stay as before for
-inside-baseball state.
-- **Companion change:** the manager's `/agents/<n>/config/`
-(proposed) gets `applied` pre-configured as a git remote
-pointing at `/applied/<n>/.git` (the RO bind already
-there). `git fetch applied` / `git show
-applied/refs/tags/deployed/<id>` / `git rebase
-applied/main` etc. all just work from inside the
-manager. The manager additionally gets `/meta` RO-bound,
-so `git -C /meta log --oneline` and
-`cat /meta/flake.lock` answer "what's actually deployed
-across the swarm right now."
-- **Auto-migration on startup:** new phase before
-`auto_update::run` rewrites each existing
-`applied/<n>/flake.nix` to the module-only shape +
-relocates `deployed/0`, adds the `applied` remote to each
-proposed repo, bootstraps the meta repo from the agent
-list if missing, and `nixos-container update`s every
-container to point at `meta#<n>` (no fs wipe, no
-re-login). Idempotent; `HIVE_SKIP_META_MIGRATION=1`
-defers it.
-- **Just landed (prior overhaul still in place):** tag-driven
-config-apply. Two-repo split (proposed = manager RW,
-applied = core-only); `request_apply_commit` fetches the
-manager's commit into applied and pins it as
-`proposal/<id>`; approve / deny / build walk through tags
-on the same commit; `applied/main` only fast-forwards on
-`deployed/`. `failed/` + `denied/` are annotated. See
-`docs/approvals.md` for the state machine.
+- **Just landed:** meta-flake overhaul. Each agent's applied
+repo is a tiny module-only flake (`nixosModules.default =
+import ./agent.nix`); `agent.nix` is a plain NixOS module
+function — no extendModules, no hyperhive input visible to
+the manager. A single hive-c0re-owned repo at
+`/var/lib/hyperhive/meta/` declares one input per agent
+(pointing at that agent's applied repo via `git+file://`)
+and one `nixosConfigurations.<n>` output per agent,
+wrapping `inputs.agent-<n>.nixosModules.default` with the
+identity + `HIVE_PORT` / `HIVE_LABEL` /
+`HIVE_DASHBOARD_PORT` injection. Containers run against
+`meta#<n>`. Every approve runs `nix flake lock
+--update-input agent-<n>` (two-phase: prepare on the
+build path, finalize/abort on the result) — meta's git
+log is the system-wide deploy audit trail; failures and
+denials live as annotated tags in applied. The manager
+has `/applied` and `/meta` RO-bound and the `applied`
+remote pre-wired in every proposed repo so `git fetch
+applied`, `git show applied/refs/tags/deployed/<id>`,
+`git -C /meta log --oneline`, `cat /meta/flake.lock`
+all just work. Migration runs idempotently on
+hive-c0re startup (`HIVE_SKIP_META_MIGRATION=1` skips it):
+rewrites pre-meta applied flakes to module-only, wires
+the proposed remote, seeds meta, and repoints every
+container at `meta#<n>` (guarded by a marker so the
+expensive phase only runs once).
+- **Just landed (prior overhaul still underneath):** tag-
+driven config-apply. Two-repo split (proposed = manager
+RW, applied = core-only); `request_apply_commit` fetches
+the manager's commit into applied and pins it as
+`proposal/<id>`; approve / deny / build walk through
+tags on the same commit; `applied/main` only fast-
+forwards on `deployed/`. `failed/` + `denied/` are
+annotated. See `docs/approvals.md`.
 - **Recent (since last compaction):** inline +/- diffs on
 Write/Edit, send full body via collapsed details, operator
 cancel + ttl on questions, deny-with-reason, dashboard


@@ -26,8 +26,9 @@ host (NixOS, runs hive-c0re.service)
 └── nixos-containers (each bind-mounts its socket dir → /run/hive,
 │ credentials dir → /root/.claude,
 │ durable notes dir → /state;
-│ manager additionally gets /agents RW
-│ + /applied RO for the deployed-tag mirror)
+│ manager additionally gets /agents RW,
+│ /applied RO (deployed-tag mirror),
+│ /meta RO (swarm-wide deploy flake))
 ├── hm1nd hive-m1nd serve : claude turn loop +
 │ MCP (send / recv / request_spawn / kill / start /
@@ -54,21 +55,30 @@ load; collapsible inbox + collapsible journald viewer + collapsible
 `agent.nix` viewer per agent on the dashboard.

 Config changes flow the other way: manager edits files under
-`/agents/<name>/config/` (`agent.nix` is the entry point, but arbitrary
-sibling files in the commit are preserved) → commits → submits the sha
-via `request_apply_commit`. Hive-c0re immediately fetches that commit
-from the proposed repo into the applied repo and pins it as
-`proposal/<id>` — from this moment the proposal is immutable from the
-manager's side. Operator clicks ◆ APPR0VE on the dashboard → hive-c0re
-moves the working tree to the proposal, runs `nixos-container update`,
-and either fast-forwards `applied/main` (tagging `deployed/<id>`) or
-annotates `failed/<id>` with the build error and rolls back to the
-previous deployed tree. Denials leave a `denied/<id>` annotated tag
-carrying the operator's note. The manager sees everything that
-shipped (or didn't) via a read-only `/applied/<n>/.git` mirror inside
-its container; `git show applied/deployed/<id>` etc. is the audit
-trail. See [`docs/approvals.md`](docs/approvals.md) for the full tag
-state machine.
+`/agents/<name>/config/``agent.nix` is a plain NixOS module function
+`{ config, pkgs, lib, ... }: { ... }`, and arbitrary sibling files in
+the commit are preserved → commits → submits the sha via
+`request_apply_commit`. Hive-c0re immediately fetches that commit from
+the proposed repo into the applied repo and pins it as `proposal/<id>`
+— immutable from the manager's side from then on. Operator clicks
+◆ APPR0VE → hive-c0re fast-forwards `applied/<n>/main` to the proposal,
+runs `nix flake lock --update-input agent-<n>` against the host-wide
+meta flake at `/var/lib/hyperhive/meta/`, builds via
+`nixos-container update <c> --flake meta#<name>`, and either commits
+the lock + tags `deployed/<id>` on success or `git restore`s the lock +
+annotates `failed/<id>` with the build error + rolls back
+`applied/<n>/main` on failure. Denials leave a `denied/<id>` annotated
+tag carrying the operator's note.
+
+Meta's git log is the swarm-wide deploy audit trail (one commit per
+successful deploy). Per-agent applied repos carry the tag-rich state
+machine for inside-baseball decisions. The manager sees both — proposed
+repos ship with an `applied` remote pre-wired, and `/meta/` is RO-bound
+inside the container — so `git fetch applied`,
+`git show applied/refs/tags/deployed/<id>`, `git log /meta`,
+`cat /meta/flake.lock` all just work without constructing paths by
+hand. See [`docs/approvals.md`](docs/approvals.md) for the full state
+machine + lock-flow walkthrough.

 For decisions the manager needs human signal on, `ask_operator(question,
 options?, multi?)` queues a free-text/checkbox/radio form on the
 dashboard; the answer arrives later as a `HelperEvent::OperatorAnswered`

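The approve path described in the README hunk above (stage the lock, build, then either commit or restore) can be modelled as a small two-phase state machine. The sketch below is an in-memory stand-in, not hive-c0re's code: `Meta`, its fields, `apply_with_rollback`, and the extra `sha` parameter on `prepare_deploy` are hypothetical, and no `nix` or `git` commands are run.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory model of the meta repo (the real one is a git repo).
#[derive(Debug, Default)]
struct Meta {
    head_lock: BTreeMap<String, String>, // lock as committed at HEAD: agent -> sha
    work_lock: BTreeMap<String, String>, // lock in the working tree
    log: Vec<String>,                    // commit subjects, oldest first
}

impl Meta {
    /// Phase 1: stage the new pin in the working tree only (no commit).
    fn prepare_deploy(&mut self, name: &str, sha: &str) {
        self.work_lock.insert(name.to_string(), sha.to_string());
    }

    /// Phase 2, success: commit the staged lock as `deploy <n> <tag> <sha12>`.
    fn finalize_deploy(&mut self, name: &str, sha: &str, tag: &str) {
        self.head_lock = self.work_lock.clone();
        let sha12: String = sha.chars().take(12).collect();
        self.log.push(format!("deploy {name} {tag} {sha12}"));
    }

    /// Phase 2, failure: drop the staged lock, like `git restore flake.lock`.
    fn abort_deploy(&mut self) {
        self.work_lock = self.head_lock.clone();
    }
}

/// Runs `build` against the staged lock, then finalizes or aborts.
fn apply_with_rollback(
    meta: &mut Meta,
    name: &str,
    sha: &str,
    id: u32,
    build: impl FnOnce(&Meta) -> Result<(), String>,
) -> Result<(), String> {
    meta.prepare_deploy(name, sha);
    match build(&*meta) {
        Ok(()) => {
            meta.finalize_deploy(name, sha, &format!("deployed/{id}"));
            Ok(())
        }
        Err(e) => {
            meta.abort_deploy();
            Err(e)
        }
    }
}

fn main() {
    let mut meta = Meta::default();
    let ok = apply_with_rollback(&mut meta, "alice", "0123456789abcdef0123", 1, |_| Ok(()));
    let failed = apply_with_rollback(&mut meta, "alice", "ffffffffffffffffffff", 2, |_| {
        Err("build failed".to_string())
    });
    println!("{ok:?} {failed:?} {:?}", meta.log);
}
```

Because the failed attempt never reaches `finalize_deploy`, the log in this model gains entries only for successful deploys, which matches the audit-trail property the commit describes for meta's git history.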

@@ -37,26 +37,58 @@ step — the operator just sees the name. On approve, hive-c0re
 creates the container in a background task while the dashboard
 shows a spinner.

-## Meta flake (in flight)
+## Meta flake

-> The next overhaul (currently being implemented) introduces a
-> single hive-c0re-owned meta repo at
-> `/var/lib/hyperhive/meta/` that consumes every agent's
-> applied repo as a flake input and owns the wrapper
-> nixosConfiguration. Each agent's `applied/<n>/flake.nix`
-> shrinks to `nixosModules.default = import ./agent.nix`
-> `agent.nix` becomes a plain NixOS module function (no
-> extendModules / hyperhive input). Containers will run
-> against `--flake /var/lib/hyperhive/meta#<n>`. Every
-> approval that builds does
-> `nix flake lock --update-input agent-<n>` in meta and
-> commits the lock; meta's git log is the system-wide deploy
-> trail. Manager additionally gets `/applied/<n>/.git`
-> pre-registered as the `applied` remote inside its proposed
-> repo, and `/meta` RO-bound for browsing the deploy log.
-> Auto-migrates on startup. Sections below describe the
-> current (still-deployed) tag-driven shape that the meta
-> flake builds on top of.
+The hive-c0re-owned repo at `/var/lib/hyperhive/meta/`
+declares one flake input per agent (`agent-<n>.url =
+"git+file:///var/lib/hyperhive/applied/<n>"`) and one
+`nixosConfigurations.<n>` output per agent. Each output wraps
+`inputs.agent-<n>.nixosModules.default` with the identity +
+`HIVE_PORT` / `HIVE_LABEL` / `HIVE_DASHBOARD_PORT` injection
+module that `setup_applied` used to generate inline.
+Containers run against `--flake /var/lib/hyperhive/meta#<n>`.
+
+Per-deploy lock flow (two-phase, owned by
+`actions::run_apply_commit` → `meta::{prepare,finalize,abort}
+_deploy`):
+
+1. `meta::prepare_deploy(name)` runs
+`nix flake lock --update-input agent-<n>` without
+committing. Working tree of meta now points the input at
+`applied/<n>/main` (which `run_apply_commit` already
+fast-forwarded to `proposal/<id>`).
+2. `lifecycle::rebuild_no_meta` runs
+`nixos-container update <c> --flake meta#<name>`. Nix
+evaluates against the staged lock.
+3. On success — `meta::finalize_deploy(name, sha, "deployed/
+<id>")` stages `flake.lock` and commits with
+`deploy <n> deployed/<id> <sha12>`. Meta's git log gains
+one entry per successful deploy.
+4. On failure — `meta::abort_deploy()` runs
+`git restore flake.lock` so the meta history shows only
+successes; the failure stays as an annotated `failed/<id>`
+tag in `applied/<n>`.
+
+Single-phase variants exist for paths without
+rollback semantics: `meta::lock_update_for_rebuild(name)` for
+the manual `↻ R3BU1LD` button (commits if the lock changed)
+and `meta::lock_update_hyperhive()` for the
+auto-update flake-rev bump (one shot before per-agent
+rebuilds, commits if the lock changed).
+
+`meta::sync_agents(hyperhive_flake, dashboard_port, &agents)`
+is the idempotent reconciler called by `spawn`, `destroy`,
+`rebuild`, and the startup migration. Renders `flake.nix`
+from the agent list; if it differs from disk, runs
+`nix flake lock` + commits as `regenerate meta flake` (or
+`seed meta from N agent(s)` on the very first call).
+
+The manager has `/meta` RO-bound inside its container:
+`git -C /meta log --oneline` is the swarm-wide deploy log,
+`cat /meta/flake.lock | jq '.nodes["agent-<n>"].locked'`
+resolves which sha each agent is pinned at right now.
+Dashboard surfaces the same info as a `deployed:<sha12>` chip
+per container row.

 ## Two repos per agent
@@ -67,17 +99,23 @@ shows a spinner.
 # agent.nix is the
 # convention entry
 # point; flake.nix is
-# generated and not
-# tracked here.
+# tracked boilerplate
+# (manager doesn't edit
+# it).

 /var/lib/hyperhive/applied/<name>/ applied — core-only
 ├── .git/ # tag-rich history
-├── .gitignore # ignores flake.nix
-├── flake.nix # hive-c0re-generated,
-│ # untracked, rewritten
-│ # on spawn/rebuild only
+├── flake.nix # tracked, fixed
+│ # boilerplate exporting
+│ # nixosModules.default
 ├── agent.nix # working tree of main
 └── <other manager files> # also tracked
+
+/var/lib/hyperhive/meta/ swarm-wide flake — core
+├── .git/ # one commit per successful
+│ # deploy
+├── flake.nix # generated from agent set
+└── flake.lock # pins each agent's sha
 ```

 Why two physical repos: the manager's `/agents/<n>/config/` is
@@ -86,13 +124,12 @@ proposed tree. The applied repo is never bind-mounted (except
 the read-only `.git` exposure described below) so a destructive
 move inside the container cannot reach it.

-The container's `--flake` ref is `<applied_dir>#default`. The
-generated `flake.nix` extends
-`hyperhive.nixosConfigurations.{agent-base|manager}` with
-`./agent.nix` plus an inline module setting
-`programs.git.config.user` (committer identity = the agent's name)
-and `systemd.services.<harness>.environment` (`HIVE_PORT`,
-`HIVE_LABEL`, `HIVE_DASHBOARD_PORT`).
+The container's `--flake` ref is `/var/lib/hyperhive/meta#<name>`
+(see "Meta flake" above). The agent's own `applied/<n>/flake.nix`
+is a fixed boilerplate that exports `nixosModules.default =
+import ./agent.nix`; the meta flake imports that module and
+wraps it with identity + `HIVE_PORT` / `HIVE_LABEL` /
+`HIVE_DASHBOARD_PORT`.

 ### Tag state machine
@@ -114,29 +151,63 @@ approval id to retry. Because tags are first-class git objects,
 rejected and failed trees stay browsable forever — `git log
 --tags` in the applied repo is the audit trail.

-### Manager view of applied
+### Manager view of applied + meta

-`/applied/` is a **read-only bind-mount** of
-`/var/lib/hyperhive/applied/` (the entire tree) inside the
-manager container. The manager fetches tags into its proposed
-clone with `git fetch /applied/<n>/.git
-'refs/tags/*:refs/tags/applied/*'` and `git show` any
-deployed / failed / denied tree to see what actually shipped,
-what error blocked the last build, or what note the operator
-left on a denial. The RO bind means git plumbing inside the
-manager cannot corrupt the applied repos — and a single mount
-covers every agent (existing + future) without rebuilding the
-manager on each spawn.
+The manager container gets three host-side bind mounts via
+`set_nspawn_flags`:

-## Migration from the pre-tag scheme
+- `/var/lib/hyperhive/agents/``/agents/` (RW) — proposed
+repos. Manager edits + commits per-agent config here.
+- `/var/lib/hyperhive/applied/``/applied/` (RO) — every
+agent's authoritative applied repo, including `.git`.
+- `/var/lib/hyperhive/meta/``/meta/` (RO) — the swarm-wide
+deploy flake.

-There is no in-place migration. Each existing agent must be
-purged and re-spawned: `hive-c0re destroy --purge <name>` (or
-PURG3 on the dashboard), then `request_spawn` and the operator
-approves the fresh agent. The new agent starts with `deployed/0`
-seeded by hive-c0re; the manager's first config edit becomes
-`proposal/1` and walks the tag scheme from there. Pre-overhaul
-tombstones lose their config history.
+Each proposed repo (`/agents/<n>/config/`) is pre-configured
+with `applied` as a git remote pointing at
+`/applied/<n>/.git`. Useful incantations from inside the
+manager:
+
+```sh
+git -C /agents/<n>/config fetch applied
+git -C /agents/<n>/config log applied/main --oneline
+git -C /agents/<n>/config show applied/refs/tags/deployed/<id>
+git -C /agents/<n>/config show applied/refs/tags/failed/<id> # body = build error
+git -C /agents/<n>/config show applied/refs/tags/denied/<id> # body = operator note
+git -C /agents/<n>/config rebase applied/main # base in-flight work on what's deployed
+git -C /meta log --oneline # swarm-wide deploy history
+cat /meta/flake.lock | jq '.nodes | with_entries(select(.key | startswith("agent-")))'
+```
+
+The RO binds block push at the kernel level, so the manager
+can only fetch / read — git plumbing inside the container
+cannot corrupt either authoritative repo.
+
+## Migration from the pre-tag / pre-meta schemes
+
+Both overhauls (tag-driven flow + meta flake) ship in-place
+migrations that run on every hive-c0re startup. Idempotent;
+each phase is a no-op once already applied. Behaviour:
+
+- Tag-driven phase: assumes the operator ran the one-shot
+`git tag deployed/0 main` script (see commit history /
+earlier docs revisions) once per agent. Tagging is
+non-destructive: it doesn't touch live containers, state
+dirs, or claude creds.
+- Meta-flake phase: rewrites each `applied/<n>/flake.nix` to
+the module-only boilerplate, wires the `applied` remote in
+each proposed repo, bootstraps the meta repo from the
+current agent list, and `nixos-container update`s every
+container at `meta#<n>`. The expensive last step is
+guarded by `/var/lib/hyperhive/.meta-migration-done` so
+it only runs once across hive-c0re restarts. Set
+`HIVE_SKIP_META_MIGRATION=1` on the service to defer.
+
+No state loss in either migration. claude creds, /state/
+notes, the events DB, proposed history, and applied history
+all survive. The manager keeps its session; sub-agents stay
+logged in.

 ## Manager (`hm1nd`) is hive-c0re-managed

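The `sync_agents` behaviour described in the approvals diff above (render `flake.nix` from the agent list, compare with disk, commit only on change) follows a common render-and-compare pattern. The sketch below illustrates that pattern only: `Sync`, `render_flake`, and `reconcile` are hypothetical names, the rendered flake is reduced to its inputs, sorting the agent names is an assumption, "first call" is modelled as "no `flake.nix` on disk yet", and the disk plus git commit are replaced by an `Option<String>` and a returned commit subject.

```rust
/// Outcome of one reconcile pass (hypothetical type).
#[derive(Debug, PartialEq)]
enum Sync {
    Unchanged,
    Committed(String), // commit subject
}

/// Renders a heavily simplified meta `flake.nix`: one input per agent, outputs elided.
/// Sorting is an assumption of this sketch so that agent order cannot cause churn.
fn render_flake(agents: &[&str]) -> String {
    let mut names = agents.to_vec();
    names.sort_unstable();
    let mut out = String::from("{\n  inputs = {\n");
    for n in names {
        out.push_str(&format!(
            "    agent-{n}.url = \"git+file:///var/lib/hyperhive/applied/{n}\";\n"
        ));
    }
    out.push_str("  };\n  # nixosConfigurations.<n> outputs elided\n}\n");
    out
}

/// Idempotent reconcile: rewrite and "commit" only when the rendering differs.
fn reconcile(on_disk: &mut Option<String>, agents: &[&str]) -> Sync {
    let rendered = render_flake(agents);
    if on_disk.as_deref() == Some(rendered.as_str()) {
        return Sync::Unchanged;
    }
    let subject = match on_disk {
        None => format!("seed meta from {} agent(s)", agents.len()),
        Some(_) => "regenerate meta flake".to_string(),
    };
    *on_disk = Some(rendered);
    Sync::Committed(subject)
}

fn main() {
    let mut disk = None;
    println!("{:?}", reconcile(&mut disk, &["bob", "alice"]));
    println!("{:?}", reconcile(&mut disk, &["alice", "bob"]));
    println!("{:?}", reconcile(&mut disk, &["alice"]));
}
```

Comparing the full rendered text keeps the reconciler safe to call from every lifecycle path: a call that changes nothing produces no commit.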

@@ -67,8 +67,25 @@ Under `/var/lib/hyperhive/agents/<name>/`:
 to `/state` inside the container.

 Under `/var/lib/hyperhive/applied/<name>/` — the hive-c0re-only
-applied repo (`flake.nix` + `agent.nix`) that the container
-actually builds from.
+applied repo. Tracks `flake.nix` (module-only boilerplate; never
+edited after first spawn) + `agent.nix` (the actual config; the
+manager's edits land here via the approval flow) + any other
+files the manager committed. `.git/` carries the proposal /
+approved / building / deployed / failed / denied tag history.
+
+Under `/var/lib/hyperhive/meta/` — the swarm-wide deploy flake.
+Single repo for the whole host; `flake.nix` declares one input
+per agent + one `nixosConfigurations.<n>` output per agent;
+`flake.lock` is the canonical "what's deployed where." The git
+log is the deploy audit trail (one commit per successful
+deploy or hyperhive bump). Manager has this RO-mounted at
+`/meta/`.
+
+Marker file `/var/lib/hyperhive/.meta-migration-done` is
+written by the startup migration after every container has
+been repointed at `meta#<n>`. Removing it forces a re-run on
+next hive-c0re start (idempotent — only the actual repoint
+step would re-fire).

 ## Destroy vs purge

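The marker-file guard described in the persistence diff above is a run-once pattern: skip the expensive step when the marker exists, and write the marker only after the step succeeds. The sketch below shows that pattern under stated assumptions: `run_guarded` is a hypothetical helper, the repoint step is a no-op closure, and the marker lives in a temp directory rather than under `/var/lib/hyperhive/`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Runs `repoint` at most once: skipped when the marker exists or `skip` is set.
/// Returns whether the step actually ran.
fn run_guarded(
    marker: &Path,
    skip: bool,
    repoint: impl FnOnce() -> io::Result<()>,
) -> io::Result<bool> {
    if skip || marker.exists() {
        return Ok(false);
    }
    repoint()?;
    // Written only after success, so a failed run is retried on the next start.
    fs::write(marker, b"")?;
    Ok(true)
}

fn main() -> io::Result<()> {
    // Demo location only; the real marker path is the one named in the docs.
    let dir = std::env::temp_dir().join(format!("meta-marker-demo-{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    let marker = dir.join(".meta-migration-done");
    // Opt-out via the environment variable named in the commit.
    let skip = std::env::var("HIVE_SKIP_META_MIGRATION")
        .map(|v| v == "1")
        .unwrap_or(false);
    let first = run_guarded(&marker, skip, || Ok(()))?;
    let second = run_guarded(&marker, skip, || Ok(()))?;
    println!("first={first} second={second}");
    fs::remove_dir_all(&dir)
}
```

Deleting the marker makes the next call run the step again, which mirrors the documented way to force a re-run.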

@@ -80,20 +80,6 @@ pub fn is_manager(name: &str) -> bool {
     name == MANAGER_NAME
 }

-/// The nixosConfiguration in the hyperhive flake the agent's
-/// wrapper extends. Manager → `manager`; everyone else →
-/// `agent-base`. Used by the meta-flake generator to know which
-/// base to extend per agent.
-#[must_use]
-#[allow(dead_code)] // wired up by the meta module in a follow-up commit
-pub fn flake_base(name: &str) -> &'static str {
-    if is_manager(name) {
-        "manager"
-    } else {
-        "agent-base"
-    }
-}
-
 fn validate(name: &str) -> Result<()> {
     if name.is_empty() {
         bail!("agent name must not be empty");