final docs + cleanup sync for meta-flake era
claude.md flips 'in flight' → 'just landed' for the meta overhaul + extends the file map with meta.rs and migrate.rs. docs/approvals.md replaces the in-flight callout with a proper 'Meta flake' section (two-phase deploy walkthrough, sync_agents semantics, single-phase variants), updates the two-repo box diagram to include the /var/lib/hyperhive/meta/ tree and tracks flake.nix in applied, rewrites the container --flake reference to meta#<name>, replaces the 'Manager view of applied' section with a unified '/agents + /applied + /meta' inventory listing every useful git incantation, and explains the in-place no-state-loss migration that now runs on hive-c0re startup. docs/persistence.md grows entries for the meta repo + the .meta-migration-done marker. readme box diagram picks up the /meta RO bind; approval-flow paragraph rewritten end to end to describe the meta lock dance. lifecycle::flake_base deleted — the meta render hardcodes the manager vs agent-base choice as nix expression.
parent
2f6ecc4dc0
commit
14aa7c7acc
5 changed files with 213 additions and 132 deletions
|
|
@ -37,26 +37,58 @@ step — the operator just sees the name. On approve, hive-c0re
|
|||
creates the container in a background task while the dashboard
|
||||
shows a spinner.
|
||||
|
||||
## Meta flake (in flight)
|
||||
## Meta flake
|
||||
|
||||
> The next overhaul (currently being implemented) introduces a
|
||||
> single hive-c0re-owned meta repo at
|
||||
> `/var/lib/hyperhive/meta/` that consumes every agent's
|
||||
> applied repo as a flake input and owns the wrapper
|
||||
> nixosConfiguration. Each agent's `applied/<n>/flake.nix`
|
||||
> shrinks to `nixosModules.default = import ./agent.nix` —
|
||||
> `agent.nix` becomes a plain NixOS module function (no
|
||||
> extendModules / hyperhive input). Containers will run
|
||||
> against `--flake /var/lib/hyperhive/meta#<n>`. Every
|
||||
> approval that builds does
|
||||
> `nix flake lock --update-input agent-<n>` in meta and
|
||||
> commits the lock; meta's git log is the system-wide deploy
|
||||
> trail. Manager additionally gets `/applied/<n>/.git`
|
||||
> pre-registered as the `applied` remote inside its proposed
|
||||
> repo, and `/meta` RO-bound for browsing the deploy log.
|
||||
> Auto-migrates on startup. Sections below describe the
|
||||
> current (still-deployed) tag-driven shape that the meta
|
||||
> flake builds on top of.
|
||||
The hive-c0re-owned repo at `/var/lib/hyperhive/meta/`
declares one flake input per agent (`agent-<n>.url =
"git+file:///var/lib/hyperhive/applied/<n>"`) and one
`nixosConfigurations.<n>` output per agent. Each output wraps
`inputs.agent-<n>.nixosModules.default` with the identity +
`HIVE_PORT` / `HIVE_LABEL` / `HIVE_DASHBOARD_PORT` injection
module that `setup_applied` used to generate inline.
Containers run against `--flake /var/lib/hyperhive/meta#<n>`.
|
||||
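As a rough illustration of the per-agent input declaration, the sketch below prints one `agent-<n>.url` line per agent name. The function name `render_agent_inputs` and the `APPLIED_ROOT` variable are hypothetical. The actual render lives inside hive-c0re and emits a complete `flake.nix`, so treat this as a shape check rather than the real generator.

```sh
# Hypothetical helper: print one flake input line per agent name.
# APPLIED_ROOT mirrors the path quoted in the text; override it for testing.
APPLIED_ROOT="${APPLIED_ROOT:-/var/lib/hyperhive/applied}"

render_agent_inputs() {
  for name in "$@"; do
    # git+file:// plus an absolute path yields the triple-slash URL form.
    printf 'agent-%s.url = "git+file://%s/%s";\n' "$name" "$APPLIED_ROOT" "$name"
  done
}
```

Calling `render_agent_inputs alice bob` would print two lines, one per agent, in the same `git+file:///var/lib/hyperhive/applied/<n>` form the meta flake declares.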
|
||||
Per-deploy lock flow (two-phase, owned by
`actions::run_apply_commit` →
`meta::{prepare,finalize,abort}_deploy`):

1. `meta::prepare_deploy(name)` runs
   `nix flake lock --update-input agent-<n>` without
   committing. Working tree of meta now points the input at
   `applied/<n>/main` (which `run_apply_commit` already
   fast-forwarded to `proposal/<id>`).
2. `lifecycle::rebuild_no_meta` runs
   `nixos-container update <c> --flake meta#<name>`. Nix
   evaluates against the staged lock.
3. On success, `meta::finalize_deploy(name, sha,
   "deployed/<id>")` stages `flake.lock` and commits with
   `deploy <n> deployed/<id> <sha12>`. Meta's git log gains
   one entry per successful deploy.
4. On failure, `meta::abort_deploy()` runs
   `git restore flake.lock` so the meta history shows only
   successes; the failure stays as an annotated `failed/<id>`
   tag in `applied/<n>`.
|
||||
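The four steps above can be approximated from a shell, which may help when reasoning about what state meta is in after a failed build. This is only a sketch: `deploy_agent`, `deploy_commit_subject`, and `META_DIR` are illustrative names, the real logic lives in hive-c0re's `meta` module, and the sketch assumes the container is named after the agent.

```sh
# Sketch of the two-phase lock flow. Names here are illustrative only.
META_DIR="${META_DIR:-/var/lib/hyperhive/meta}"

# Commit subject in the shape the text describes:
#   deploy <n> deployed/<id> <sha12>
deploy_commit_subject() {
  name=$1 id=$2 sha=$3
  sha12=$(printf '%s' "$sha" | cut -c1-12)
  printf 'deploy %s deployed/%s %s\n' "$name" "$id" "$sha12"
}

deploy_agent() {
  name=$1 id=$2 sha=$3
  # Phase 1: update the lock in the working tree only, no commit.
  (cd "$META_DIR" && nix flake lock --update-input "agent-$name") || return 1
  # Phase 2: rebuild against the staged lock.
  # Assumption: the container name equals the agent name.
  if nixos-container update "$name" --flake "$META_DIR#$name"; then
    # Success: record the deploy in meta's history.
    git -C "$META_DIR" add flake.lock &&
      git -C "$META_DIR" commit -m "$(deploy_commit_subject "$name" "$id" "$sha")"
  else
    # Failure: drop the uncommitted lock change so meta's log
    # shows only successful deploys.
    git -C "$META_DIR" restore flake.lock
    return 1
  fi
}
```

Keeping the commit out of phase 1 is what makes the abort cheap: nothing has to be reverted in history, only the working-tree `flake.lock`.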
|
||||
Single-phase variants exist for paths without
rollback semantics: `meta::lock_update_for_rebuild(name)` for
the manual `↻ R3BU1LD` button (commits if the lock changed)
and `meta::lock_update_hyperhive()` for the
auto-update flake-rev bump (one shot before per-agent
rebuilds, commits if the lock changed).
|
||||
|
||||
`meta::sync_agents(hyperhive_flake, dashboard_port, &agents)`
is the idempotent reconciler called by `spawn`, `destroy`,
`rebuild`, and the startup migration. It renders `flake.nix`
from the agent list; if the result differs from disk, it runs
`nix flake lock` + commits as `regenerate meta flake` (or
`seed meta from N agent(s)` on the very first call).
|
||||
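The "render, compare, write only on change" pattern behind an idempotent reconciler can be sketched as a small shell helper. `write_if_changed` is a hypothetical name, not something hive-c0re exposes; it only illustrates why repeated calls with an unchanged agent list produce no new commits.

```sh
# Hypothetical helper: replace file $1 with stdin only when the content
# differs. Returns 0 if the file was (re)written, 1 if already up to date.
write_if_changed() {
  target=$1
  tmp=$(mktemp) || return 2
  cat > "$tmp"
  if [ -f "$target" ] && cmp -s "$tmp" "$target"; then
    rm -f "$tmp"
    return 1
  fi
  # Copy rather than move so an existing target keeps its permissions.
  cat "$tmp" > "$target"
  rm -f "$tmp"
  return 0
}
```

A caller would pipe the freshly rendered `flake.nix` into `write_if_changed` and run `nix flake lock` plus the `regenerate meta flake` commit only when it returns 0.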
|
||||
The manager has `/meta` RO-bound inside its container:
`git -C /meta log --oneline` is the swarm-wide deploy log, and
`cat /meta/flake.lock | jq '.nodes["agent-<n>"].locked'`
resolves which sha each agent is pinned at right now.
The dashboard surfaces the same info as a `deployed:<sha12>` chip
per container row.
|
||||
|
||||
## Two repos per agent
|
||||
|
||||
|
|
@ -67,17 +99,23 @@ shows a spinner.
|
|||
# agent.nix is the
|
||||
# convention entry
|
||||
# point; flake.nix is
|
||||
# generated and not
|
||||
# tracked here.
|
||||
# tracked boilerplate
|
||||
# (manager doesn't edit
|
||||
# it).
|
||||
|
||||
/var/lib/hyperhive/applied/<name>/ applied — core-only
|
||||
├── .git/ # tag-rich history
|
||||
├── .gitignore # ignores flake.nix
|
||||
├── flake.nix # hive-c0re-generated,
|
||||
│ # untracked, rewritten
|
||||
│ # on spawn/rebuild only
|
||||
├── flake.nix # tracked, fixed
|
||||
│ # boilerplate exporting
|
||||
│ # nixosModules.default
|
||||
├── agent.nix # working tree of main
|
||||
└── <other manager files> # also tracked
|
||||
|
||||
/var/lib/hyperhive/meta/ swarm-wide flake — core
|
||||
├── .git/ # one commit per successful
|
||||
│ # deploy
|
||||
├── flake.nix # generated from agent set
|
||||
└── flake.lock # pins each agent's sha
|
||||
```
|
||||
|
||||
Why two physical repos: the manager's `/agents/<n>/config/` is
|
||||
|
|
@ -86,13 +124,12 @@ proposed tree. The applied repo is never bind-mounted (except
|
|||
the read-only `.git` exposure described below) so a destructive
|
||||
move inside the container cannot reach it.
|
||||
|
||||
The container's `--flake` ref is `<applied_dir>#default`. The
|
||||
generated `flake.nix` extends
|
||||
`hyperhive.nixosConfigurations.{agent-base|manager}` with
|
||||
`./agent.nix` plus an inline module setting
|
||||
`programs.git.config.user` (committer identity = the agent's name)
|
||||
and `systemd.services.<harness>.environment` (`HIVE_PORT`,
|
||||
`HIVE_LABEL`, `HIVE_DASHBOARD_PORT`).
|
||||
The container's `--flake` ref is `/var/lib/hyperhive/meta#<name>`
|
||||
(see "Meta flake" above). The agent's own `applied/<n>/flake.nix`
|
||||
is a fixed boilerplate that exports `nixosModules.default =
|
||||
import ./agent.nix`; the meta flake imports that module and
|
||||
wraps it with identity + `HIVE_PORT` / `HIVE_LABEL` /
|
||||
`HIVE_DASHBOARD_PORT`.
|
||||
|
||||
### Tag state machine
|
||||
|
||||
|
|
@ -114,29 +151,63 @@ approval id to retry. Because tags are first-class git objects,
|
|||
rejected and failed trees stay browsable forever — `git log
|
||||
--tags` in the applied repo is the audit trail.
|
||||
|
||||
### Manager view of applied
|
||||
### Manager view of applied + meta
|
||||
|
||||
`/applied/` is a **read-only bind-mount** of
|
||||
`/var/lib/hyperhive/applied/` (the entire tree) inside the
|
||||
manager container. The manager fetches tags into its proposed
|
||||
clone with `git fetch /applied/<n>/.git
|
||||
'refs/tags/*:refs/tags/applied/*'` and `git show` any
|
||||
deployed / failed / denied tree to see what actually shipped,
|
||||
what error blocked the last build, or what note the operator
|
||||
left on a denial. The RO bind means git plumbing inside the
|
||||
manager cannot corrupt the applied repos — and a single mount
|
||||
covers every agent (existing + future) without rebuilding the
|
||||
manager on each spawn.
|
||||
The manager container gets three host-side bind mounts via
|
||||
`set_nspawn_flags`:
|
||||
|
||||
## Migration from the pre-tag scheme
|
||||
- `/var/lib/hyperhive/agents/` → `/agents/` (RW) — proposed
|
||||
repos. Manager edits + commits per-agent config here.
|
||||
- `/var/lib/hyperhive/applied/` → `/applied/` (RO) — every
|
||||
agent's authoritative applied repo, including `.git`.
|
||||
- `/var/lib/hyperhive/meta/` → `/meta/` (RO) — the swarm-wide
|
||||
deploy flake.
|
||||
|
||||
There is no in-place migration. Each existing agent must be
|
||||
purged and re-spawned: `hive-c0re destroy --purge <name>` (or
|
||||
PURG3 on the dashboard), then `request_spawn` and the operator
|
||||
approves the fresh agent. The new agent starts with `deployed/0`
|
||||
seeded by hive-c0re; the manager's first config edit becomes
|
||||
`proposal/1` and walks the tag scheme from there. Pre-overhaul
|
||||
tombstones lose their config history.
|
||||
Each proposed repo (`/agents/<n>/config/`) is pre-configured
|
||||
with `applied` as a git remote pointing at
|
||||
`/applied/<n>/.git`. Useful incantations from inside the
|
||||
manager:
|
||||
|
||||
```sh
git -C /agents/<n>/config fetch applied
git -C /agents/<n>/config log applied/main --oneline
git -C /agents/<n>/config show applied/refs/tags/deployed/<id>
git -C /agents/<n>/config show applied/refs/tags/failed/<id>    # body = build error
git -C /agents/<n>/config show applied/refs/tags/denied/<id>    # body = operator note
git -C /agents/<n>/config rebase applied/main                   # base in-flight work on what's deployed

git -C /meta log --oneline                                      # swarm-wide deploy history
cat /meta/flake.lock | jq '.nodes | with_entries(select(.key | startswith("agent-")))'
```
|
||||
|
||||
The RO binds block push at the kernel level, so the manager
|
||||
can only fetch / read — git plumbing inside the container
|
||||
cannot corrupt either authoritative repo.
|
||||
|
||||
## Migration from the pre-tag / pre-meta schemes
|
||||
|
||||
Both overhauls (tag-driven flow + meta flake) ship in-place
migrations that run on every hive-c0re startup. Idempotent;
each phase is a no-op once already applied. Behaviour:

- Tag-driven phase: assumes the operator ran the one-shot
  `git tag deployed/0 main` script (see commit history /
  earlier docs revisions) once per agent. Tagging is
  non-destructive: it doesn't touch live containers, state
  dirs, or claude creds.
- Meta-flake phase: rewrites each `applied/<n>/flake.nix` to
  the module-only boilerplate, wires the `applied` remote in
  each proposed repo, bootstraps the meta repo from the
  current agent list, and `nixos-container update`s every
  container at `meta#<n>`. The expensive last step is
  guarded by `/var/lib/hyperhive/.meta-migration-done` so
  it only runs once across hive-c0re restarts. Set
  `HIVE_SKIP_META_MIGRATION=1` on the service to defer.
|
||||
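The marker-file guard on the expensive step can be pictured as a small predicate. This sketch rests on two assumptions not spelled out in the text: that `HIVE_SKIP_META_MIGRATION=1` defers just the repoint step, and that a present marker simply means "already done". `should_repoint_containers` and `MARKER` are hypothetical names.

```sh
# Hypothetical guard: exit 0 only when the expensive repoint step should run.
# Assumes HIVE_SKIP_META_MIGRATION=1 defers the step and that an existing
# marker file means it already ran on an earlier start.
MARKER="${MARKER:-/var/lib/hyperhive/.meta-migration-done}"

should_repoint_containers() {
  [ "${HIVE_SKIP_META_MIGRATION:-0}" != 1 ] && [ ! -e "$MARKER" ]
}
```

A startup script could wrap the per-container `nixos-container update` loop in `if should_repoint_containers; then ...; : > "$MARKER"; fi`, writing the marker only after every container has been repointed.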
|
||||
No state loss in either migration. claude creds, /state/
notes, the events DB, proposed history, and applied history
all survive. The manager keeps its session; sub-agents stay
logged in.
|
||||
|
||||
## Manager (`hm1nd`) is hive-c0re-managed
|
||||
|
||||
|
|
|
|||
|
|
@ -67,8 +67,25 @@ Under `/var/lib/hyperhive/agents/<name>/`:
|
|||
to `/state` inside the container.
|
||||
|
||||
Under `/var/lib/hyperhive/applied/<name>/` — the hive-c0re-only
|
||||
applied repo (`flake.nix` + `agent.nix`) that the container
|
||||
actually builds from.
|
||||
applied repo. Tracks `flake.nix` (module-only boilerplate; never
edited after first spawn) + `agent.nix` (the actual config; the
manager's edits land here via the approval flow) + any other
files the manager committed. `.git/` carries the proposal /
approved / building / deployed / failed / denied tag history.
|
||||
|
||||
Under `/var/lib/hyperhive/meta/`: the swarm-wide deploy flake.
Single repo for the whole host; `flake.nix` declares one input
per agent + one `nixosConfigurations.<n>` output per agent;
`flake.lock` is the canonical "what's deployed where." The git
log is the deploy audit trail (one commit per successful
deploy or hyperhive bump). Manager has this RO-mounted at
`/meta/`.
|
||||
|
||||
Marker file `/var/lib/hyperhive/.meta-migration-done` is
written by the startup migration after every container has
been repointed at `meta#<n>`. Removing it forces a re-run on
next hive-c0re start (idempotent: only the actual repoint
step would re-fire).
|
||||
|
||||
## Destroy vs purge
|
||||
|
||||
|
|
|
|||