diff --git a/CLAUDE.md b/CLAUDE.md index ea18516..631ef25 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -31,10 +31,18 @@ hive-c0re/ host daemon + CLI (one binary, subcommand-dispatched) src/coordinator.rs shared state (broker/approvals/questions/transient/ sockets) + tombstone enumeration + kick_agent src/actions.rs approve/deny/destroy (transient-aware) - src/auto_update.rs startup rebuild scan + ensure_manager - src/lifecycle.rs `nixos-container` shellouts, per-agent flake generator + src/auto_update.rs startup rebuild scan + ensure_manager + + meta::lock_update_hyperhive bump + src/lifecycle.rs `nixos-container` shellouts; per-agent applied + + proposed git repo seeding; tag plumbing + src/meta.rs single hive-c0re-owned flake at /var/lib/ + hyperhive/meta/ — sync_agents, two-phase + prepare/finalize/abort, lock_update_* + src/migrate.rs startup auto-migration from pre-meta layout + (idempotent, marker-guarded phase 4) src/dashboard.rs axum HTTP: static shell + /api/state JSON + actions + journald viewer + bind-with-retry (SO_REUSEADDR) + + deployed_sha chip per container assets/ index.html, dashboard.css, app.js (include_str!) hive-ag3nt/ in-container harness crate; produces TWO binaries @@ -114,51 +122,40 @@ read them à la carte. In-flight or recent context that hasn't earned a section yet. Prune freely. -- **In flight:** meta-flake overhaul. Each agent's applied - repo becomes a tiny module-only flake (`nixosModules.default - = import ./agent.nix`); `agent.nix` is just a NixOS module - function `{ config, pkgs, lib, ... }: { ... }` — no - extendModules, no hyperhive input visible to the manager. 
- A single hive-c0re-owned repo at `/var/lib/hyperhive/meta/` - declares one input per agent (pointing at that agent's - applied repo via `git+file://`) and one - `nixosConfigurations.` output per agent, wrapping - `inputs.agent-.nixosModules.default` with the identity - + `HIVE_PORT` / `HIVE_LABEL` / `HIVE_DASHBOARD_PORT` - injection that today's per-agent `setup_applied` does - inline. Containers run against `meta#` instead of - `applied/#default`. Every approval that lands does - `nix flake lock --update-input agent-` in meta and - commits the lock — meta's git log is the system-wide - deploy audit trail; per-agent tags stay as before for - inside-baseball state. -- **Companion change:** the manager's `/agents//config/` - (proposed) gets `applied` pre-configured as a git remote - pointing at `/applied//.git` (the RO bind already - there). `git fetch applied` / `git show - applied/refs/tags/deployed/` / `git rebase - applied/main` etc. all just work from inside the - manager. The manager additionally gets `/meta` RO-bound, - so `git -C /meta log --oneline` and - `cat /meta/flake.lock` answer "what's actually deployed - across the swarm right now." -- **Auto-migration on startup:** new phase before - `auto_update::run` rewrites each existing - `applied//flake.nix` to the module-only shape + - relocates `deployed/0`, adds the `applied` remote to each - proposed repo, bootstraps the meta repo from the agent - list if missing, and `nixos-container update`s every - container to point at `meta#` (no fs wipe, no - re-login). Idempotent; `HIVE_SKIP_META_MIGRATION=1` - defers it. -- **Just landed (prior overhaul still in place):** tag-driven - config-apply. Two-repo split (proposed = manager RW, - applied = core-only); `request_apply_commit` fetches the - manager's commit into applied and pins it as - `proposal/`; approve / deny / build walk through tags - on the same commit; `applied/main` only fast-forwards on - `deployed/`. `failed/` + `denied/` are annotated. 
See - `docs/approvals.md` for the state machine. +- **Just landed:** meta-flake overhaul. Each agent's applied + repo is a tiny module-only flake (`nixosModules.default = + import ./agent.nix`); `agent.nix` is a plain NixOS module + function — no extendModules, no hyperhive input visible to + the manager. A single hive-c0re-owned repo at + `/var/lib/hyperhive/meta/` declares one input per agent + (pointing at that agent's applied repo via `git+file://`) + and one `nixosConfigurations.` output per agent, + wrapping `inputs.agent-.nixosModules.default` with the + identity + `HIVE_PORT` / `HIVE_LABEL` / + `HIVE_DASHBOARD_PORT` injection. Containers run against + `meta#`. Every approve runs `nix flake lock + --update-input agent-` (two-phase: prepare on the + build path, finalize/abort on the result) — meta's git + log is the system-wide deploy audit trail; failures and + denials live as annotated tags in applied. The manager + has `/applied` and `/meta` RO-bound and the `applied` + remote pre-wired in every proposed repo so `git fetch + applied`, `git show applied/refs/tags/deployed/`, + `git -C /meta log --oneline`, `cat /meta/flake.lock` + all just work. Migration runs idempotently on + hive-c0re startup (`HIVE_SKIP_META_MIGRATION=1` skips it): + rewrites pre-meta applied flakes to module-only, wires + the proposed remote, seeds meta, and repoints every + container at `meta#` (guarded by a marker so the + expensive phase only runs once). +- **Just landed (prior overhaul still underneath):** tag- + driven config-apply. Two-repo split (proposed = manager + RW, applied = core-only); `request_apply_commit` fetches + the manager's commit into applied and pins it as + `proposal/`; approve / deny / build walk through + tags on the same commit; `applied/main` only fast- + forwards on `deployed/`. `failed/` + `denied/` are + annotated. See `docs/approvals.md`. 
- **Recent (since last compaction):** inline +/- diffs on Write/Edit, send full body via collapsed details, operator cancel + ttl on questions, deny-with-reason, dashboard diff --git a/README.md b/README.md index 614c3f0..5fbcf39 100644 --- a/README.md +++ b/README.md @@ -26,8 +26,9 @@ host (NixOS, runs hive-c0re.service) └── nixos-containers (each bind-mounts its socket dir → /run/hive, │ credentials dir → /root/.claude, │ durable notes dir → /state; - │ manager additionally gets /agents RW - │ + /applied RO for the deployed-tag mirror) + │ manager additionally gets /agents RW, + │ /applied RO (deployed-tag mirror), + │ /meta RO (swarm-wide deploy flake)) │ ├── hm1nd hive-m1nd serve : claude turn loop + │ MCP (send / recv / request_spawn / kill / start / @@ -54,21 +55,30 @@ load; collapsible inbox + collapsible journald viewer + collapsible `agent.nix` viewer per agent on the dashboard. Config changes flow the other way: manager edits files under -`/agents//config/` (`agent.nix` is the entry point, but arbitrary -sibling files in the commit are preserved) → commits → submits the sha -via `request_apply_commit`. Hive-c0re immediately fetches that commit -from the proposed repo into the applied repo and pins it as -`proposal/` — from this moment the proposal is immutable from the -manager's side. Operator clicks ◆ APPR0VE on the dashboard → hive-c0re -moves the working tree to the proposal, runs `nixos-container update`, -and either fast-forwards `applied/main` (tagging `deployed/`) or -annotates `failed/` with the build error and rolls back to the -previous deployed tree. Denials leave a `denied/` annotated tag -carrying the operator's note. The manager sees everything that -shipped (or didn't) via a read-only `/applied//.git` mirror inside -its container; `git show applied/deployed/` etc. is the audit -trail. See [`docs/approvals.md`](docs/approvals.md) for the full tag -state machine. 
+`/agents//config/` — `agent.nix` is a plain NixOS module function +`{ config, pkgs, lib, ... }: { ... }`, and arbitrary sibling files in +the commit are preserved → commits → submits the sha via +`request_apply_commit`. Hive-c0re immediately fetches that commit from +the proposed repo into the applied repo and pins it as `proposal/` +— immutable from the manager's side from then on. Operator clicks +◆ APPR0VE → hive-c0re fast-forwards `applied//main` to the proposal, +runs `nix flake lock --update-input agent-` against the host-wide +meta flake at `/var/lib/hyperhive/meta/`, builds via +`nixos-container update --flake meta#`, and either commits +the lock + tags `deployed/` on success or `git restore`s the lock + +annotates `failed/` with the build error + rolls back +`applied//main` on failure. Denials leave a `denied/` annotated +tag carrying the operator's note. + +Meta's git log is the swarm-wide deploy audit trail (one commit per +successful deploy). Per-agent applied repos carry the tag-rich state +machine for inside-baseball decisions. The manager sees both — proposed +repos ship with an `applied` remote pre-wired, and `/meta/` is RO-bound +inside the container — so `git fetch applied`, +`git show applied/refs/tags/deployed/`, `git -C /meta log`, +`cat /meta/flake.lock` all just work without constructing paths by +hand. See [`docs/approvals.md`](docs/approvals.md) for the full state +machine + lock-flow walkthrough. For decisions the manager needs human signal on, `ask_operator(question, options?, multi?)` queues a free-text/checkbox/radio form on the dashboard; the answer arrives later as a `HelperEvent::OperatorAnswered` diff --git a/docs/approvals.md b/docs/approvals.md index f935de1..de35a2f 100644 --- a/docs/approvals.md +++ b/docs/approvals.md @@ -37,26 +37,58 @@ step — the operator just sees the name. On approve, hive-c0re creates the container in a background task while the dashboard shows a spinner. 
-## Meta flake (in flight) +## Meta flake -> The next overhaul (currently being implemented) introduces a -> single hive-c0re-owned meta repo at -> `/var/lib/hyperhive/meta/` that consumes every agent's -> applied repo as a flake input and owns the wrapper -> nixosConfiguration. Each agent's `applied//flake.nix` -> shrinks to `nixosModules.default = import ./agent.nix` — -> `agent.nix` becomes a plain NixOS module function (no -> extendModules / hyperhive input). Containers will run -> against `--flake /var/lib/hyperhive/meta#`. Every -> approval that builds does -> `nix flake lock --update-input agent-` in meta and -> commits the lock; meta's git log is the system-wide deploy -> trail. Manager additionally gets `/applied//.git` -> pre-registered as the `applied` remote inside its proposed -> repo, and `/meta` RO-bound for browsing the deploy log. -> Auto-migrates on startup. Sections below describe the -> current (still-deployed) tag-driven shape that the meta -> flake builds on top of. +The hive-c0re-owned repo at `/var/lib/hyperhive/meta/` +declares one flake input per agent (`agent-.url = +"git+file:///var/lib/hyperhive/applied/"`) and one +`nixosConfigurations.` output per agent. Each output wraps +`inputs.agent-.nixosModules.default` with the identity + +`HIVE_PORT` / `HIVE_LABEL` / `HIVE_DASHBOARD_PORT` injection +module that `setup_applied` used to generate inline. +Containers run against `--flake /var/lib/hyperhive/meta#`. + +Per-deploy lock flow (two-phase, owned by +`actions::run_apply_commit` → `meta::{prepare,finalize,abort} +_deploy`): + +1. `meta::prepare_deploy(name)` runs + `nix flake lock --update-input agent-` without + committing. Working tree of meta now points the input at + `applied//main` (which `run_apply_commit` already + fast-forwarded to `proposal/`). +2. `lifecycle::rebuild_no_meta` runs + `nixos-container update --flake meta#`. Nix + evaluates against the staged lock. +3. 
On success — `meta::finalize_deploy(name, sha, "deployed/ + ")` stages `flake.lock` and commits with + `deploy deployed/ `. Meta's git log gains + one entry per successful deploy. +4. On failure — `meta::abort_deploy()` runs + `git restore flake.lock` so the meta history shows only + successes; the failure stays as an annotated `failed/` + tag in `applied/`. + +Single-phase variants exist for paths without +rollback semantics: `meta::lock_update_for_rebuild(name)` for +the manual `↻ R3BU1LD` button (commits if the lock changed) +and `meta::lock_update_hyperhive()` for the +auto-update flake-rev bump (one shot before per-agent +rebuilds, commits if the lock changed). + +`meta::sync_agents(hyperhive_flake, dashboard_port, &agents)` +is the idempotent reconciler called by `spawn`, `destroy`, +`rebuild`, and the startup migration. Renders `flake.nix` +from the agent list; if it differs from disk, runs +`nix flake lock` + commits as `regenerate meta flake` (or +`seed meta from N agent(s)` on the very first call). + +The manager has `/meta` RO-bound inside its container: +`git -C /meta log --oneline` is the swarm-wide deploy log, +`cat /meta/flake.lock | jq '.nodes["agent-"].locked'` +resolves which sha each agent is pinned at right now. +Dashboard surfaces the same info as a `deployed:` chip +per container row. ## Two repos per agent @@ -67,17 +99,23 @@ shows a spinner. # agent.nix is the # convention entry # point; flake.nix is - # generated and not - # tracked here. + # tracked boilerplate + # (manager doesn't edit + # it). 
/var/lib/hyperhive/applied// applied — core-only ├── .git/ # tag-rich history -├── .gitignore # ignores flake.nix -├── flake.nix # hive-c0re-generated, -│ # untracked, rewritten -│ # on spawn/rebuild only +├── flake.nix # tracked, fixed +│ # boilerplate exporting +│ # nixosModules.default ├── agent.nix # working tree of main └── # also tracked + +/var/lib/hyperhive/meta/ swarm-wide flake — core +├── .git/ # one commit per successful +│ # deploy +├── flake.nix # generated from agent set +└── flake.lock # pins each agent's sha ``` Why two physical repos: the manager's `/agents//config/` is @@ -86,13 +124,12 @@ proposed tree. The applied repo is never bind-mounted (except the read-only `.git` exposure described below) so a destructive move inside the container cannot reach it. -The container's `--flake` ref is `#default`. The -generated `flake.nix` extends -`hyperhive.nixosConfigurations.{agent-base|manager}` with -`./agent.nix` plus an inline module setting -`programs.git.config.user` (committer identity = the agent's name) -and `systemd.services..environment` (`HIVE_PORT`, -`HIVE_LABEL`, `HIVE_DASHBOARD_PORT`). +The container's `--flake` ref is `/var/lib/hyperhive/meta#` +(see "Meta flake" above). The agent's own `applied//flake.nix` +is a fixed boilerplate that exports `nixosModules.default = +import ./agent.nix`; the meta flake imports that module and +wraps it with identity + `HIVE_PORT` / `HIVE_LABEL` / +`HIVE_DASHBOARD_PORT`. ### Tag state machine @@ -114,29 +151,63 @@ approval id to retry. Because tags are first-class git objects, rejected and failed trees stay browsable forever — `git log --tags` in the applied repo is the audit trail. -### Manager view of applied +### Manager view of applied + meta -`/applied/` is a **read-only bind-mount** of -`/var/lib/hyperhive/applied/` (the entire tree) inside the -manager container. 
The manager fetches tags into its proposed -clone with `git fetch /applied//.git -'refs/tags/*:refs/tags/applied/*'` and `git show` any -deployed / failed / denied tree to see what actually shipped, -what error blocked the last build, or what note the operator -left on a denial. The RO bind means git plumbing inside the -manager cannot corrupt the applied repos — and a single mount -covers every agent (existing + future) without rebuilding the -manager on each spawn. +The manager container gets three host-side bind mounts via +`set_nspawn_flags`: -## Migration from the pre-tag scheme +- `/var/lib/hyperhive/agents/` → `/agents/` (RW) — proposed + repos. Manager edits + commits per-agent config here. +- `/var/lib/hyperhive/applied/` → `/applied/` (RO) — every + agent's authoritative applied repo, including `.git`. +- `/var/lib/hyperhive/meta/` → `/meta/` (RO) — the swarm-wide + deploy flake. -There is no in-place migration. Each existing agent must be -purged and re-spawned: `hive-c0re destroy --purge ` (or -PURG3 on the dashboard), then `request_spawn` and the operator -approves the fresh agent. The new agent starts with `deployed/0` -seeded by hive-c0re; the manager's first config edit becomes -`proposal/1` and walks the tag scheme from there. Pre-overhaul -tombstones lose their config history. +Each proposed repo (`/agents//config/`) is pre-configured +with `applied` as a git remote pointing at +`/applied//.git`. 
Useful incantations from inside the +manager: + +```sh +git -C /agents//config fetch applied +git -C /agents//config log applied/main --oneline +git -C /agents//config show applied/refs/tags/deployed/ +git -C /agents//config show applied/refs/tags/failed/ # body = build error +git -C /agents//config show applied/refs/tags/denied/ # body = operator note +git -C /agents//config rebase applied/main # base in-flight work on what's deployed + +git -C /meta log --oneline # swarm-wide deploy history +cat /meta/flake.lock | jq '.nodes | with_entries(select(.key | startswith("agent-")))' +``` + +The RO binds block push at the kernel level, so the manager +can only fetch / read — git plumbing inside the container +cannot corrupt either authoritative repo. + +## Migration from the pre-tag / pre-meta schemes + +Both overhauls (tag-driven flow + meta flake) ship in-place +migrations that run on every hive-c0re startup. Idempotent; +each phase is a no-op once already applied. Behaviour: + +- Tag-driven phase: assumes the operator ran the one-shot + `git tag deployed/0 main` script (see commit history / + earlier docs revisions) once per agent. Tagging is + non-destructive: it doesn't touch live containers, state + dirs, or claude creds. +- Meta-flake phase: rewrites each `applied//flake.nix` to + the module-only boilerplate, wires the `applied` remote in + each proposed repo, bootstraps the meta repo from the + current agent list, and `nixos-container update`s every + container at `meta#`. The expensive last step is + guarded by `/var/lib/hyperhive/.meta-migration-done` so + it only runs once across hive-c0re restarts. Set + `HIVE_SKIP_META_MIGRATION=1` on the service to defer. + +No state loss in either migration. claude creds, /state/ +notes, the events DB, proposed history, and applied history +all survive. The manager keeps its session; sub-agents stay +logged in. 
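The marker-guarded phase described above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/migrate.rs`: the helper name `run_once_guarded`, its signature, and the `skip` parameter (standing in for however `HIVE_SKIP_META_MIGRATION=1` is actually read) are all assumptions made for the example.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper (not taken from the hive-c0re source): runs `phase`
/// only when `marker` does not exist yet, then writes the marker.
/// Returns Ok(true) if the phase ran, Ok(false) if it was skipped.
pub fn run_once_guarded<F>(marker: &Path, skip: bool, phase: F) -> io::Result<bool>
where
    F: FnOnce() -> io::Result<()>,
{
    // `skip` models the operator opting out (assumed to map to
    // HIVE_SKIP_META_MIGRATION=1); an existing marker means the
    // expensive step already completed on an earlier start.
    if skip || marker.exists() {
        return Ok(false);
    }
    phase()?;
    // Write the marker only after the phase succeeded, so a failed
    // run is retried on the next start instead of being masked.
    fs::write(marker, b"done\n")?;
    Ok(true)
}
```

Calling it twice with the same marker path runs the closure once; deleting the marker makes the next call run it again, which matches the "removing it forces a re-run" behaviour the docs describe for `/var/lib/hyperhive/.meta-migration-done`.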
## Manager (`hm1nd`) is hive-c0re-managed diff --git a/docs/persistence.md b/docs/persistence.md index 69cfe6b..aeb90fd 100644 --- a/docs/persistence.md +++ b/docs/persistence.md @@ -67,8 +67,25 @@ Under `/var/lib/hyperhive/agents//`: to `/state` inside the container. Under `/var/lib/hyperhive/applied//` — the hive-c0re-only -applied repo (`flake.nix` + `agent.nix`) that the container -actually builds from. +applied repo. Tracks `flake.nix` (module-only boilerplate; never +edited after first spawn) + `agent.nix` (the actual config; the +manager's edits land here via the approval flow) + any other +files the manager committed. `.git/` carries the proposal / +approved / building / deployed / failed / denied tag history. + +Under `/var/lib/hyperhive/meta/` — the swarm-wide deploy flake. +Single repo for the whole host; `flake.nix` declares one input +per agent + one `nixosConfigurations.` output per agent; +`flake.lock` is the canonical "what's deployed where." The git +log is the deploy audit trail (one commit per successful +deploy or hyperhive bump). Manager has this RO-mounted at +`/meta/`. + +Marker file `/var/lib/hyperhive/.meta-migration-done` is +written by the startup migration after every container has +been repointed at `meta#`. Removing it forces a re-run on +next hive-c0re start (idempotent — only the actual repoint +step would re-fire). ## Destroy vs purge diff --git a/hive-c0re/src/lifecycle.rs b/hive-c0re/src/lifecycle.rs index 525005d..f3e72da 100644 --- a/hive-c0re/src/lifecycle.rs +++ b/hive-c0re/src/lifecycle.rs @@ -80,20 +80,6 @@ pub fn is_manager(name: &str) -> bool { name == MANAGER_NAME } -/// The nixosConfiguration in the hyperhive flake the agent's -/// wrapper extends. Manager → `manager`; everyone else → -/// `agent-base`. Used by the meta-flake generator to know which -/// base to extend per agent. 
-#[must_use] -#[allow(dead_code)] // wired up by the meta module in a follow-up commit -pub fn flake_base(name: &str) -> &'static str { - if is_manager(name) { - "manager" - } else { - "agent-base" - } -} - fn validate(name: &str) -> Result<()> { if name.is_empty() { bail!("agent name must not be empty");