diff --git a/CLAUDE.md b/CLAUDE.md
index 296d77d..bded37d 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -82,6 +82,14 @@ hive-c0re/              host daemon + CLI (one binary, subcommand-dispatched)
                         prepare/finalize/abort, lock_update_*
   src/migrate.rs        startup auto-migration from pre-meta layout
                         (idempotent, marker-guarded phase 4)
+  src/topology.rs       agent parent/child storage at
+                        /var/lib/hyperhive/meta/topology.json — sole
+                        source of truth for who's the parent of whom,
+                        read by the dashboard, render_flake, and the
+                        eventual cap-enforcement plumbing. Reconciled
+                        by `meta::sync_agents`; operator/manager edits
+                        land via the eventual write API (#361
+                        follow-ups).
   src/forge.rs          optional Forgejo wiring: per-agent users
                         + tokens, the `agent-configs` org (`push_config`),
                         and meta read access; mirrors each applied repo
@@ -171,6 +179,7 @@ docs/
   persistence.md          sqlite dbs, retention, state dir layout
   terminal-rendering.md   per-agent terminal row taxonomy (as built)
   boundary.md             operator/agent trust model rationale
+  agent-hierarchy.md      tree-shape topology design + manager-privilege audit (#361)
   damocles-migration.md   future migration plan for damocles → hyperhive
 ```
 
diff --git a/docs/agent-hierarchy.md b/docs/agent-hierarchy.md
new file mode 100644
index 0000000..b00eee6
--- /dev/null
+++ b/docs/agent-hierarchy.md
@@ -0,0 +1,178 @@
+# Agent hierarchy & privileges
+
+Design + audit doc for milestone #6 (tracked in the
+[#361](http://localhost:3000/hyperhive/hyperhive/issues/361) issue tree).
+The implementation lands in pieces; this doc tracks what's done, what's
+planned, and what currently special-cases the manager.
+
+## Current state (as of this PR)
+
+Topology lives in the hive-c0re-owned **meta repo**, alongside
+`flake.nix`, at `/var/lib/hyperhive/meta/topology.json`:
+
+```json
+{
+  "manager": null,
+  "alice": "manager",
+  "bob": "alice"
+}
+```
+
+`null` = root-level agent. Today only the manager qualifies. Other
+agents default to `"manager"` as parent on first sync. Operator/manager
+re-parenting via the write API + dashboard UI lands in a follow-up.
+
+### Why meta, not per-agent `agent.nix`
+
+An agent shouldn't be able to claim a parent without that parent's
+consent, and operator-driven re-parenting shouldn't require touching
+the moved agent's config. Topology IS a system-level concern; meta is
+where system-level facts live.
+
+### Flow
+
+1. **Read**: `topology::read()` parses `topology.json` into a
+   `BTreeMap<String, Option<String>>`. Missing / unparsable file →
+   empty map → every agent treated as root (safe degradation for
+   fresh installs that haven't run `meta::sync_agents` yet).
+2. **Reconcile**: `meta::sync_agents` calls `topology::reconcile`
+   alongside its `flake.nix` regeneration. New agents land at their
+   default position (manager as parent, manager itself as root);
+   entries for removed agents are dropped. Existing entries are kept
+   as-is, so operator overrides stick across regenerations.
+3. **Inject**: `meta::render_flake` looks up each agent's parent and
+   passes it to `mkAgent`. When non-null, the mkAgent body sets
+   `HIVE_PARENT = parent` in the agent's systemd service environment
+   so the harness / claude prompts can see it.
+4. **Surface**: `container_view::build_all` reads `topology.json` and
+   populates `ContainerView.parent: Option<String>` on every rescan.
+   The dashboard renders the field as a tree (#363 follow-up).
+
+## Target topology semantics
+
+Once enforcement lands, the rules collapse into:
+
+| operation | who can do it |
+|---|---|
+| `kill` / `start` / `restart` / `update` (any descendant) | any ancestor |
+| `request_init_config` (spawn a new child) | any agent, child added under self |
+| `request_apply_commit` (any descendant's config) | any ancestor |
+| `get_logs` (any descendant) | any ancestor |
+| moderate questions / reminders (cancel any open thread of a descendant) | any ancestor |
+| `send` / `recv` routing | parent ↔ same-parent siblings ↔ self ↔ descendants; explicit allow-list for anyone else |
+| `request_update_meta_inputs` (bump meta lock) | root agents only (today: just `manager`) |
+
+"Ancestor" walks `ContainerView.parent` chains; cycles are guarded by a
+visited-set at dispatch time (a malformed topology.json can't lock the
+dispatcher into a loop).
+
+## Current manager special-casings — the audit
+
+What currently makes the manager different from every other agent, and
+the axis along which the post-milestone design reinterprets each one:
+
+### A — naming + bootstrap
+
+- `MANAGER_AGENT = "manager"` (broker recipient name) and
+  `MANAGER_NAME = "hm1nd"` (container name). ~28 grep hits across
+  `hive-c0re/src/`. **Just a name** — the rename plan is `manager` →
+  `root`, executed via the one-shot migration script in
+  `migrate.rs` (idempotent, marker-guarded).
+- `auto_update::ensure_manager` runs at hive-c0re boot and spawns
+  `hm1nd` if missing. Becomes "ensure the root agent exists" once any
+  agent can be at the root. **Topology**: the root has no parent to
+  delegate to, so hive-c0re itself owns its lifecycle.
+
+### B — wire-protocol privileges
+
+The `ManagerRequest::*` variants in `hive-sh4re/src/lib.rs` are
+operations the manager flavour socket can make that sub-agent sockets
+can't:
+
+| variant | semantics | post-milestone |
+|---|---|---|
+| `RequestInitConfig` | seed an agent's proposed config repo | **topology** — descendants only |
+| `RequestApplyCommit` | submit a commit sha for operator approval | **topology** — descendants only |
+| `RequestSpawn` (deprecated) | shortcut for spawn | **topology** — descendants only |
+| `Kill` / `Start` / `Restart` / `Update` | container lifecycle on an existing agent | **topology** — descendants only |
+| `RequestUpdateMetaInputs` | bump meta `flake.lock` | **per-agent cap** (root-only today; a future "let coder bump its own input" might grant it) |
+| `GetLogs` | journalctl scrape of a sub-agent | **topology** — descendants only |
+| `Wake` | inject a `from: ` message into self's inbox | **not really privileged** — the wire surface exists because daemon co-processes (e.g. `forge_notify`) need it. Sub-agents can already make the same call via their own socket. |
+
+### C — storage / mounts (`hive-c0re::lifecycle`)
+
+The manager container's nspawn bind set:
+
+- `HOST_AGENTS_ROOT (/var/lib/hyperhive/agents) → /agents` RW — so the
+  manager can edit any agent's proposed config repo
+- `HOST_APPLIED_ROOT (/var/lib/hyperhive/applied) → /applied` RO — so
+  the manager can diff against what's deployed
+- `HOST_META_ROOT (/var/lib/hyperhive/meta) → /meta` RO — so the
+  manager can read the system-wide deploy log
+
+Tree-shape version:
+- Each agent gets RW to `/agents/<name>/` for every descendant in
+  its subtree. The root agent (today: manager) gets RW to the full
+  forest as a special case of "the root has every other agent as a
+  descendant".
+- RO `/meta` access if the agent holds a "meta read" cap.
+- `request_update_meta_inputs` is the only path that actually writes
+  `flake.lock`, gated by the cap; everyone else stays RO.
+
+### D — drop legacy `/state` for manager
+
+`lifecycle.rs::notes_mount` currently picks `/state` (via a ternary) for the
+manager and `/agents/<name>/state` for everyone else (because the
+manager pre-dates the per-agent state-dir layout). Milestone bullet:
+unify on `/agents/<name>/state` for everyone. One-time `mv` of
+`/var/lib/hyperhive/manager/state` → `/var/lib/hyperhive/agents/manager/state`
+in `migrate.rs` (idempotent, marker-guarded).
+
+### E — prompt + tools
+
+- `prompts/manager.md` vs `prompts/agent.md` — two separate system
+  prompts. **Per-agent cap list** of what the agent can do, rendered
+  into a single parametrised prompt at boot.
+- `mcp.rs::Flavor::{Agent, Manager}` controls which MCP tools claude
+  sees. Already structured this way internally — the per-flavour
+  allow-list becomes a per-cap-set lookup.
+
+### F — drive-by checks across c0re
+
+(`grep -n MANAGER_AGENT` produced ~28 hits)
+
+- `loose_ends.rs`: manager sees hive-wide loose-ends, sub-agents only
+  their own. **Topology** — every agent sees its own + its
+  descendants'.
+- `operator_questions.rs` + `broker.rs`: "manager can cancel any
+  question" override on the owner check. **Topology** — agents can
+  moderate threads of their descendants (per mara's
+  [comment 3344 on #361](https://localhost:3000/hyperhive/hyperhive/issues/361#issuecomment-3344)).
+- `reminder_scheduler.rs`: same override pattern for reminder cancel.
+  **Topology** — descendants only.
+- `actions.rs`: `destroy` refuses to act on `MANAGER_NAME` (no
+  foot-shooting). **Topology** — agents can destroy descendants but
+  never themselves or ancestors.
+- `crash_watch.rs`: skips `ContainerCrash` for the manager (it
+  auto-restarts via systemd). **Topology** — the root container has
+  different recovery semantics; every other agent falls into the same
+  watch loop.
+
+### G — sub-agents inside the same container
+
+Future work mentioned in #361: when enabled for an agent, it can spawn
+temporary "sub-agents" that run inside its own container and are lighter than
+a full nspawn agent. Open questions, not yet wired:
+
+- Inherit caps from parent, or take an explicit narrower set?
+- Survive container restart, or always ephemeral?
+- Inbox: separate from parent, or shared?
+- Filesystem: share parent's `/state` RW, or a sub-dir?
+- Identity: distinct broker recipient name, or address the parent?
+
+## Cross-references
+
+- Milestone: [#361 "Agent privileges and sub-agents"](http://localhost:3000/hyperhive/hyperhive/issues/361)
+- Dashboard render: [#363 "show agent topology in container list"](http://localhost:3000/hyperhive/hyperhive/issues/363)
+- Audit table source: [comment 3335 on #361](http://localhost:3000/hyperhive/hyperhive/issues/361#issuecomment-3335)
+- Operator/agent trust boundary (orthogonal axis): [`boundary.md`](boundary.md)
diff --git a/hive-c0re/src/container_view.rs b/hive-c0re/src/container_view.rs
index adc9a42..01be7ef 100644
--- a/hive-c0re/src/container_view.rs
+++ b/hive-c0re/src/container_view.rs
@@ -87,6 +87,14 @@ pub struct ContainerView {
     /// status is set.
     #[serde(default, skip_serializing_if = "Option::is_none")]
     pub status_set_at: Option,
+    /// Name of this agent's parent in the agent hierarchy (#361). `None`
+    /// marks the agent as root-level; the dashboard renders it without
+    /// indentation. Sourced from `meta/topology.json` (single source of
+    /// truth, hive-c0re-owned) — NOT from per-agent agent.nix, because
+    /// an agent shouldn't be able to unilaterally declare its own place
+    /// in the tree.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub parent: Option<String>,
 }
 
 /// Build the full container list. Wraps `lifecycle::list()` and
@@ -94,6 +102,10 @@ pub struct ContainerView {
 pub async fn build_all(coord: &Coordinator) -> Vec<ContainerView> {
     let raw = lifecycle::list().await.unwrap_or_default();
     let locked = read_meta_locked_revs();
+    // Pull the topology map once and look up each agent's parent below.
+    // Empty / absent topology.json → every agent root-level (matches
+    // the pre-#361 status quo for fresh installs).
+    let topology = crate::topology::read();
     let mut out = Vec::new();
     for c in &raw {
         let (logical, is_manager) = if c == MANAGER_NAME {
@@ -130,6 +142,7 @@ pub async fn build_all(coord: &Coordinator) -> Vec<ContainerView> {
         let rate_limited = is_rate_limited(&logical);
         let extra_links = read_dashboard_links(&logical);
         let (status_text, status_set_at) = read_status(&logical);
+        let parent = topology.get(&logical).cloned().flatten();
         out.push(ContainerView {
             port: lifecycle::agent_web_port(&logical),
             running: lifecycle::is_running(&logical).await,
@@ -146,6 +159,7 @@ pub async fn build_all(coord: &Coordinator) -> Vec<ContainerView> {
             extra_links,
             status_text,
             status_set_at,
+            parent,
         });
     }
     out
diff --git a/hive-c0re/src/main.rs b/hive-c0re/src/main.rs
index 5cd24a3..62e33fc 100644
--- a/hive-c0re/src/main.rs
+++ b/hive-c0re/src/main.rs
@@ -28,6 +28,7 @@ mod migrate;
 mod operator_questions;
 mod questions;
 mod rebuild_queue;
+mod topology;
 mod reminder_scheduler;
 mod server;
 
diff --git a/hive-c0re/src/meta.rs b/hive-c0re/src/meta.rs
index 8d869d1..dc3cd2f 100644
--- a/hive-c0re/src/meta.rs
+++ b/hive-c0re/src/meta.rs
@@ -85,6 +85,16 @@ pub async fn sync_agents(
     std::fs::write(&flake_path, &new_flake)
         .with_context(|| format!("write {}", flake_path.display()))?;
 
+    // Reconcile topology.json against the live agent set — adds
+    // entries for newly-spawned agents (default: manager as parent,
+    // manager itself as root) and drops removed agents. Operator
+    // overrides via the write API (#361 follow-up) are preserved
+    // because reconcile never rewrites an existing entry. Idempotent;
+    // when nothing changed the file isn't touched.
+    let agent_names: Vec<String> = agents.iter().map(|a| a.name.clone()).collect();
+    let topology_changed = crate::topology::reconcile(&agent_names)
+        .with_context(|| format!("reconcile {}", crate::topology::topology_path().display()))?;
+
     if initial {
         git(&dir, &["init", "--initial-branch=main"]).await?;
     }
@@ -96,12 +106,20 @@ pub async fn sync_agents(
     // contain '/flake.nix'". Lock then commit once with both
     // flake.nix and flake.lock — single commit per change.
     git(&dir, &["add", "flake.nix"]).await?;
+    // Stage topology.json on every sync (regenerated by reconcile
+    // above when the agent set changed). git add is a no-op when the
+    // file content is unchanged.
+    if crate::topology::topology_path().exists() {
+        git(&dir, &["add", "topology.json"]).await?;
+    }
     nix(&dir, &["flake", "lock"]).await?;
     if std::path::Path::new(&dir).join("flake.lock").exists() {
         git(&dir, &["add", "flake.lock"]).await?;
     }
     let msg = if initial {
         format!("seed meta from {} agent(s)", agents.len())
+    } else if topology_changed {
+        "regenerate meta flake + topology".to_owned()
     } else {
         "regenerate meta flake".to_owned()
     };
@@ -348,7 +366,7 @@ where
     let pronouns_escaped = operator_pronouns.replace('\\', "\\\\").replace('"', "\\\"");
     let _ = writeln!(
         out,
-        "    dashboardPort = {dashboard_port};\n    operatorPronouns = \"{pronouns_escaped}\";\n    mkAgent = {{ name, isManager, port }}:"
+        "    dashboardPort = {dashboard_port};\n    operatorPronouns = \"{pronouns_escaped}\";\n    mkAgent = {{ name, isManager, port, parent ? null }}:"
     );
     out.push_str(
         r#"      let
@@ -357,6 +375,7 @@ where
           else hyperhive.nixosConfigurations.agent-base;
         input = inputs."agent-${name}";
         service = if isManager then "hive-m1nd" else "hive-ag3nt";
+        parentEnv = if parent == null then {} else { HIVE_PARENT = parent; };
       in
       base.extendModules {
         modules = [
@@ -372,7 +391,7 @@ where
             HIVE_LABEL = name;
             HYPERHIVE_STATE_DIR = "/agents/${name}/state";
           };
-          systemd.services.${service}.environment = {
+          systemd.services.${service}.environment = parentEnv // {
             HIVE_PORT = toString port;
             HIVE_LABEL = name;
             HIVE_DASHBOARD_PORT = toString dashboardPort;
@@ -406,14 +425,25 @@ where
     nixosConfigurations = {
 "#,
     );
+    // Pull the topology map once and look up each agent's parent. An
+    // empty / absent topology.json yields `parent = null` for everyone
+    // — equivalent to the pre-#361 status quo (every container at root).
+    // `meta::sync_agents` seeds the file on first run with manager as
+    // root + everyone else under manager.
+    let topology = crate::topology::read();
     for spec in agents {
+        let parent_attr = topology
+            .get(&spec.name)
+            .and_then(|p| p.as_ref())
+            .map_or_else(|| "null".to_owned(), |p| format!("\"{p}\""));
         let _ = writeln!(
             out,
-            "      {} = mkAgent {{ name = \"{}\"; isManager = {}; port = {}; }};",
+            "      {} = mkAgent {{ name = \"{}\"; isManager = {}; port = {}; parent = {}; }};",
             spec.name,
             spec.name,
             if spec.is_manager { "true" } else { "false" },
             spec.port,
+            parent_attr,
         );
     }
     out.push_str("    };\n  };\n}\n");
diff --git a/hive-c0re/src/topology.rs b/hive-c0re/src/topology.rs
new file mode 100644
index 0000000..6687449
--- /dev/null
+++ b/hive-c0re/src/topology.rs
@@ -0,0 +1,168 @@
+//! Agent topology storage — single source of truth for parent/child
+//! relations in the hive. Lives in the hive-c0re-owned meta repo at
+//! `/var/lib/hyperhive/meta/topology.json`, alongside `flake.nix`, so
+//! topology changes thread through the same git commit log as deploys.
+//!
+//! Why meta, not per-agent: an agent shouldn't be able to claim a
+//! parent without that parent's consent, and an operator-driven
+//! re-parenting shouldn't require touching the moved agent's own
+//! config. Topology IS a system-level concern; meta is where
+//! system-level facts live.
+//!
+//! Format — flat JSON map keyed by agent name, values are the parent
+//! agent's name or `null` for root:
+//!
+//! ```json
+//! {
+//!   "manager": null,
+//!   "alice": "manager",
+//!   "bob": "alice"
+//! }
+//! ```
+//!
+//! Agents present in `nixos-container list` but absent from the file
+//! default to root-level (`parent = None`). This file is operator/
+//! manager-managed via approval-gated writes (write API lands in a
+//! follow-up PR on the #361 milestone); for the bootstrap commit
+//! `meta::sync_agents` seeds it with the existing implicit topology
+//! (manager as root, all current sub-agents as direct children).
+
+use std::collections::BTreeMap;
+use std::path::PathBuf;
+
+const TOPOLOGY_FILE: &str = "topology.json";
+
+#[must_use]
+pub fn topology_path() -> PathBuf {
+    crate::meta::meta_dir().join(TOPOLOGY_FILE)
+}
+
+/// Snapshot of the topology map. Read on every `container_view::build_all`
+/// and every `render_flake` call. The file is small (one line per agent),
+/// so we re-read rather than caching — keeps the source of truth on disk.
+///
+/// Returns an empty map when the file is absent or unparsable; callers
+/// treat that as "no recorded parents", which falls back to every agent
+/// being root-level. Safe degradation for fresh installs that haven't
+/// run through `meta::sync_agents` yet.
+#[must_use]
+pub fn read() -> BTreeMap<String, Option<String>> {
+    let path = topology_path();
+    let Ok(raw) = std::fs::read_to_string(&path) else {
+        return BTreeMap::new();
+    };
+    serde_json::from_str(&raw).unwrap_or_default()
+}
+
+/// Look up one agent's parent. Returns `None` when the agent is root
+/// or absent from the file. Cheap convenience over `read()` for
+/// callers that want a single entry.
+#[must_use]
+pub fn parent_of(name: &str) -> Option<String> {
+    read().get(name).cloned().flatten()
+}
+
+/// Persist the topology map. Sorted JSON output (BTreeMap is sorted by
+/// key) keeps git diffs minimal across re-writes. Errors are returned
+/// as `io::Error` so callers can decide whether a failure should abort
+/// their op (sync_agents, RequestSetParent) or just be logged.
+pub fn write(topology: &BTreeMap<String, Option<String>>) -> std::io::Result<()> {
+    let path = topology_path();
+    if let Some(parent) = path.parent() {
+        std::fs::create_dir_all(parent)?;
+    }
+    let text = serde_json::to_string_pretty(topology)
+        .map_err(|e| std::io::Error::new(std::io::ErrorKind::InvalidData, e))?;
+    std::fs::write(&path, format!("{text}\n"))
+}
+
+/// Compute the default topology for a fresh install: every non-manager
+/// agent has the manager as parent; manager itself is root. These are
+/// the same defaults `reconcile` (called from `meta::sync_agents`)
+/// applies to agents missing from `topology.json`.
+///
+/// Existing entries are never overwritten with these defaults, so once
+/// an explicit write lands (#361 follow-up: dashboard / `RequestSetParent`
+/// API), operator-configured parents survive every later sync.
+#[must_use]
+pub fn default_seed(agent_names: &[String]) -> BTreeMap<String, Option<String>> {
+    let mut out = BTreeMap::new();
+    for name in agent_names {
+        if name == crate::lifecycle::MANAGER_NAME {
+            out.insert(name.clone(), None);
+        } else {
+            out.insert(name.clone(), Some(crate::lifecycle::MANAGER_NAME.to_owned()));
+        }
+    }
+    out
+}
+
+/// Reconcile `topology.json` against the current agent set. Adds an
+/// entry (default: parent = manager, manager itself = root) for any
+/// agent missing from the file; removes entries for agents no longer
+/// present. Existing entries are preserved as-is — operator/manager
+/// choices stick across regenerations. Returns true when the file
+/// changed and should be re-committed by the caller.
+pub fn reconcile(agent_names: &[String]) -> std::io::Result<bool> {
+    let mut current = read();
+    let mut changed = false;
+    // Add missing agents at their default position.
+    for name in agent_names {
+        if !current.contains_key(name) {
+            let parent = if name == crate::lifecycle::MANAGER_NAME {
+                None
+            } else {
+                Some(crate::lifecycle::MANAGER_NAME.to_owned())
+            };
+            current.insert(name.clone(), parent);
+            changed = true;
+        }
+    }
+    // Drop entries for agents that no longer exist.
+    let known: std::collections::HashSet<_> = agent_names.iter().collect();
+    current.retain(|name, _| {
+        let keep = known.contains(name);
+        if !keep {
+            changed = true;
+        }
+        keep
+    });
+    if changed {
+        write(&current)?;
+    }
+    Ok(changed)
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn default_seed_makes_manager_root_others_children() {
+        let agents = vec![
+            "alice".to_owned(),
+            crate::lifecycle::MANAGER_NAME.to_owned(),
+            "bob".to_owned(),
+        ];
+        let seed = default_seed(&agents);
+        assert_eq!(
+            seed.get(crate::lifecycle::MANAGER_NAME),
+            Some(&None),
+            "manager should be root"
+        );
+        assert_eq!(
+            seed.get("alice"),
+            Some(&Some(crate::lifecycle::MANAGER_NAME.to_owned()))
+        );
+        assert_eq!(
+            seed.get("bob"),
+            Some(&Some(crate::lifecycle::MANAGER_NAME.to_owned()))
+        );
+    }
+
+    #[test]
+    fn default_seed_handles_empty_input() {
+        let seed = default_seed(&[]);
+        assert!(seed.is_empty());
+    }
+}
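The "any ancestor" rule from the target-semantics table in `docs/agent-hierarchy.md`, with its visited-set cycle guard, might look roughly like the sketch below. It is a hypothetical helper and not part of this patch. It assumes the map shape `topology::read()` returns (`BTreeMap<String, Option<String>>`) and walks that map directly rather than `ContainerView.parent` chains. `Topology` and `is_ancestor` are illustrative names.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical alias for the map `topology::read()` returns:
/// agent name -> parent name (`None` = root-level).
pub type Topology = BTreeMap<String, Option<String>>;

/// Returns true when `ancestor` sits somewhere on `agent`'s parent chain.
/// An agent is never its own ancestor. The `visited` set guarantees the
/// walk terminates even if a malformed topology.json contains a cycle.
pub fn is_ancestor(topology: &Topology, ancestor: &str, agent: &str) -> bool {
    if ancestor == agent {
        return false;
    }
    let mut visited: HashSet<&str> = HashSet::new();
    let mut current = agent;
    // Follow parent links until we reach a root, an agent missing from
    // the map, or a name we've already seen (cycle).
    while let Some(Some(parent)) = topology.get(current) {
        let parent = parent.as_str();
        if parent == ancestor {
            return true;
        }
        if !visited.insert(parent) {
            return false; // cycle: stop instead of looping forever
        }
        current = parent;
    }
    false
}
```

Returning `false` on a cycle means a malformed `topology.json` denies the privileged operation instead of hanging the dispatcher.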
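Section C's tree-shape mount set gives RW access to the `/agents` directory of every descendant. That requires the full subtree below an agent, which the same flat parent map can supply once it is inverted into a child index. The following is a minimal sketch that makes the same assumption about the map shape. `descendants` and `rw_agent_dirs` are hypothetical helpers, and the sketch does not model the actual bind-mount wiring in `hive-c0re::lifecycle`.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical alias for the map `topology::read()` returns.
pub type Topology = BTreeMap<String, Option<String>>;

/// Every agent below `root` (children, grandchildren, ...), excluding
/// `root` itself. Depth-first over an inverted parent -> children index;
/// the `seen` set keeps a cyclic map from looping.
pub fn descendants(topology: &Topology, root: &str) -> BTreeSet<String> {
    let mut children: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (name, parent) in topology {
        if let Some(parent) = parent {
            children.entry(parent.as_str()).or_default().push(name.as_str());
        }
    }
    let mut seen = BTreeSet::new();
    let mut stack = vec![root];
    while let Some(current) = stack.pop() {
        for &child in children.get(current).map(Vec::as_slice).unwrap_or(&[]) {
            if child != root && seen.insert(child.to_owned()) {
                stack.push(child);
            }
        }
    }
    seen
}

/// Hypothetical: in-container paths under `/agents` that the tree-shape
/// design would mount RW for `agent`, one per descendant.
pub fn rw_agent_dirs(topology: &Topology, agent: &str) -> Vec<String> {
    descendants(topology, agent)
        .into_iter()
        .map(|name| format!("/agents/{name}"))
        .collect()
}
```

The same `descendants` set could also back the other "descendants only" rows of the audit (logs, moderation, `destroy`), turning each check into a set-membership test.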
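Step 4 of the Flow section leaves the dashboard's tree rendering to #363. One possible approach assumes the dashboard has `(name, parent)` pairs taken from `ContainerView`. Agents whose parent is `None`, or whose parent names an agent missing from the list, render at the root. This mirrors the "absent means root-level" fallback of `topology::read()`. `render_tree` is a hypothetical name, and the two-space indentation is an arbitrary choice.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical: renders `(name, parent)` pairs as an indented tree, two
/// spaces per level, siblings sorted by name. An agent whose parent is
/// `None`, or names an agent not in the list, is rendered at the root.
/// Agents caught in a parent cycle with no root above them are unreachable
/// and therefore omitted.
pub fn render_tree(agents: &[(String, Option<String>)]) -> Vec<String> {
    let names: BTreeSet<&str> = agents.iter().map(|(n, _)| n.as_str()).collect();
    let mut children: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    let mut roots: Vec<&str> = Vec::new();
    for (name, parent) in agents {
        match parent.as_deref() {
            Some(p) if names.contains(p) => children.entry(p).or_default().push(name.as_str()),
            _ => roots.push(name.as_str()),
        }
    }
    roots.sort_unstable();
    let mut out = Vec::new();
    let mut visited: BTreeSet<&str> = BTreeSet::new();
    // Explicit stack of (name, depth); pushed in reverse so the
    // alphabetically-first sibling is rendered first.
    let mut stack: Vec<(&str, usize)> = roots.into_iter().rev().map(|r| (r, 0)).collect();
    while let Some((name, depth)) = stack.pop() {
        if !visited.insert(name) {
            continue;
        }
        out.push(format!("{}{}", "  ".repeat(depth), name));
        if let Some(kids) = children.get(name) {
            let mut kids = kids.clone();
            kids.sort_unstable();
            stack.extend(kids.into_iter().rev().map(|k| (k, depth + 1)));
        }
    }
    out
}
```

Consider `manager → alice → bob` plus an agent whose recorded parent no longer exists. For that input the function returns `manager`, `  alice`, `    bob`, and then the orphan at the root.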