hyperhive/docs/agent-hierarchy.md

8.3 KiB

Agent hierarchy & privileges

Design + audit doc for milestone #6 (the issue tree). The implementation lands in pieces; this doc tracks what's done, what's planned, and what currently special-cases the manager.

Current state (as of this PR)

Topology lives in the hive-c0re-owned meta repo, alongside flake.nix, at /var/lib/hyperhive/meta/topology.json:

{
  "manager": null,
  "alice":   "manager",
  "bob":     "alice"
}

null = root-level agent. Today only the manager qualifies. Other agents default to "manager" as parent on first sync. Operator/manager re-parenting via the write API + dashboard UI lands in a follow-up.

Why meta, not per-agent agent.nix

An agent shouldn't be able to claim a parent without that parent's consent, and operator-driven re-parenting shouldn't require touching the moved agent's config. Topology IS a system-level concern; meta is where system-level facts live.

Flow

  1. Read: topology::read() parses topology.json into a BTreeMap<String, Option<String>>. Missing / unparsable file → empty map → every agent treated as root (safe degradation for fresh installs that haven't run meta::sync_agents yet).
  2. Reconcile: meta::sync_agents calls topology::reconcile alongside its flake.nix regeneration. New agents land at their default position (manager as parent, manager itself as root); removed agents drop. Existing entries are preserved as-is so operator overrides stick across regenerations.
  3. Inject: meta::render_flake looks up each agent's parent and passes it to mkAgent. When non-null, the mkAgent body sets HIVE_PARENT = parent in the agent's systemd service environment so the harness / claude prompts can see it.
  4. Surface: container_view::build_all reads topology.json and populates ContainerView.parent: Option<String> on every rescan. The dashboard renders the field as a tree (#363 follow-up).

Target topology semantics

Once enforcement lands the rules collapse into:

operation who can do it
kill / start / restart / update (any descendant) any ancestor
request_init_config (spawn a new child) any agent, child added under self
request_apply_commit (any descendant's config) any ancestor
get_logs (any descendant) any ancestor
moderate questions / reminders (cancel any open thread of a descendant) any ancestor
send / recv routing parent ↔ same-parent siblings ↔ self ↔ descendants; explicit allow-list for anyone else
request_update_meta_inputs (bump meta lock) root agents only (today: just manager)

"Ancestor" walks ContainerView.parent chains; cycles are guarded by a visited-set at dispatch time (a malformed topology.json can't lock the dispatcher into a loop).

Current manager special-casings — the audit

What currently makes the manager different from every other agent, and which axis the post-milestone version reads each special-case along:

A — naming + bootstrap

  • MANAGER_AGENT = "manager" (broker recipient name) and MANAGER_NAME = "hm1nd" (container name). ~28 grep hits across hive-c0re/src/. Just a name — the rename plan is managerroot, executed via the one-shot migration script in migrate.rs (idempotent, marker-guarded).
  • auto_update::ensure_manager runs at hive-c0re boot and spawns hm1nd if missing. Becomes "ensure the root agent exists" once any agent can be at the root. Topology: root has no parent, so hive-c0re itself owns its lifecycle (no parent to delegate to).

B — wire-protocol privileges

The ManagerRequest::* variants in hive-sh4re/src/lib.rs are operations the manager flavour socket can make that sub-agent sockets can't:

variant semantic post-milestone
RequestInitConfig seed an agent's proposed config repo topology — descendants only
RequestApplyCommit submit a commit sha for operator approval topology — descendants only
RequestSpawn (deprecated) shortcut for spawn topology — descendants only
Kill / Start / Restart / Update container lifecycle on an existing agent topology — descendants only
RequestUpdateMetaInputs bump meta flake.lock per-agent cap (root-only today; a future "let coder bump its own input" might grant it)
GetLogs journalctl scrape of a sub-agent topology — descendants only
Wake inject a from: <X> message into self's inbox not really privileged — the wire surface exists because daemon co-processes (e.g. forge_notify) need it. Sub-agents have the same via their own socket.

C — storage / mounts (hive-c0re::lifecycle)

The manager container's nspawn bind set:

  • HOST_AGENTS_ROOT (/var/lib/hyperhive/agents) → /agents RW — so the manager can edit any agent's proposed config repo
  • HOST_APPLIED_ROOT (/var/lib/hyperhive/applied) → /applied RO — so the manager can diff against what's deployed
  • HOST_META_ROOT (/var/lib/hyperhive/meta) → /meta RO — so the manager can read the system-wide deploy log

Tree-shape version:

  • Each agent gets RW to /agents/<descendant>/ for every descendant in its subtree. The root agent (today: manager) gets RW to the full forest as a special case of "the root has every other agent as a descendant".
  • RO /meta access if the agent holds a "meta read" cap.
  • request_update_meta_inputs is the only path that actually writes flake.lock, gated by the cap; everyone else stays RO.

D — drop legacy /state for manager

lifecycle.rs::notes_mount currently ternary's /state for the manager and /agents/<name>/state for everyone else (because the manager pre-dates the per-agent state-dir layout). Milestone bullet: unify on /agents/<name>/state for everyone. One-time mv of /var/lib/hyperhive/manager/state/var/lib/hyperhive/agents/manager/state in migrate.rs (idempotent, marker-guarded).

E — prompt + tools

  • prompts/manager.md vs prompts/agent.md — two separate system prompts. Per-agent cap list of what the agent can do, rendered into a single parametrised prompt at boot.
  • mcp.rs::Flavor::{Agent, Manager} controls which MCP tools claude sees. Already structured this way internally — the per-flavour allow-list becomes a per-cap-set lookup.

F — drive-by checks across c0re

(grep -n MANAGER_AGENT produced ~28 hits)

  • loose_ends.rs: manager sees hive-wide loose-ends, sub-agents only their own. Topology — every agent sees its own + its descendants'.
  • operator_questions.rs + broker.rs: "manager can cancel any question" override on the owner check. Topology — agents can moderate threads of their descendants. (per mara's https://localhost:3000/hyperhive/hyperhive/issues/361#issuecomment-3344)
  • reminder_scheduler.rs: same override pattern for reminder cancel. Topology — descendants only.
  • actions.rs: destroy refuses to act on MANAGER_NAME (no foot-shooting). Topology — agents can destroy descendants but never themselves or ancestors.
  • crash_watch.rs: skips ContainerCrash for the manager (it auto-restarts via systemd). Topology — the root container has different recovery semantics, every other agent falls into the same watch loop.

G — sub-agents inside the same container

Future work mentioned in #361: when enabled for an agent, it can spawn temporary "sub-agents" that run inside its own container. Lighter than a full nspawn agent. Open questions, not yet wired:

  • Inherit caps from parent, or take an explicit narrower set?
  • Survive container restart, or always ephemeral?
  • Inbox: separate from parent, or shared?
  • Filesystem: share parent's /state RW, or a sub-dir?
  • Identity: distinct broker recipient name, or address the parent?

Cross-references