damocles 0b03d5bcfb topology: meta-repo agent hierarchy + ContainerView.parent (#361 )

2026-05-24 04:47:55 +02:00

8.3 KiB

Raw Blame History

Agent hierarchy & privileges

Design + audit doc for milestone #6 (the issue tree). The implementation lands in pieces; this doc tracks what's done, what's planned, and what currently special-cases the manager.

Current state (as of this PR)

Topology lives in the hive-c0re-owned meta repo, alongside flake.nix, at /var/lib/hyperhive/meta/topology.json:

{
  "manager": null,
  "alice":   "manager",
  "bob":     "alice"
}

null = root-level agent. Today only the manager qualifies. Other agents default to "manager" as parent on first sync. Operator/manager re-parenting via the write API + dashboard UI lands in a follow-up.

Why meta, not per-agent `agent.nix`

An agent shouldn't be able to claim a parent without that parent's consent, and operator-driven re-parenting shouldn't require touching the moved agent's config. Topology IS a system-level concern; meta is where system-level facts live.

Flow

Read: topology::read() parses topology.json into a BTreeMap<String, Option<String>>. Missing / unparsable file → empty map → every agent treated as root (safe degradation for fresh installs that haven't run meta::sync_agents yet).
Reconcile: meta::sync_agents calls topology::reconcile alongside its flake.nix regeneration. New agents land at their default position (manager as parent, manager itself as root); removed agents drop. Existing entries are preserved as-is so operator overrides stick across regenerations.
Inject: meta::render_flake looks up each agent's parent and passes it to mkAgent. When non-null, the mkAgent body sets HIVE_PARENT = parent in the agent's systemd service environment so the harness / claude prompts can see it.
Surface: container_view::build_all reads topology.json and populates ContainerView.parent: Option<String> on every rescan. The dashboard renders the field as a tree (#363 follow-up).

Target topology semantics

Once enforcement lands the rules collapse into:

operation	who can do it
`kill` / `start` / `restart` / `update` (any descendant)	any ancestor
`request_init_config` (spawn a new child)	any agent, child added under self
`request_apply_commit` (any descendant's config)	any ancestor
`get_logs` (any descendant)	any ancestor
moderate questions / reminders (cancel any open thread of a descendant)	any ancestor
`send` / `recv` routing	parent ↔ same-parent siblings ↔ self ↔ descendants; explicit allow-list for anyone else
`request_update_meta_inputs` (bump meta lock)	root agents only (today: just `manager`)

"Ancestor" walks ContainerView.parent chains; cycles are guarded by a visited-set at dispatch time (a malformed topology.json can't lock the dispatcher into a loop).

Current manager special-casings — the audit

What currently makes the manager different from every other agent, and which axis the post-milestone version reads each special-case along:

A — naming + bootstrap

MANAGER_AGENT = "manager" (broker recipient name) and MANAGER_NAME = "hm1nd" (container name). ~28 grep hits across hive-c0re/src/. Just a name — the rename plan is manager → root, executed via the one-shot migration script in migrate.rs (idempotent, marker-guarded).
auto_update::ensure_manager runs at hive-c0re boot and spawns hm1nd if missing. Becomes "ensure the root agent exists" once any agent can be at the root. Topology: root has no parent, so hive-c0re itself owns its lifecycle (no parent to delegate to).

B — wire-protocol privileges

The ManagerRequest::* variants in hive-sh4re/src/lib.rs are operations the manager flavour socket can make that sub-agent sockets can't:

variant	semantic	post-milestone
`RequestInitConfig`	seed an agent's proposed config repo	topology — descendants only
`RequestApplyCommit`	submit a commit sha for operator approval	topology — descendants only
`RequestSpawn` (deprecated)	shortcut for spawn	topology — descendants only
`Kill` / `Start` / `Restart` / `Update`	container lifecycle on an existing agent	topology — descendants only
`RequestUpdateMetaInputs`	bump meta `flake.lock`	per-agent cap (root-only today; a future "let coder bump its own input" might grant it)
`GetLogs`	journalctl scrape of a sub-agent	topology — descendants only
`Wake`	inject a `from: <X>` message into self's inbox	not really privileged — the wire surface exists because daemon co-processes (e.g. `forge_notify`) need it. Sub-agents have the same via their own socket.

C — storage / mounts (`hive-c0re::lifecycle`)

The manager container's nspawn bind set:

HOST_AGENTS_ROOT (/var/lib/hyperhive/agents) → /agents RW — so the manager can edit any agent's proposed config repo
HOST_APPLIED_ROOT (/var/lib/hyperhive/applied) → /applied RO — so the manager can diff against what's deployed
HOST_META_ROOT (/var/lib/hyperhive/meta) → /meta RO — so the manager can read the system-wide deploy log

Tree-shape version:

Each agent gets RW to /agents/<descendant>/ for every descendant in its subtree. The root agent (today: manager) gets RW to the full forest as a special case of "the root has every other agent as a descendant".
RO /meta access if the agent holds a "meta read" cap.
request_update_meta_inputs is the only path that actually writes flake.lock, gated by the cap; everyone else stays RO.

D — drop legacy `/state` for manager

lifecycle.rs::notes_mount currently ternary's /state for the manager and /agents/<name>/state for everyone else (because the manager pre-dates the per-agent state-dir layout). Milestone bullet: unify on /agents/<name>/state for everyone. One-time mv of /var/lib/hyperhive/manager/state → /var/lib/hyperhive/agents/manager/state in migrate.rs (idempotent, marker-guarded).

E — prompt + tools

prompts/manager.md vs prompts/agent.md — two separate system prompts. Per-agent cap list of what the agent can do, rendered into a single parametrised prompt at boot.
mcp.rs::Flavor::{Agent, Manager} controls which MCP tools claude sees. Already structured this way internally — the per-flavour allow-list becomes a per-cap-set lookup.

F — drive-by checks across c0re

(grep -n MANAGER_AGENT produced ~28 hits)

loose_ends.rs: manager sees hive-wide loose-ends, sub-agents only their own. Topology — every agent sees its own + its descendants'.
operator_questions.rs + broker.rs: "manager can cancel any question" override on the owner check. Topology — agents can moderate threads of their descendants. (per mara's https://localhost:3000/hyperhive/hyperhive/issues/361#issuecomment-3344)
reminder_scheduler.rs: same override pattern for reminder cancel. Topology — descendants only.
actions.rs: destroy refuses to act on MANAGER_NAME (no foot-shooting). Topology — agents can destroy descendants but never themselves or ancestors.
crash_watch.rs: skips ContainerCrash for the manager (it auto-restarts via systemd). Topology — the root container has different recovery semantics, every other agent falls into the same watch loop.

G — sub-agents inside the same container

Future work mentioned in #361: when enabled for an agent, it can spawn temporary "sub-agents" that run inside its own container. Lighter than a full nspawn agent. Open questions, not yet wired:

Inherit caps from parent, or take an explicit narrower set?
Survive container restart, or always ephemeral?
Inbox: separate from parent, or shared?
Filesystem: share parent's /state RW, or a sub-dir?
Identity: distinct broker recipient name, or address the parent?

Cross-references

Milestone: #361 "Agent privileges and sub-agents"
Dashboard render: #363 "show agent topology in container list"
Audit table source: comment 3335 on #361
Operator/agent trust boundary (orthogonal axis): boundary.md

8.3 KiB Raw Blame History