topology: meta-repo agent hierarchy + ContainerView.parent (#361)
This commit is contained in:
parent
e931c08739
commit
0b03d5bcfb
6 changed files with 403 additions and 3 deletions
178
docs/agent-hierarchy.md
Normal file
178
docs/agent-hierarchy.md
Normal file
|
|
@ -0,0 +1,178 @@
|
|||
# Agent hierarchy & privileges
|
||||
|
||||
Design + audit doc for milestone #6 (the
|
||||
[issue](http://localhost:3000/hyperhive/hyperhive/issues/361) tree).
|
||||
The implementation lands in pieces; this doc tracks what's done, what's
|
||||
planned, and what currently special-cases the manager.
|
||||
|
||||
## Current state (as of this PR)
|
||||
|
||||
Topology lives in the hive-c0re-owned **meta repo**, alongside
|
||||
`flake.nix`, at `/var/lib/hyperhive/meta/topology.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"manager": null,
|
||||
"alice": "manager",
|
||||
"bob": "alice"
|
||||
}
|
||||
```
|
||||
|
||||
`null` = root-level agent. Today only the manager qualifies. Other
|
||||
agents default to `"manager"` as parent on first sync. Operator/manager
|
||||
re-parenting via the write API + dashboard UI lands in a follow-up.
|
||||
|
||||
### Why meta, not per-agent `agent.nix`
|
||||
|
||||
An agent shouldn't be able to claim a parent without that parent's
|
||||
consent, and operator-driven re-parenting shouldn't require touching
|
||||
the moved agent's config. Topology IS a system-level concern; meta is
|
||||
where system-level facts live.
|
||||
|
||||
### Flow
|
||||
|
||||
1. **Read**: `topology::read()` parses `topology.json` into a
|
||||
`BTreeMap<String, Option<String>>`. Missing / unparsable file →
|
||||
empty map → every agent treated as root (safe degradation for
|
||||
fresh installs that haven't run `meta::sync_agents` yet).
|
||||
2. **Reconcile**: `meta::sync_agents` calls `topology::reconcile`
|
||||
alongside its `flake.nix` regeneration. New agents land at their
|
||||
default position (manager as parent, manager itself as root);
|
||||
removed agents drop. Existing entries are preserved as-is so
|
||||
operator overrides stick across regenerations.
|
||||
3. **Inject**: `meta::render_flake` looks up each agent's parent and
|
||||
passes it to `mkAgent`. When non-null, the mkAgent body sets
|
||||
`HIVE_PARENT = parent` in the agent's systemd service environment
|
||||
so the harness / claude prompts can see it.
|
||||
4. **Surface**: `container_view::build_all` reads `topology.json` and
|
||||
populates `ContainerView.parent: Option<String>` on every rescan.
|
||||
The dashboard renders the field as a tree (#363 follow-up).
|
||||
|
||||
## Target topology semantics
|
||||
|
||||
Once enforcement lands the rules collapse into:
|
||||
|
||||
| operation | who can do it |
|
||||
|---|---|
|
||||
| `kill` / `start` / `restart` / `update` (any descendant) | any ancestor |
|
||||
| `request_init_config` (spawn a new child) | any agent, child added under self |
|
||||
| `request_apply_commit` (any descendant's config) | any ancestor |
|
||||
| `get_logs` (any descendant) | any ancestor |
|
||||
| moderate questions / reminders (cancel any open thread of a descendant) | any ancestor |
|
||||
| `send` / `recv` routing | parent ↔ same-parent siblings ↔ self ↔ descendants; explicit allow-list for anyone else |
|
||||
| `request_update_meta_inputs` (bump meta lock) | root agents only (today: just `manager`) |
|
||||
|
||||
"Ancestor" walks `ContainerView.parent` chains; cycles are guarded by a
|
||||
visited-set at dispatch time (a malformed topology.json can't lock the
|
||||
dispatcher into a loop).
|
||||
|
||||
## Current manager special-casings — the audit
|
||||
|
||||
What currently makes the manager different from every other agent, and
|
||||
which axis the post-milestone version reads each special-case along:
|
||||
|
||||
### A — naming + bootstrap
|
||||
|
||||
- `MANAGER_AGENT = "manager"` (broker recipient name) and
|
||||
`MANAGER_NAME = "hm1nd"` (container name). ~28 grep hits across
|
||||
`hive-c0re/src/`. **Just a name** — the rename plan is `manager` →
|
||||
`root`, executed via the one-shot migration script in
|
||||
`migrate.rs` (idempotent, marker-guarded).
|
||||
- `auto_update::ensure_manager` runs at hive-c0re boot and spawns
|
||||
`hm1nd` if missing. Becomes "ensure the root agent exists" once any
|
||||
agent can be at the root. **Topology**: root has no parent, so
|
||||
hive-c0re itself owns its lifecycle (no parent to delegate to).
|
||||
|
||||
### B — wire-protocol privileges
|
||||
|
||||
The `ManagerRequest::*` variants in `hive-sh4re/src/lib.rs` are
|
||||
operations the manager flavour socket can make that sub-agent sockets
|
||||
can't:
|
||||
|
||||
| variant | semantic | post-milestone |
|
||||
|---|---|---|
|
||||
| `RequestInitConfig` | seed an agent's proposed config repo | **topology** — descendants only |
|
||||
| `RequestApplyCommit` | submit a commit sha for operator approval | **topology** — descendants only |
|
||||
| `RequestSpawn` (deprecated) | shortcut for spawn | **topology** — descendants only |
|
||||
| `Kill` / `Start` / `Restart` / `Update` | container lifecycle on an existing agent | **topology** — descendants only |
|
||||
| `RequestUpdateMetaInputs` | bump meta `flake.lock` | **per-agent cap** (root-only today; a future "let coder bump its own input" might grant it) |
|
||||
| `GetLogs` | journalctl scrape of a sub-agent | **topology** — descendants only |
|
||||
| `Wake` | inject a `from: <X>` message into self's inbox | **not really privileged** — the wire surface exists because daemon co-processes (e.g. `forge_notify`) need it. Sub-agents have the same via their own socket. |
|
||||
|
||||
### C — storage / mounts (`hive-c0re::lifecycle`)
|
||||
|
||||
The manager container's nspawn bind set:
|
||||
|
||||
- `HOST_AGENTS_ROOT (/var/lib/hyperhive/agents) → /agents` RW — so the
|
||||
manager can edit any agent's proposed config repo
|
||||
- `HOST_APPLIED_ROOT (/var/lib/hyperhive/applied) → /applied` RO — so
|
||||
the manager can diff against what's deployed
|
||||
- `HOST_META_ROOT (/var/lib/hyperhive/meta) → /meta` RO — so the
|
||||
manager can read the system-wide deploy log
|
||||
|
||||
Tree-shape version:
|
||||
- Each agent gets RW to `/agents/<descendant>/` for every descendant in
|
||||
its subtree. The root agent (today: manager) gets RW to the full
|
||||
forest as a special case of "the root has every other agent as a
|
||||
descendant".
|
||||
- RO `/meta` access if the agent holds a "meta read" cap.
|
||||
- `request_update_meta_inputs` is the only path that actually writes
|
||||
`flake.lock`, gated by the cap; everyone else stays RO.
|
||||
|
||||
### D — drop legacy `/state` for manager
|
||||
|
||||
`lifecycle.rs::notes_mount` currently ternary's `/state` for the
|
||||
manager and `/agents/<name>/state` for everyone else (because the
|
||||
manager pre-dates the per-agent state-dir layout). Milestone bullet:
|
||||
unify on `/agents/<name>/state` for everyone. One-time `mv` of
|
||||
`/var/lib/hyperhive/manager/state` → `/var/lib/hyperhive/agents/manager/state`
|
||||
in `migrate.rs` (idempotent, marker-guarded).
|
||||
|
||||
### E — prompt + tools
|
||||
|
||||
- `prompts/manager.md` vs `prompts/agent.md` — two separate system
|
||||
prompts. **Per-agent cap list** of what the agent can do, rendered
|
||||
into a single parametrised prompt at boot.
|
||||
- `mcp.rs::Flavor::{Agent, Manager}` controls which MCP tools claude
|
||||
sees. Already structured this way internally — the per-flavour
|
||||
allow-list becomes a per-cap-set lookup.
|
||||
|
||||
### F — drive-by checks across c0re
|
||||
|
||||
(`grep -n MANAGER_AGENT` produced ~28 hits)
|
||||
|
||||
- `loose_ends.rs`: manager sees hive-wide loose-ends, sub-agents only
|
||||
their own. **Topology** — every agent sees its own + its
|
||||
descendants'.
|
||||
- `operator_questions.rs` + `broker.rs`: "manager can cancel any
|
||||
question" override on the owner check. **Topology** — agents can
|
||||
moderate threads of their descendants. (per mara's
|
||||
https://localhost:3000/hyperhive/hyperhive/issues/361#issuecomment-3344)
|
||||
- `reminder_scheduler.rs`: same override pattern for reminder cancel.
|
||||
**Topology** — descendants only.
|
||||
- `actions.rs`: `destroy` refuses to act on `MANAGER_NAME` (no
|
||||
foot-shooting). **Topology** — agents can destroy descendants but
|
||||
never themselves or ancestors.
|
||||
- `crash_watch.rs`: skips `ContainerCrash` for the manager (it
|
||||
auto-restarts via systemd). **Topology** — the root container has
|
||||
different recovery semantics, every other agent falls into the same
|
||||
watch loop.
|
||||
|
||||
### G — sub-agents inside the same container
|
||||
|
||||
Future work mentioned in #361: when enabled for an agent, it can spawn
|
||||
temporary "sub-agents" that run inside its own container. Lighter than
|
||||
a full nspawn agent. Open questions, not yet wired:
|
||||
|
||||
- Inherit caps from parent, or take an explicit narrower set?
|
||||
- Survive container restart, or always ephemeral?
|
||||
- Inbox: separate from parent, or shared?
|
||||
- Filesystem: share parent's `/state` RW, or a sub-dir?
|
||||
- Identity: distinct broker recipient name, or address the parent?
|
||||
|
||||
## Cross-references
|
||||
|
||||
- Milestone: [#361 "Agent privileges and sub-agents"](http://localhost:3000/hyperhive/hyperhive/issues/361)
|
||||
- Dashboard render: [#363 "show agent topology in container list"](http://localhost:3000/hyperhive/hyperhive/issues/363)
|
||||
- Audit table source: [comment 3335 on #361](http://localhost:3000/hyperhive/hyperhive/issues/361#issuecomment-3335)
|
||||
- Operator/agent trust boundary (orthogonal axis): [`boundary.md`](boundary.md)
|
||||
Loading…
Add table
Add a link
Reference in a new issue