crash_watch grows two more state-axes alongside running/stopped:

- logged-in (claude session dir populated for the agent)
- up-to-date (recorded flake rev matches current)

per-tick transitions emit HelperEvent::NeedsLogin / LoggedIn / NeedsUpdate. seed-on-first-tick semantics retained: nothing fires on harness boot for agents that were already in their state. only needs_update fires the 'stale appeared' direction; the resolved direction is already covered by Rebuilt.

new mcp__hyperhive__update(name) on the manager surface: idempotent rebuild via auto_update::rebuild_agent. transient-aware (Rebuilding) so the dashboard shows the spinner. login intentionally has NO tool: it's interactive OAuth, only the operator can complete it.

prompts + approvals doc + turn-loop doc updated. todo grows a 'show per-agent applied config in dashboard' entry (separate follow-up).

# Approvals + manager + helper events

The approval queue is hyperhive's pivot: nothing that changes the
shape of an agent (its config, whether it exists) happens without an
operator click. The manager (`hm1nd`) is the policy gate in front of
that queue; helper events are how it stays informed about what
happens after a decision lands.

## End-to-end approval flow

1. Manager edits `/agents/<name>/config/agent.nix` (bind-mounted
   from the host's per-agent `proposed` repo) and commits.
2. Manager submits the commit sha via
   `request_apply_commit(agent, commit_ref)`.
3. Operator sees the diff on the dashboard and clicks ◆ APPR0VE (or
   runs `hive-c0re approve <id>` on the CLI).
4. hive-c0re reads the file at that sha from `proposed`, applies it
   into `applied`, commits there, and runs `nixos-container update`.
5. `HelperEvent::ApprovalResolved` lands in the manager's inbox.

`Spawn` approvals follow the same shape but skip the commit-diff
step: the operator just sees the name. On approve, hive-c0re
creates the container in a background task while the dashboard
shows a spinner.

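
The flow above can be read as a small state machine: an approval starts pending, moves exactly once to a terminal state, and that transition is what produces `ApprovalResolved`. The following is a minimal sketch of that idea only, not hive-c0re's actual code. The `Approval`, `Kind`, and `Status` types, the `resolve` function, the `Option` around `commit_ref` for `Spawn`, and the rule that a second resolve emits nothing are all assumptions made for illustration.

```rust
// Hypothetical model of the approval queue; only the name
// `ApprovalResolved` and its id/agent/commit_ref/status fields
// come from the document.
#[derive(Debug, Clone, PartialEq)]
enum Status {
    Pending,
    Approved,
    Denied,
}

#[derive(Debug, Clone, PartialEq)]
enum Kind {
    // Carries the commit sha submitted by the manager.
    ApplyCommit { commit_ref: String },
    // The operator only sees the agent name.
    Spawn,
}

#[derive(Debug, Clone, PartialEq)]
struct Approval {
    id: u64,
    agent: String,
    kind: Kind,
    status: Status,
}

#[derive(Debug, Clone, PartialEq)]
struct ApprovalResolved {
    id: u64,
    agent: String,
    commit_ref: Option<String>, // assumption: None for Spawn approvals
    status: Status,
}

// Move a pending approval to its terminal state and return the event
// that would be sent to the manager. Returns None if the approval was
// already resolved (assumed behaviour).
fn resolve(approval: &mut Approval, approve: bool) -> Option<ApprovalResolved> {
    if approval.status != Status::Pending {
        return None;
    }
    approval.status = if approve { Status::Approved } else { Status::Denied };
    let commit_ref = match &approval.kind {
        Kind::ApplyCommit { commit_ref } => Some(commit_ref.clone()),
        Kind::Spawn => None,
    };
    Some(ApprovalResolved {
        id: approval.id,
        agent: approval.agent.clone(),
        commit_ref,
        status: approval.status.clone(),
    })
}
```

In this sketch the side effects of step 4 (copying `agent.nix`, committing, running `nixos-container update`) would happen between the status change and the event; they are left out to keep the transition itself visible.
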
## Two repos per agent

```
/var/lib/hyperhive/agents/<name>/config/   proposed
└── agent.nix                              # the only file the
                                           # manager can change
                                           # (initial commit by
                                           # hive-c0re on first
                                           # spawn, never touched
                                           # again).

/var/lib/hyperhive/applied/<name>/         applied (hive-c0re-only)
├── flake.nix                              # auto-generated
└── agent.nix                              # overwritten by approve
                                           # from the proposed commit
```

The container's `--flake` ref is `<applied_dir>#default`. The flake
extends `hyperhive.nixosConfigurations.{agent-base|manager}` with
`./agent.nix` plus an inline module setting
`programs.git.config.user` (committer identity = the agent's name)
and `systemd.services.<harness>.environment` (`HIVE_PORT`,
`HIVE_LABEL`, `HIVE_DASHBOARD_PORT`).

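
To make the layout concrete, here is a small sketch that derives the two per-agent directories and the `--flake` ref from an agent name. The paths and the `#default` suffix are taken from the layout above; the function names (`proposed_dir`, `applied_dir`, `flake_ref`) and the `STATE_ROOT` constant are hypothetical and not necessarily what hive-c0re uses.

```rust
use std::path::PathBuf;

// Root of hyperhive's on-host state, as shown in the layout above.
const STATE_ROOT: &str = "/var/lib/hyperhive";

// Proposed repo: the manager edits and commits `agent.nix` here.
fn proposed_dir(name: &str) -> PathBuf {
    PathBuf::from(STATE_ROOT).join("agents").join(name).join("config")
}

// Applied repo: written only by hive-c0re on approve.
fn applied_dir(name: &str) -> PathBuf {
    PathBuf::from(STATE_ROOT).join("applied").join(name)
}

// The container's `--flake` ref: `<applied_dir>#default`.
fn flake_ref(name: &str) -> String {
    format!("{}#default", applied_dir(name).display())
}
```

For an agent named `alice`, `flake_ref("alice")` evaluates to `/var/lib/hyperhive/applied/alice#default`.
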
## Manager (`hm1nd`) is hive-c0re-managed

The manager container runs through the **same lifecycle as
sub-agents**. On `hive-c0re serve` startup, if `hm1nd` is missing,
hive-c0re creates it. The manager's flake lives at
`/var/lib/hyperhive/applied/hm1nd/`; its proposed config at
`/var/lib/hyperhive/agents/hm1nd/config/`. The manager can edit its
own `agent.nix` (visible inside the container at
`/agents/hm1nd/config/`) and submit
`request_apply_commit("hm1nd", <sha>)` for operator approval.

Differences from sub-agents:

- `flake.nix` extends `hyperhive.nixosConfigurations.manager`
  (vs `agent-base`).
- Container name is `hm1nd` (no `h-` prefix).
- Fixed web UI port (`MANAGER_PORT = 8000`).
- `set_nspawn_flags` adds an extra bind:
  `/var/lib/hyperhive/agents` → `/agents` (RW), so the manager can
  edit per-agent proposed repos.
- First-deploy spawn bypasses the approval queue (manager is
  required infrastructure).
- Per-agent socket lives at `/run/hyperhive/manager/`, owned by
  `manager_server::start`.

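
The differences listed above mostly come down to a few per-agent lookups that branch on whether the agent is the manager. A minimal sketch follows, assuming that sub-agent containers are named `h-<name>` (implied by "no `h-` prefix") and that sub-agent ports are assigned elsewhere. Apart from `MANAGER_PORT`, every function and constant name here is hypothetical.

```rust
const MANAGER_NAME: &str = "hm1nd";
const MANAGER_PORT: u16 = 8000;

// Container name: the manager keeps its bare name, sub-agents are
// assumed to get an `h-` prefix.
fn container_name(agent: &str) -> String {
    if agent == MANAGER_NAME {
        MANAGER_NAME.to_string()
    } else {
        format!("h-{}", agent)
    }
}

// Which `hyperhive.nixosConfigurations.*` entry the generated flake extends.
fn base_config(agent: &str) -> &'static str {
    if agent == MANAGER_NAME { "manager" } else { "agent-base" }
}

// Only the manager has a fixed web UI port; how sub-agent ports are
// chosen is not covered here, so they are modelled as None.
fn fixed_port(agent: &str) -> Option<u16> {
    if agent == MANAGER_NAME { Some(MANAGER_PORT) } else { None }
}

// Extra (host path, container path) read-write binds added for the manager.
fn extra_binds(agent: &str) -> Vec<(&'static str, &'static str)> {
    if agent == MANAGER_NAME {
        vec![("/var/lib/hyperhive/agents", "/agents")]
    } else {
        Vec::new()
    }
}
```
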
**Migration note** (for older hosts): drop any
`containers.hm1nd = { ... }` block from your host NixOS config.
hyperhive creates and updates the manager itself.

## Manager policy

From `hive-ag3nt/prompts/manager.md`: the manager does NOT
rubber-stamp sub-agent config requests. It verifies (role match,
package legitimacy, cheaper alternative, blast radius) before
committing and calling `request_apply_commit`.

For ambiguous cases or anything that needs human signal, the
manager calls
`ask_operator(question, options?, multi?, ttl_seconds?)`, which
queues the question on the dashboard and returns the id
immediately. The operator's answer arrives later as
`HelperEvent::OperatorAnswered` in the manager inbox. Storage is
`hive-c0re::operator_questions` (sqlite); the answer flow is:

```
POST /answer-question/{id}
  → OperatorQuestions::answer
  → notify_manager(OperatorAnswered { id, question, answer })
```

Two more paths resolve a pending question with a sentinel answer:

- `POST /cancel-question/{id}` (✗ CANC3L button on the dashboard)
  resolves with `[cancelled]`. The manager sees a terminal state
  and can fall back.
- `ttl_seconds` deadline: a tokio watchdog spawned at submit time
  fires `answer(id, "[expired]")` once the ttl runs out. If the
  question was already resolved by then, the call is a no-op. The
  dashboard surfaces a `⏳ MM:SS` chip on each pending question
  with a deadline.

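
All three resolution paths (operator answer, cancel, ttl expiry) funnel into the same `answer` call, and whichever arrives first wins. The sketch below illustrates that "first resolver wins, later ones no-op" behaviour with an in-memory map. It is an assumption-laden stand-in: the real store is sqlite and the watchdog is a tokio task, and the `Questions` type, its `ask` method, and the struct form of `OperatorAnswered` are hypothetical.

```rust
use std::collections::HashMap;

// Sentinel answers named in the text above.
const CANCELLED: &str = "[cancelled]";
const EXPIRED: &str = "[expired]";

// Event delivered to the manager once a question is resolved.
#[derive(Debug, PartialEq)]
struct OperatorAnswered {
    id: u64,
    question: String,
    answer: String,
}

// In-memory stand-in for the question store (id -> question text).
#[derive(Debug, Default)]
struct Questions {
    next_id: u64,
    pending: HashMap<u64, String>,
}

impl Questions {
    // Queue a question and return its id immediately.
    fn ask(&mut self, question: &str) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.pending.insert(id, question.to_string());
        id
    }

    // Resolve a pending question. Removing the entry makes any later
    // call for the same id return None, so a cancel or an expiry that
    // loses the race produces no second event.
    fn answer(&mut self, id: u64, answer: &str) -> Option<OperatorAnswered> {
        let question = self.pending.remove(&id)?;
        Some(OperatorAnswered { id, question, answer: answer.to_string() })
    }
}
```

Here a cancel would call `answer(id, CANCELLED)` and the watchdog would call `answer(id, EXPIRED)`; only the first call for a given id yields an event.
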
## Helper events to the manager

`Coordinator::notify_manager(&HelperEvent)` enqueues an inbox
message from sender `system` with the event JSON in the body. The
manager harness no longer short-circuits these: they drive a
regular claude turn so the manager can react. Variants
(`hive_sh4re::HelperEvent`):

- `ApprovalResolved { id, agent, commit_ref, status, note }`:
  fired by `actions::approve` + `actions::deny` whenever an
  approval transitions to its terminal state.
- `Spawned { agent, ok, note }`: `actions::approve` (Spawn-kind)
  + admin `HostRequest::Spawn`.
- `Rebuilt { agent, ok, note }`: `auto_update::rebuild_agent`
  (covers startup scan + manual `/rebuild` from dashboard) +
  `actions::approve` (ApplyCommit).
- `Killed { agent }`: admin `HostRequest::Kill` + dashboard
  `/kill` + manager `Kill` MCP tool.
- `Destroyed { agent }`: `actions::destroy`.
- `ContainerCrash { agent, note }`: `crash_watch`, when a
  previously-running container went away with no operator-initiated
  transient state (Stopping / Restarting / Destroying / Rebuilding).
  Manager can `start` it again or escalate.
- `NeedsLogin { agent }`: sub-agent has no claude session yet.
  Manager can't act directly (interactive OAuth); typically flags
  the operator.
- `LoggedIn { agent }`: sub-agent just completed login. Manager
  often greets the agent on this event.
- `NeedsUpdate { agent }`: sub-agent's recorded flake rev is
  stale. Manager calls `update(name)` to rebuild; this is
  idempotent and needs no approval.
- `OperatorAnswered { id, question, answer }`: dashboard
  `/answer-question/{id}` after the operator submits the answer
  form.

To add a new event: add a `HelperEvent` variant, wire up its call
sites, and update `prompts/manager.md` so the manager knows the new
shape.

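
The `crash_watch`-driven events (`ContainerCrash`, `NeedsLogin`, `LoggedIn`, `NeedsUpdate`) are all edge-triggered: they fire when an observed state changes between ticks, not when a state is merely observed. A minimal sketch of such a watcher is shown below, under several assumptions. The `Observed` and `Watch` types and the `tick` method are hypothetical, the `note` field on `ContainerCrash` is omitted, and the first observation of each agent only seeds the table (how the real watcher treats agents that appear later is not covered by this document).

```rust
use std::collections::HashMap;

// What one tick observes about an agent (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Observed {
    running: bool,
    logged_in: bool,
    up_to_date: bool,
    // True while an operator-initiated transient state is active
    // (Stopping / Restarting / Destroying / Rebuilding).
    transient: bool,
}

// Simplified subset of the event variants listed above.
#[derive(Debug, Clone, PartialEq)]
enum HelperEvent {
    ContainerCrash { agent: String },
    NeedsLogin { agent: String },
    LoggedIn { agent: String },
    NeedsUpdate { agent: String },
}

#[derive(Default)]
struct Watch {
    last: HashMap<String, Observed>,
}

impl Watch {
    // Compare this tick's observation with the previous one and return
    // the events for every transition that occurred.
    fn tick(&mut self, agent: &str, now: Observed) -> Vec<HelperEvent> {
        let mut events = Vec::new();
        // First observation only seeds the table: nothing fires.
        let prev = match self.last.insert(agent.to_string(), now) {
            Some(prev) => prev,
            None => return events,
        };
        let agent = agent.to_string();
        if prev.running && !now.running && !now.transient {
            events.push(HelperEvent::ContainerCrash { agent: agent.clone() });
        }
        if prev.logged_in && !now.logged_in {
            events.push(HelperEvent::NeedsLogin { agent: agent.clone() });
        }
        if !prev.logged_in && now.logged_in {
            events.push(HelperEvent::LoggedIn { agent: agent.clone() });
        }
        // Only the "stale appeared" direction fires; the resolved
        // direction is reported separately via `Rebuilt`.
        if prev.up_to_date && !now.up_to_date {
            events.push(HelperEvent::NeedsUpdate { agent: agent.clone() });
        }
        events
    }
}
```

Seeding on the first observation is what keeps a harness restart from flooding the manager's inbox with events for agents that were already crashed, logged out, or stale.
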
## Auto-update on startup

`hive-c0re serve` runs `auto_update::run` in a background task right
after opening the coordinator. It enumerates managed containers and
rebuilds any whose recorded hyperhive rev differs from the current
one. Sub-agents and the manager go through the same
`lifecycle::rebuild` path.

"Rev" = canonical filesystem path of `cfg.hyperhiveFlake`. Marker
file: `/var/lib/hyperhive/applied/.<name>.hyperhive-rev`. If the
flake input has no canonical path (e.g. a `github:` URL),
auto-update is a no-op; rebuild manually.

The dashboard surfaces pending updates per agent: a clickable
"needs update ↻" badge appears whenever the marker differs from
the current rev. The badge POSTs `/rebuild/<name>`, calling the same
`auto_update::rebuild_agent` path so manual triggers and the
startup scan can't drift. When at least one container is stale, a
top-level `↻ UPD4TE 4LL` button appears that loops over every
stale container.

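
The staleness check described in this section reduces to comparing the rev recorded in the marker file with the current rev, and skipping the check when there is no current rev at all. A small sketch follows, with hypothetical function names (`marker_path`, `needs_rebuild`). Treating a missing marker as stale is an assumption the text above does not confirm.

```rust
use std::path::{Path, PathBuf};

// Marker file for an agent: `<applied_root>/.<name>.hyperhive-rev`.
fn marker_path(applied_root: &Path, name: &str) -> PathBuf {
    applied_root.join(format!(".{}.hyperhive-rev", name))
}

// Decide whether an agent should be rebuilt.
// `recorded` is the rev read from the marker file, if any.
// `current` is None when the flake input has no canonical filesystem
// path (e.g. a `github:` URL); auto-update is then a no-op.
fn needs_rebuild(recorded: Option<&str>, current: Option<&str>) -> bool {
    match current {
        None => false,
        // Assumption: a missing marker (recorded == None) counts as stale.
        Some(cur) => recorded != Some(cur),
    }
}
```

The same predicate could back both the startup scan and the dashboard badge, which matches the stated goal that the two paths cannot drift.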