müde 80229c6af9 manager: needs_login / logged_in / needs_update events + update tool

crash_watch grows two more state-axes alongside running/stopped:

- logged-in (claude session dir populated for the agent)
- up-to-date (recorded flake rev matches current)

per-tick transitions emit HelperEvent::NeedsLogin / LoggedIn /
NeedsUpdate. seed-on-first-tick semantics retained — nothing fires
on harness boot for agents that were already in their state. only
needs_update fires the 'stale appeared' direction; the resolved
direction is already covered by Rebuilt.

new mcp__hyperhive__update(name) on the manager surface: idempotent
rebuild via auto_update::rebuild_agent. transient-aware (Rebuilding)
so the dashboard shows the spinner. login intentionally has NO tool
— it's interactive OAuth, only the operator can complete it.

prompts + approvals doc + turn-loop doc updated. todo grows a
'show per-agent applied config in dashboard' entry (separate
follow-up).

2026-05-15 21:42:13 +02:00

7.2 KiB


Approvals + manager + helper events

The approval queue is hyperhive's pivot: nothing that changes the shape of an agent (its config, whether it exists) happens without an operator click. The manager (hm1nd) is the policy gate in front of that queue; helper events are how it stays informed about what happens after a decision lands.

End-to-end approval flow

Manager edits /agents/<name>/config/agent.nix (bind-mounted from the host's per-agent proposed repo) and commits.
Manager submits the commit sha via request_apply_commit(agent, commit_ref).
Operator sees the diff on the dashboard, clicks ◆ APPR0VE (or hive-c0re approve <id> on the CLI).
hive-c0re reads agent.nix at that sha from the proposed repo, writes it into the applied repo, commits there, and runs nixos-container update.
HelperEvent::ApprovalResolved lands in the manager's inbox.

Spawn approvals follow the same shape but skip the commit-diff step — the operator just sees the name. On approve, hive-c0re creates the container in a background task while the dashboard shows a spinner.

Two repos per agent

/var/lib/hyperhive/agents/<name>/config/    proposed
└── agent.nix                               # the only file the
                                            # manager can change
                                            # (hive-c0re makes the
                                            # initial commit on first
                                            # spawn, then never
                                            # touches it again).

/var/lib/hyperhive/applied/<name>/          applied — hive-c0re-only
├── flake.nix                               # auto-generated
└── agent.nix                               # overwritten by approve
                                            # from the proposed commit

The container's --flake ref is <applied_dir>#default. The flake extends hyperhive.nixosConfigurations.{agent-base|manager} with ./agent.nix plus an inline module setting programs.git.config.user (committer identity = the agent's name) and systemd.services.<harness>.environment (HIVE_PORT, HIVE_LABEL, HIVE_DASHBOARD_PORT).
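
The layout above can be sketched as a few path helpers. This is a minimal illustration, not hive-c0re's actual code: the function names and the STATE_ROOT constant are hypothetical, and only the paths themselves come from the tree above.

```rust
use std::path::{Path, PathBuf};

/// State root as shown in the tree above (the constant name is hypothetical).
const STATE_ROOT: &str = "/var/lib/hyperhive";

/// Proposed repo: the manager edits agent.nix here and commits.
fn proposed_dir(agent: &str) -> PathBuf {
    Path::new(STATE_ROOT).join("agents").join(agent).join("config")
}

/// Applied repo: written only by hive-c0re when an approval lands.
fn applied_dir(agent: &str) -> PathBuf {
    Path::new(STATE_ROOT).join("applied").join(agent)
}

/// Flake reference handed to the container: <applied_dir>#default.
fn flake_ref(agent: &str) -> String {
    format!("{}#default", applied_dir(agent).display())
}
```

Keeping the two roots behind separate helpers makes it harder to write to the applied repo by accident from code that should only touch proposed configs.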

Manager (`hm1nd`) is hive-c0re-managed

The manager container runs through the same lifecycle as sub-agents. On hive-c0re serve startup, if hm1nd is missing, hive-c0re creates it. The manager's flake lives at /var/lib/hyperhive/applied/hm1nd/; its proposed config lives at /var/lib/hyperhive/agents/hm1nd/config/. The manager can edit its own agent.nix (visible inside the container at /agents/hm1nd/config/) and submit request_apply_commit("hm1nd", <sha>) for operator approval.

Differences from sub-agents:

flake.nix extends hyperhive.nixosConfigurations.manager (vs agent-base).
Container name is hm1nd (no h- prefix).
Fixed web UI port (MANAGER_PORT = 8000).
set_nspawn_flags adds an extra bind: /var/lib/hyperhive/agents → /agents (RW), so the manager can edit per-agent proposed repos.
First-deploy spawn bypasses the approval queue (manager is required infrastructure).
Per-agent socket lives at /run/hyperhive/manager/, owned by manager_server::start.
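
The differences listed above could be concentrated in a handful of small functions, roughly as follows. All function names are hypothetical. The "h-" prefix for sub-agent containers is inferred from the "no h- prefix" remark and is an assumption, not something the list states directly.

```rust
const MANAGER_NAME: &str = "hm1nd";
const MANAGER_PORT: u16 = 8000;

/// Which hyperhive.nixosConfigurations entry the generated flake extends.
#[derive(Debug, PartialEq)]
enum BaseConfig {
    AgentBase,
    Manager,
}

fn is_manager(agent: &str) -> bool {
    agent == MANAGER_NAME
}

fn base_config(agent: &str) -> BaseConfig {
    if is_manager(agent) { BaseConfig::Manager } else { BaseConfig::AgentBase }
}

/// ASSUMPTION: sub-agent containers are named "h-<agent>".
fn container_name(agent: &str) -> String {
    if is_manager(agent) { MANAGER_NAME.to_string() } else { format!("h-{agent}") }
}

/// Only the manager has a fixed web UI port.
fn fixed_port(agent: &str) -> Option<u16> {
    if is_manager(agent) { Some(MANAGER_PORT) } else { None }
}

/// Extra (host, container) read-write binds; only the manager sees /agents.
fn extra_binds(agent: &str) -> Vec<(&'static str, &'static str)> {
    if is_manager(agent) {
        vec![("/var/lib/hyperhive/agents", "/agents")]
    } else {
        Vec::new()
    }
}
```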

Migration note (for older hosts): drop any containers.hm1nd = { ... } block from your host NixOS config. hyperhive creates and updates the manager itself.

Manager policy

From hive-ag3nt/prompts/manager.md: the manager does NOT rubber-stamp sub-agent config requests. It verifies (role match, package legitimacy, cheaper alternative, blast radius) before committing and calling request_apply_commit.

For ambiguous cases or anything that needs a human decision, the manager calls ask_operator(question, options?, multi?, ttl_seconds?), which queues the question on the dashboard and returns the id immediately. The operator's answer arrives later as HelperEvent::OperatorAnswered in the manager inbox. Storage is hive-c0re::operator_questions (sqlite); the answer flow is:

POST /answer-question/{id}
  → OperatorQuestions::answer
  → notify_manager(OperatorAnswered { id, question, answer })

Two more paths resolve a pending question with a sentinel answer:

POST /cancel-question/{id} (✗ CANC3L button on the dashboard) resolves with [cancelled]. The manager sees a terminal state and can fall back.
ttl_seconds deadline: a tokio watchdog spawned at submit time fires answer(id, "[expired]") once the ttl runs out. If the question was already resolved by then, the call is a no-op. The dashboard surfaces a ⏳ MM:SS chip on each pending question with a deadline.
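
The first-resolution-wins behaviour shared by the three paths (answer, cancel, expiry) can be sketched with an in-memory store. This is an illustration under assumptions: the real storage is sqlite and the deadline is a tokio watchdog, whereas here the Questions type and its methods are hypothetical and expiry is modelled as an explicit expire call.

```rust
use std::collections::HashMap;

const CANCELLED: &str = "[cancelled]";
const EXPIRED: &str = "[expired]";

/// Hypothetical in-memory stand-in for the sqlite-backed question store.
#[derive(Default)]
struct Questions {
    next_id: u64,
    /// id -> None while pending, Some(answer) once resolved.
    answers: HashMap<u64, Option<String>>,
}

impl Questions {
    /// Queue a question and return its id immediately.
    fn ask(&mut self) -> u64 {
        self.next_id += 1;
        self.answers.insert(self.next_id, None);
        self.next_id
    }

    /// Resolve a pending question. Returns true only if this call
    /// resolved it; unknown or already-resolved ids are a no-op.
    fn answer(&mut self, id: u64, text: &str) -> bool {
        match self.answers.get_mut(&id) {
            Some(slot) if slot.is_none() => {
                *slot = Some(text.to_string());
                true
            }
            _ => false,
        }
    }

    /// Operator pressed cancel: resolve with the sentinel answer.
    fn cancel(&mut self, id: u64) -> bool {
        self.answer(id, CANCELLED)
    }

    /// What the ttl watchdog would call once the deadline passes.
    fn expire(&mut self, id: u64) -> bool {
        self.answer(id, EXPIRED)
    }

    fn get(&self, id: u64) -> Option<&str> {
        self.answers.get(&id).and_then(|a| a.as_deref())
    }
}
```

Routing cancel and expiry through the same answer function means the manager only ever sees one terminal answer per question, whichever path gets there first.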

Helper events to the manager

Coordinator::notify_manager(&HelperEvent) enqueues an inbox message from sender system with the event JSON in the body. The manager harness no longer short-circuits these — they drive a regular claude turn so the manager can react. Variants (hive_sh4re::HelperEvent):

ApprovalResolved { id, agent, commit_ref, status, note } — fired by actions::approve + actions::deny whenever an approval transitions to its terminal state.
Spawned { agent, ok, note } — actions::approve (Spawn-kind) + admin HostRequest::Spawn.
Rebuilt { agent, ok, note } — auto_update::rebuild_agent (covers startup scan + manual /rebuild from dashboard) + actions::approve (ApplyCommit).
Killed { agent } — admin HostRequest::Kill + dashboard /kill + manager Kill MCP tool.
Destroyed { agent } — actions::destroy.
ContainerCrash { agent, note } — crash_watch: a previously-running container went away with no operator-initiated transient state (Stopping / Restarting / Destroying / Rebuilding). Manager can start it again or escalate.
NeedsLogin { agent } — sub-agent has no claude session yet. Manager can't act directly (interactive OAuth); typically flags the operator.
LoggedIn { agent } — sub-agent just completed login. Manager often greets the agent on this event.
NeedsUpdate { agent } — sub-agent's recorded flake rev is stale. Manager calls update(name) to rebuild — idempotent, no approval required.
OperatorAnswered { id, question, answer } — dashboard /answer-question/{id} after the operator submits the answer form.
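
NeedsLogin, LoggedIn and NeedsUpdate are described in the commit note at the top as per-tick transitions with seed-on-first-tick semantics. A minimal sketch of that edge detection follows. The Watch and Observed types and the tick signature are hypothetical, and only a three-variant subset of HelperEvent is reproduced.

```rust
use std::collections::HashMap;

/// Subset of the event enum, reduced to the three watch-driven variants.
#[derive(Debug, Clone, PartialEq)]
enum HelperEvent {
    NeedsLogin { agent: String },
    LoggedIn { agent: String },
    NeedsUpdate { agent: String },
}

/// What one tick observes for an agent (hypothetical type).
#[derive(Clone, Copy, PartialEq)]
struct Observed {
    logged_in: bool,
    up_to_date: bool,
}

#[derive(Default)]
struct Watch {
    last: HashMap<String, Observed>,
}

impl Watch {
    /// Compare against the previous tick and return events for changes only.
    /// The first observation of an agent just seeds the map and emits nothing.
    fn tick(&mut self, agent: &str, now: Observed) -> Vec<HelperEvent> {
        let mut events = Vec::new();
        if let Some(prev) = self.last.insert(agent.to_string(), now) {
            if prev.logged_in != now.logged_in {
                events.push(if now.logged_in {
                    HelperEvent::LoggedIn { agent: agent.to_string() }
                } else {
                    HelperEvent::NeedsLogin { agent: agent.to_string() }
                });
            }
            // Only the "became stale" direction fires; the resolved
            // direction is reported by Rebuilt instead.
            if prev.up_to_date && !now.up_to_date {
                events.push(HelperEvent::NeedsUpdate { agent: agent.to_string() });
            }
        }
        events
    }
}
```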

To add a new event: new HelperEvent variant + call sites + update prompts/manager.md so the manager knows the new shape.
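
As a rough picture of what a new variant touches, the enum might look like the sketch below. Variant and field names follow the list above; the field types and the agent accessor are assumptions, and the real definition in hive_sh4re may differ (for example in its serialization derives).

```rust
/// Sketch of hive_sh4re::HelperEvent. Field TYPES are assumptions.
#[derive(Debug, Clone, PartialEq)]
enum HelperEvent {
    ApprovalResolved { id: i64, agent: String, commit_ref: String, status: String, note: Option<String> },
    Spawned { agent: String, ok: bool, note: Option<String> },
    Rebuilt { agent: String, ok: bool, note: Option<String> },
    Killed { agent: String },
    Destroyed { agent: String },
    ContainerCrash { agent: String, note: Option<String> },
    NeedsLogin { agent: String },
    LoggedIn { agent: String },
    NeedsUpdate { agent: String },
    OperatorAnswered { id: i64, question: String, answer: String },
}

impl HelperEvent {
    /// Hypothetical accessor: the agent an event concerns, if any.
    /// The match has no wildcard arm, so adding a variant fails to
    /// compile until this function is updated as well.
    fn agent(&self) -> Option<&str> {
        use HelperEvent::*;
        match self {
            ApprovalResolved { agent, .. }
            | Spawned { agent, .. }
            | Rebuilt { agent, .. }
            | Killed { agent }
            | Destroyed { agent }
            | ContainerCrash { agent, .. }
            | NeedsLogin { agent }
            | LoggedIn { agent }
            | NeedsUpdate { agent } => Some(agent.as_str()),
            OperatorAnswered { .. } => None,
        }
    }
}
```

Exhaustive matches like this one let the compiler point at every call site that needs attention; the prompt update in prompts/manager.md still has to be remembered by hand.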

Auto-update on startup

hive-c0re serve runs auto_update::run in a background task right after opening the coordinator. It enumerates managed containers and rebuilds any whose recorded hyperhive rev differs from the current one — sub-agents and manager go through the same lifecycle::rebuild path.

"Rev" = canonical filesystem path of cfg.hyperhiveFlake. Marker file: /var/lib/hyperhive/applied/.<name>.hyperhive-rev. If the flake input has no canonical path (e.g. a github: URL), auto-update is a no-op — rebuild manually.
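
The staleness check described above reduces to comparing two optional strings. The sketch below is illustrative only: the function names are hypothetical, and treating a missing marker file as stale is an assumption the text does not confirm.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Marker file for an agent: <applied_root>/.<name>.hyperhive-rev
fn marker_path(applied_root: &Path, agent: &str) -> PathBuf {
    applied_root.join(format!(".{agent}.hyperhive-rev"))
}

/// Current rev: canonical filesystem path of the flake input,
/// or None when the input is not a local path (e.g. a github: URL).
fn current_rev(flake: &str) -> Option<String> {
    fs::canonicalize(flake)
        .ok()
        .map(|p| p.to_string_lossy().into_owned())
}

/// Recorded rev from the marker file, or None if it cannot be read.
fn recorded_rev(marker: &Path) -> Option<String> {
    fs::read_to_string(marker).ok().map(|s| s.trim().to_string())
}

/// Decide whether the startup scan should rebuild this container.
fn needs_rebuild(recorded: Option<&str>, current: Option<&str>) -> bool {
    match current {
        // No canonical path: auto-update is a no-op, rebuild manually.
        None => false,
        // ASSUMPTION: a missing marker (recorded == None) counts as stale.
        Some(cur) => recorded != Some(cur),
    }
}
```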

The dashboard surfaces pending updates per agent: a clickable "needs update ↻" badge appears whenever the marker differs from current rev. The badge POSTs /rebuild/<name>, calling the same auto_update::rebuild_agent path so manual triggers and the startup scan can't drift. When at least one container is stale, a top-level ↻ UPD4TE 4LL button appears that loops over every stale container.
