crash_watch grows two more state-axes alongside running/stopped: - logged-in (claude session dir populated for the agent) - up-to-date (recorded flake rev matches current) per-tick transitions emit HelperEvent::NeedsLogin / LoggedIn / NeedsUpdate. seed-on-first-tick semantics retained — nothing fires on harness boot for agents that were already in their state. only needs_update fires the 'stale appeared' direction; the resolved direction is already covered by Rebuilt. new mcp__hyperhive__update(name) on the manager surface: idempotent rebuild via auto_update::rebuild_agent. transient-aware (Rebuilding) so the dashboard shows the spinner. login intentionally has NO tool — it's interactive OAuth, only the operator can complete it. prompts + approvals doc + turn-loop doc updated. todo grows a 'show per-agent applied config in dashboard' entry (separate follow-up).
6.6 KiB
Turn loop + MCP
How the harness wakes up, what it asks claude to do, and what tools claude has access to in return.
The loop
Each agent harness (hive-ag3nt serve or hive-m1nd serve) runs:
- Long-poll
Recvon its socket. The host-side broker (broker.rs::recv_blocking) returns immediately if there's a pending message, otherwise waits up to 30 s for a brokerSentevent for this recipient. - Pop one message. Peek the remaining inbox depth with
Status. - Emit
LiveEvent::TurnStart { from, body, unread }onto the SSE bus. - Spawn claude (one process per turn) and pipe the wake prompt over stdin.
- Stream stdout (JSON lines) into the bus as
LiveEvent::Stream(value). Pump stderr asNote. - Wait for claude to exit. On
Prompt is too long, run/compacton the session once and retry the turn. - Emit
LiveEvent::TurnEnd { ok, note }. Sleeppoll_msto avoid tight loops on transient failures.
The claude invocation
claude --print --verbose --output-format stream-json --model <name> \
--continue --settings /run/hive/claude-settings.json \
--system-prompt-file /run/hive/claude-system-prompt.md \
--mcp-config /run/hive/claude-mcp-config.json --strict-mcp-config \
--tools <builtins> --allowedTools <builtins+mcp>
# wake prompt piped over stdin
<name> is read from Bus::model() on each turn, default
haiku. Operator can flip it at runtime with /model <name> in
the web terminal — the next turn picks it up. The choice is
persisted to /state/hyperhive-model so it survives restart;
override path: HYPERHIVE_MODEL_FILE env var for tests.
--continue keeps a persistent session per agent (claude stores
sessions in ~/.claude/projects/, which is bind-mounted
persistently). Auto-compact and auto-memory are disabled via
--settings because hyperhive owns compaction (/compact on
overflow, retry once; operator can also force one via /api/compact).
The wake prompt is intentionally minimal: just the popped message's
from/body, plus an inline ({unread} more pending — drain via …) hint when unread > 0. Claude drives any further recv/send
itself via the embedded MCP server.
Whenever hive-c0re starts / restarts / rebuilds a container, it
also drops a system message into the agent's inbox via
Coordinator::kick_agent — a one-line "you were just (re)started,
check /state/ for your notes, --continue session is intact". The
next turn picks it up like any other inbox message.
On-boot files
hive_ag3nt::turn::write_* writes three files next to the per-agent
socket at /run/hive/ once at startup:
claude-mcp-config.json— re-invokes the running binary asmcpchild (so the same binary serves as harness + as claude's MCP child process).claude-settings.json— the--settingsblob (auto-compact and auto-memory off, effortLevel medium).claude-system-prompt.md— rendered fromhive-ag3nt/prompts/{agent,manager}.mdwith{label}substituted. Passed via--system-prompt-file.
The shared per-turn plumbing lives in hive_ag3nt::turn::{write_mcp_config, write_settings, write_system_prompt, run_turn, drive_turn, emit_turn_end, wait_for_login, compact_session} so the two binaries
can't drift.
MCP surface
The harness ships an embedded MCP server (rmcp 1.7). Claude launches
it as a stdio child via --mcp-config. The hyperhive socket name is
hyperhive, so the tools land in claude as mcp__hyperhive__<tool>.
Sub-agent tools
send(to, body)— message a peer (logical agent name), another agent, or the operator (recipientoperator, surfaces in the dashboard inbox).recv(wait_seconds?)— drain one inbox message. Long-polls server-side;wait_secondsis capped at 180 (default 30 when omitted). Agents use a long wait to park their turn waiting for work instead of busy-looping with short polls — they wake instantly when a message arrives.
Manager tools (in addition to send/recv)
request_spawn(name)— queue a Spawn approval for a brand-new sub-agent (≤9 char name). Operator approves on the dashboard.kill(name)— graceful stop. No approval required.start(name)— start a stopped sub-agent. No approval.restart(name)— stop + start. No approval.update(name)— rebuild (re-applies the current hyperhive flake- agent.nix, restarts). No approval, idempotent. Manager calls
this on receipt of a
needs_updatesystem event.
- agent.nix, restarts). No approval, idempotent. Manager calls
this on receipt of a
request_apply_commit(agent, commit_ref)— submit a config change for any agent (hm1ndfor the manager's own config) for operator approval.ask_operator(question, options?, multi?, ttl_seconds?)— surface a question on the dashboard. Non-blocking — returns the queued question id; the operator's answer arrives later asHelperEvent::OperatorAnsweredin the manager inbox. Options always render alongside a free-text fallback;multi=truerenders options as checkboxes.ttl_secondsauto-cancels with answer[expired]after the deadline (useful for time-sensitive decisions that become moot if the operator hasn't responded). The operator can also manually cancel with[cancelled]via the dashboard.
The boundary: lifecycle ops on existing sub-agents
(kill/start/restart) are at the manager's discretion — no
operator approval. Creating a new agent (request_spawn) and
changing any agent's config (request_apply_commit) still go
through the approval queue.
Authoritative state
hive_ag3nt::events::Bus carries the current turn-loop state in
addition to the broadcast channel and the events history. Variants:
Idle— sitting onRecvwaiting for mail.Thinking—claude --printis running for a turn.Compacting— operator-triggered/compactis in flight.
The harness flips state at the relevant transitions
(set_state(Thinking) before drive_turn, set_state(Idle)
after; set_state(Compacting) around compact_session). Exposed
via /api/state.turn_state + turn_state_since (unix seconds);
the agent page renders this rather than deriving from SSE events.
Tool envelope
mcp::run_tool_envelope: every MCP tool handler logs the request,
runs the body, logs the result. Pre-/post-log only — the inbox
status hint moved to the wake prompt + UI header.
Tool whitelist (mcp::ALLOWED_BUILTIN_TOOLS)
- Allowed built-ins:
Bash,Edit,Glob,Grep,Read,TodoWrite,Write. - Denied by omission:
WebFetch,WebSearch,Task,NotebookEdit. - Allowed MCP tools: as listed above per flavor.
Bash is on the allow-list pending a finer-grained pattern allow-list
(Bash(git *)-style) — see TODO.