hyperhive/docs/turn-loop.md
müde 62d1a74929 docs sync + revert auto-unfree removal
revert the earlier 'operator must set allowUnfree' move:
per-agent containers evaluate their own nixpkgs and the operator's
host-level allowUnfree doesn't propagate in. restoring the scoped
allowUnfreePredicate inside both the claude-unstable overlay and
harness-base.nix; documented in README + gotchas as 'nothing to
set on the operator side'.

docs:
- claude.md file map adds crash_watch.rs, kick_agent on coordinator,
  /api/model + journald viewer + bind-with-retry references.
- scratchpad rewritten to reflect the recent run.
- web-ui.md: notification row + browser notifications section,
  state row (badge + model chip + last-turn chip + cancel button),
  per-agent inbox, /model slash, /cancel-question + journald
  endpoints, focus-preservation on refresh.
- turn-loop.md: --model is read from Bus::model() per turn (runtime
  override via /model); recv(wait_seconds) up to 180s with the
  rationale; ask_operator gains ttl_seconds; new TurnState section;
  kick_agent inbox-on-startup hint.
- approvals.md: ttl/cancel resolution paths for operator questions.
- persistence.md: /state/hyperhive-model file.
- gotchas.md: web UI port collision policy (rename, don't probe);
  bind retry + SO_REUSEADDR shape; auto-unfree restored.
- todo.md: cleaned up empty sections and stale entries; /model
  shipped, dropped from the list.
2026-05-15 21:26:13 +02:00

6.4 KiB

Turn loop + MCP

How the harness wakes up, what it asks claude to do, and what tools claude has access to in return.

The loop

Each agent harness (hive-ag3nt serve or hive-m1nd serve) runs:

  1. Long-poll Recv on its socket. The host-side broker (broker.rs::recv_blocking) returns immediately if there's a pending message, otherwise waits up to 30 s for a broker Sent event for this recipient.
  2. Pop one message. Peek the remaining inbox depth with Status.
  3. Emit LiveEvent::TurnStart { from, body, unread } onto the SSE bus.
  4. Spawn claude (one process per turn) and pipe the wake prompt over stdin.
  5. Stream stdout (JSON lines) into the bus as LiveEvent::Stream(value). Pump stderr as Note.
  6. Wait for claude to exit. On Prompt is too long, run /compact on the session once and retry the turn.
  7. Emit LiveEvent::TurnEnd { ok, note }. Sleep poll_ms to avoid tight loops on transient failures.

The claude invocation

claude --print --verbose --output-format stream-json --model <name> \
  --continue --settings /run/hive/claude-settings.json \
  --system-prompt-file /run/hive/claude-system-prompt.md \
  --mcp-config /run/hive/claude-mcp-config.json --strict-mcp-config \
  --tools <builtins> --allowedTools <builtins+mcp>
# wake prompt piped over stdin

<name> is read from Bus::model() on each turn, default haiku. Operator can flip it at runtime with /model <name> in the web terminal — the next turn picks it up. The choice is persisted to /state/hyperhive-model so it survives restart; override path: HYPERHIVE_MODEL_FILE env var for tests.

--continue keeps a persistent session per agent (claude stores sessions in ~/.claude/projects/, which is bind-mounted persistently). Auto-compact and auto-memory are disabled via --settings because hyperhive owns compaction (/compact on overflow, retry once; operator can also force one via /api/compact).

The wake prompt is intentionally minimal: just the popped message's from/body, plus an inline ({unread} more pending — drain via …) hint when unread > 0. Claude drives any further recv/send itself via the embedded MCP server.

Whenever hive-c0re starts / restarts / rebuilds a container, it also drops a system message into the agent's inbox via Coordinator::kick_agent — a one-line "you were just (re)started, check /state/ for your notes, --continue session is intact". The next turn picks it up like any other inbox message.

On-boot files

hive_ag3nt::turn::write_* writes three files next to the per-agent socket at /run/hive/ once at startup:

  • claude-mcp-config.json — re-invokes the running binary as mcp child (so the same binary serves as harness + as claude's MCP child process).
  • claude-settings.json — the --settings blob (auto-compact and auto-memory off, effortLevel medium).
  • claude-system-prompt.md — rendered from hive-ag3nt/prompts/{agent,manager}.md with {label} substituted. Passed via --system-prompt-file.

The shared per-turn plumbing lives in hive_ag3nt::turn::{write_mcp_config, write_settings, write_system_prompt, run_turn, drive_turn, emit_turn_end, wait_for_login, compact_session} so the two binaries can't drift.

MCP surface

The harness ships an embedded MCP server (rmcp 1.7). Claude launches it as a stdio child via --mcp-config. The hyperhive socket name is hyperhive, so the tools land in claude as mcp__hyperhive__<tool>.

Sub-agent tools

  • send(to, body) — message a peer (logical agent name), another agent, or the operator (recipient operator, surfaces in the dashboard inbox).
  • recv(wait_seconds?) — drain one inbox message. Long-polls server-side; wait_seconds is capped at 180 (default 30 when omitted). Agents use a long wait to park their turn waiting for work instead of busy-looping with short polls — they wake instantly when a message arrives.

Manager tools (in addition to send/recv)

  • request_spawn(name) — queue a Spawn approval for a brand-new sub-agent (≤9 char name). Operator approves on the dashboard.
  • kill(name) — graceful stop. No approval required.
  • start(name) — start a stopped sub-agent. No approval.
  • restart(name) — stop + start. No approval.
  • request_apply_commit(agent, commit_ref) — submit a config change for any agent (hm1nd for the manager's own config) for operator approval.
  • ask_operator(question, options?, multi?, ttl_seconds?) — surface a question on the dashboard. Non-blocking — returns the queued question id; the operator's answer arrives later as HelperEvent::OperatorAnswered in the manager inbox. Options always render alongside a free-text fallback; multi=true renders options as checkboxes. ttl_seconds auto-cancels with answer [expired] after the deadline (useful for time-sensitive decisions that become moot if the operator hasn't responded). The operator can also manually cancel with [cancelled] via the dashboard.

The boundary: lifecycle ops on existing sub-agents (kill/start/restart) are at the manager's discretion — no operator approval. Creating a new agent (request_spawn) and changing any agent's config (request_apply_commit) still go through the approval queue.

Authoritative state

hive_ag3nt::events::Bus carries the current turn-loop state in addition to the broadcast channel and the events history. Variants:

  • Idle — sitting on Recv waiting for mail.
  • Thinkingclaude --print is running for a turn.
  • Compacting — operator-triggered /compact is in flight.

The harness flips state at the relevant transitions (set_state(Thinking) before drive_turn, set_state(Idle) after; set_state(Compacting) around compact_session). Exposed via /api/state.turn_state + turn_state_since (unix seconds); the agent page renders this rather than deriving from SSE events.

Tool envelope

mcp::run_tool_envelope: every MCP tool handler logs the request, runs the body, logs the result. Pre-/post-log only — the inbox status hint moved to the wake prompt + UI header.

Tool whitelist (mcp::ALLOWED_BUILTIN_TOOLS)

  • Allowed built-ins: Bash, Edit, Glob, Grep, Read, TodoWrite, Write.
  • Denied by omission: WebFetch, WebSearch, Task, NotebookEdit.
  • Allowed MCP tools: as listed above per flavor.

Bash is on the allow-list pending a finer-grained pattern allow-list (Bash(git *)-style) — see TODO.