Commit graph

50 commits

Author SHA1 Message Date
müde
772fdd8320 forward plugin install failures to manager from sub-agents
install_configured now takes an optional notify recipient. on a
non-zero or spawn-failed 'claude plugin install', sub-agents send
the spec + stderr to manager via the hyperhive socket; manager
passes None so it doesn't message itself. boot still proceeds either
way — notification is best-effort.
2026-05-16 17:24:04 +02:00
müde
3e040d5b16 agent: forward unhandled turn failures to manager
run_claude now keeps a 20-line stderr ring buffer and bails with it
inline (was just 'exit <status>'). agent serve loop, on Failed (not
PromptTooLong — that's already absorbed by drive_turn's compaction
retry), sends the error body to manager via the normal hyperhive
send. swallows transport errors — failure is already in journald
and the events sqlite. manager-only harness (hive-m1nd) is unchanged
so it doesn't try to notify itself.
2026-05-16 16:04:35 +02:00
damocles
a6d1464071 refactor: per-agent state paths (/agents/{label}/state), centralize in paths.rs 2026-05-16 15:18:32 +02:00
müde
6dd17864ac auto-install claude plugins at harness boot
new hyperhive.claudePlugins NixOS option (list of strings) rendered
to /etc/hyperhive/claude-plugins.json. both hive-ag3nt and hive-m1nd
shell out 'claude plugin install <spec>' for each entry once at
startup before the turn loop opens. failures log a warning but don't
abort boot.
2026-05-16 15:17:34 +02:00
damocles
3d2a7ffec7 fix: auto-wake after turn if pending messages exist, don't block on recv 2026-05-16 13:50:11 +02:00
damocles
d99e0812d0 fix: move sleep to only occur when recv returns empty, avoid message delivery delay 2026-05-16 13:48:04 +02:00
müde
06af23c8a4 recv: None = peek, positive value = opt-in long-poll
old behavior: omitted wait_seconds fell through to the 30s
RECV_LONG_POLL_DEFAULT — claude calling 'is there anything in
my inbox right now?' between actions blocked the turn for half
a minute. flip the semantics: None (or 0) returns immediately,
positive value parks up to MAX (180s, unchanged). cleaner
'peek vs wait' distinction; tool descriptions + agent/manager
prompts updated to point at the new shape.

harness's own serve loops in hive-ag3nt + hive-m1nd relied on
the old default for their inbox poll. they now explicitly pass
wait_seconds: Some(180) to opt into the full park — same
effective behavior as before, just spelled out.

retires the matching TODO under Turn loop.
2026-05-16 03:22:42 +02:00
müde
90df2106bf agent socket: external wake-up path for in-container MCP servers
new AgentRequest::Wake { from, body } drops a message into
this agent's inbox via the per-agent socket. matrix-style MCP
servers can use it when they receive an external event
(matrix message, webhook, scrape result) to nudge claude
into running a turn. broker.send wakes whatever Recv is
currently long-polling, the harness picks the message up,
formats a wake prompt with the caller's chosen from label
('matrix: new dm', 'webhook: deploy succeeded', etc.).

new `hive-ag3nt wake --from <label> --body <text>` subcommand
on the harness binary so MCP servers can shell out instead of
implementing the line-JSON protocol themselves; body=='-'
reads from stdin for multi-line / quoting-friendly payloads.

identity = socket: anything that can connect to /run/hive/mcp
.sock is implicitly trusted to inject. that's fine because the
bind-mount is the agent's own container; no new auth surface
opens up.

docs/turn-loop.md gets a new 'Waking the agent from inside
the container' section pointing at both paths (CLI + raw
JSON).
2026-05-16 03:15:58 +02:00
müde
2a6d084718 ask_operator: any agent can call it, answer routes by asker
new AgentRequest::AskOperator + AgentResponse::QuestionQueued on
the per-agent socket — same shape as the manager flavor, agent
gets the same wire surface (still uses the same operator_questions
table). agent_server::dispatch wires AskOperator through coord
.questions.submit(agent, ...) so the row's asker is the sub-agent
name; the ttl watchdog already in manager_server gets shared and
spawn_question_watchdog goes pub.

answer routing: operator_questions::answer now returns (question,
asker). post_answer_question + post_cancel_question + the watchdog
fire OperatorAnswered through new coord.notify_agent(asker, event)
instead of always notify_manager — the event lands in whichever
agent originally asked. notify_manager is now a thin wrapper.

agent socket plumbing: agent_server::start takes Arc<Coordinator>
instead of Arc<Broker> so dispatch has access to questions +
notify path; coordinator::{register_agent,ensure_runtime} take
self: &Arc<Self>. mcp::AgentServer grows the ask_operator tool;
allowed_mcp_tools(Agent) adds it; prompts/agent.md replaces the
'message the manager to ask the operator' guidance with the
direct tool description.
2026-05-16 01:48:10 +02:00
müde
d94712bde8 turn: unify run_turn / compact_session via TurnFiles
new TurnFiles bundle (mcp_config + settings + system_prompt +
flavor) materialised once per harness boot, passed to drive_turn
and compact_session alike. operator-initiated /compact now uses
the exact same session shape as a normal turn — same MCP
surface, same allowed tools, same role prompt — only the stdin
payload differs (/compact vs the wake-up body). web_ui's
AppState carries the TurnFiles instead of (label + socket +
flavor + ad-hoc file writes per click). bin/hive-ag3nt and
bin/hive-m1nd prepare TurnFiles before spawning the web UI and
pass them to both surfaces. web_ui::Flavor folds into a type
alias for mcp::Flavor — no two-stage enum mapping.

removes ClaudeMode + the run_claude variant fork (system prompt
was Option, mcp args were skipped on Compact). dead 'mode'
plumbing gone.
2026-05-16 00:57:58 +02:00
müde
f65ee88269 recv: optional wait_seconds parameter, capped at 60s
AgentRequest::Recv and ManagerRequest::Recv grow an optional
wait_seconds field (default None → 30s, capped at 60s server-side).
agent_server / manager_server clamp via recv_timeout(). MCP tool
schemas advertise the param so claude can pick its own poll window
— useful when an agent wants to throttle wakes without entering a
distinct nap state.

both harness loops still pass None, keeping the existing 30s
default behaviour for system-level Recvs.
2026-05-15 20:49:33 +02:00
müde
637085644d server-side TurnState in the harness, exposed via /api/state
new TurnState { Idle, Thinking, Compacting } on hive_ag3nt::events::Bus
with set_state + state_snapshot. the turn loops in hive-ag3nt and
hive-m1nd flip Thinking before drive_turn and Idle after; the
web_ui's /api/compact handler flips Compacting around compact_session.

per-agent /api/state grows turn_state + turn_state_since (unix
seconds). frontend prefers the server-reported state over the
client-derived one — setStateAbs takes the absolute since-time so
the 'last turn' chip reads the actual server-side duration instead
of the client's perceived gap between SSE events. SSE turn_start /
turn_end still drive state instantly between renders; /api/state
re-anchors on each turn_end refresh.

new compacting state gets its own purple badge with pulse
animation (mirrors thinking's amber). napping will slot in the
same way once the nap tool lands.
2026-05-15 20:46:38 +02:00
müde
538e0446d7 agent page: inbox view of last 30 messages addressed to this agent
new wire request AgentRequest::Recent { limit } / ManagerRequest::Recent
(plus matching responses with Vec<InboxRow>). InboxRow moved to
hive-sh4re so it lives on both surfaces without an internal-to-wire
conversion. host-side dispatch in agent_server / manager_server
calls broker.recent_for(name, limit).

per-agent web_ui /api/state grew an inbox: Vec<InboxRow> populated
via the same per-agent socket (best-effort; transport failure
returns empty). frontend renders as a collapsible <details> section
between the state row and the terminal — fmt timestamp / from /
body in a tight grid, capped at 16em scrollable. only visible when
there are rows.
2026-05-15 20:32:19 +02:00
müde
89ccc5e6c5 events.sqlite vacuum moves host-side
retention is a host concern — agents have no business doing their
own cleanup, and a misbehaving harness could skip it. drop
spawn_events_vacuum from both hive-ag3nt and hive-m1nd, drop the
matching Bus::vacuum + EventStore::vacuum methods. new
hive_c0re::events_vacuum module sweeps every existing
agents/<name>/state/hyperhive-events.sqlite on the same hourly
cadence as the broker vacuum. same two-stage delete (older than 7
days, trim to 2000 newest). called from main alongside broker
vacuum.

also: server-side state badge entered into todo.md (today's badge
is derived client-side from sse, fine for idle/thinking but a
state machine that grows compacting/napping wants authoritative
status from the harness).
2026-05-15 20:10:34 +02:00
müde
300be8afa9 operator control: /cancel slash command + cancel button
new POST /api/cancel on the per-agent web UI: shells out
pkill -INT claude (procps added to harness-base.nix). emits a Note
on the bus so the operator sees the cancel landed; state goes back
to idle when run_claude wakes and emits TurnEnd as usual.

frontend:
- /cancel slash command in the terminal input
- ■ cancel turn button in the state row, visible only while
  state === 'thinking' (driven from the same SSE-based state
  machine). disabled briefly during the POST.

claude gets SIGINT (not TERM) so it flushes anything in-flight and
emits a final result row before exiting.
2026-05-15 19:45:37 +02:00
müde
de09503b59 events: persist to sqlite, survive harness restart
hive_ag3nt::events::Bus replaces its in-memory VecDeque with a sqlite-
backed store at /state/hyperhive-events.sqlite (overridable via
HYPERHIVE_EVENTS_DB). emit() inserts a row; history() reads back the
most recent 2000 events. survives harness restart now — operator reload
mid-investigation no longer wipes the trail.

vacuum runs hourly (immediate first sweep): drop rows older than 7
days, then trim to 2000 newest. two-stage so a quiet agent keeps a
useful tail and a chatty one stays bounded. wired into both
hive-ag3nt and hive-m1nd via spawn_events_vacuum.

if the db open fails (e.g. no /state mount in dev), Bus runs in
no-store mode — events still broadcast, just nothing persisted.
2026-05-15 19:42:57 +02:00
müde
0cc25d33d8 drop debug-only cli subcommands from hive-ag3nt + hive-m1nd
drop the one-shot send/recv/kill/start/restart/request-spawn/request-
apply-commit subcommands from both in-container binaries. they were
debug-only — the host admin socket (`hive-c0re ...`) exposes the
same verbs and the manager mcp surface covers the rest from inside
claude. now each binary's --help shows just `serve` and `mcp`,
which are the only commands either is meant to be started with.

removes the `one_shot` helper and the `render` / `check` glue.
2026-05-15 19:34:58 +02:00
müde
ac1b5fde8e manager: start/restart at will, no approval; refuse self
new manager tools mcp__hyperhive__{start,restart} that delegate to the
existing lifecycle::start / lifecycle::restart on the host. kill was
already at the manager's discretion; rounding out start + restart for
parity so day-to-day container care doesn't have to round-trip through
the operator.

guard: refuse self-targeting on kill/start/restart — the manager would
just be cutting its own legs. spawn (request_spawn) and config changes
(request_apply_commit) still go through the approval queue, since those
are the actual gate. prompt + claude.md updated to make the boundary
explicit. kill now also emits HelperEvent::Killed (it didn't before).
2026-05-15 18:57:25 +02:00
müde
2770630f33 ask_operator tool: non-blocking; operator answer arrives as helper event
new mcp tool on the manager surface that queues a question on the
dashboard and returns the question id immediately. operator submits an
answer via /answer-question/<id>; the dashboard fires
HelperEvent::OperatorAnswered { id, question, answer } into the manager
inbox so the next turn picks it up.

also: fix async-form button stuck on spinner after successful submit
(refreshState skipped re-rendering, so the button was never re-enabled).
2026-05-15 18:44:42 +02:00
müde
ace13cd785 agent ui: terminal-themed live panel; pretty tool calls; collapsed results
- tool_use renders per-tool (Read /path, Bash $ cmd, send → operator: ...)
- tool_result with >120 chars collapses into <details>; short ones inline
- session_init / result / rate_limit dropped from the panel
- thinking content shown inline if present, fallback indicator otherwise
- TurnStart carries unread count → header badge "· 3 unread"
- per-tool [status] line dropped from envelope; lives in wake prompt + UI
- send form moved below the live panel
- live panel themed as a terminal (crust bg, inset shadow, monospace)
2026-05-15 18:20:58 +02:00
müde
d8807b8e8c claude: pass --settings as a file path (avoid argv length limit) 2026-05-15 18:12:07 +02:00
müde
68fe66c0ef claude: static role/tools moved to --system-prompt-file 2026-05-15 17:44:15 +02:00
müde
37c6504462 manager events: Spawned/Rebuilt/Killed/Destroyed + start button 2026-05-15 17:38:41 +02:00
müde
e9b213690e dedupe: lift drive_turn/emit_turn_end/wait_for_login into hive_ag3nt::turn 2026-05-15 16:27:51 +02:00
müde
e1289a3e4c nix templates: factor harness-base.nix (shared scaffolding incl. gitconfig) 2026-05-15 16:10:55 +02:00
müde
f83c0aa717 agent prompt: tell sub-agents they can ask manager for config changes 2026-05-15 16:01:58 +02:00
müde
70af56e050 turn loop: --continue, disable claude auto-compact, /compact on overflow 2026-05-15 15:40:51 +02:00
müde
409263f1c9 operator input: per-agent /send form (dashboard T4LK removed) 2026-05-15 15:28:17 +02:00
müde
09787659ab manager: same agent loop, ManagerServer MCP surface 2026-05-15 15:13:26 +02:00
müde
accb1445e3 claude: pipe prompt via stdin (variadic --allowedTools was eating it); + ManagerRequest::Status 2026-05-15 15:06:09 +02:00
müde
9eab28a716 agent ui: live event panel via SSE + stream-json 2026-05-15 15:01:26 +02:00
müde
3c9d42b2a7 agent loop: claude drives; tool envelope (log/run/status/log) 2026-05-15 14:54:10 +02:00
müde
26298470a5 model: "haiku" shorthand 2026-05-15 14:44:25 +02:00
müde
ae4a3f1b77 drop in-code todo comment (lives in CLAUDE.md) 2026-05-15 14:44:11 +02:00
müde
625bae1b31 turn loop: pin to haiku 4.5 (todo: per-agent override) 2026-05-15 14:43:43 +02:00
müde
37efb0889f turn loop: tool whitelist (no web/task), no skip-permissions 2026-05-15 14:41:38 +02:00
müde
65a10a3c2b agent: embedded MCP server (rmcp) with send/recv tools 2026-05-15 14:29:57 +02:00
müde
d81a845dbe login: default command is claude auth login 2026-05-15 13:32:51 +02:00
müde
78fae44ee5 phase 8 step 3: needs-login partial-run mode + dashboard badge 2026-05-15 12:57:06 +02:00
müde
c59fa8541c phase 8 step 2: approval-gated spawn + dashboard spinner 2026-05-15 12:53:13 +02:00
müde
2267800c51 hive-m1nd: if-let-else instead of match 2026-05-15 00:27:47 +02:00
müde
1ceabae892 Phase 7c: ApprovalResolved helper events into manager's inbox 2026-05-15 00:26:42 +02:00
müde
d0f954bbc1 Phase 6a: per-container web UI (axum); per-agent port hashed from name 2026-05-14 23:39:06 +02:00
müde
f12837fe32 Phase 5a: approval queue (request_apply_commit, pending/approve/deny) 2026-05-14 22:50:19 +02:00
müde
4a73340150 fmt 2026-05-14 22:38:07 +02:00
müde
17092961a2 Phase 4: hive-m1nd harness + manager nixos template; devshell sqlite 2026-05-14 22:36:34 +02:00
müde
6e7fd2e897 Phase 3c: nixpkgs-unstable for claude-code; harness calls claude --print, falls back to echo 2026-05-14 22:26:14 +02:00
müde
2fe9e91005 hive-ag3nt: echo turn (placeholder until claude integration) 2026-05-14 22:19:05 +02:00
müde
61407f41c9 hive-ag3nt: serve loop + send/recv CLI; template runs serve 2026-05-14 21:44:05 +02:00
müde
6686df93a5 cargo workspace skeleton 2026-05-14 20:24:55 +02:00