Commit graph

130 commits

Author SHA1 Message Date
müde
58c3cd853b container crash watcher → HelperEvent::ContainerCrash
new hive_c0re::crash_watch task polls every 10s, builds the set of
currently-running containers, and on running→stopped transitions
checks the transient snapshot: if no Stopping / Restarting /
Destroying / Rebuilding flag is set, the container exited
unexpectedly and we fire HelperEvent::ContainerCrash into the
manager's inbox so it can react (typically: start it again).

first poll is a seeding pass — no events on harness startup. dbus
subscription would be lower-latency but polling is honest and
debuggable, and a 10s delay on crash detection is fine for our
scale.

manager prompt + approvals doc updated to advertise the new
event variant. todo drops the entry (and the journald-viewer
entry that already shipped).
2026-05-15 21:02:05 +02:00
müde
6db38cf70c model: runtime override via /model slash; fixes for port + bind
- runtime model override: Bus::{model,set_model} + POST /api/model
  (form-encoded {model: name}). turn.rs reads bus.model() per turn
  so a flip lands on the next claude invocation. /api/state grows
  a model field; agent page shows a 'model · <name>' chip in the
  state row. '/model <name>' slash command POSTs to the endpoint
  and refreshes state.

- port regression fix: agent_web_port no longer probes forward for
  *existing* agents (the previous fix shifted ports for any agent
  without a port file, including legacy ones whose container was
  already bound to the bare hashed port — dashboard rendered the
  new port, container was still on the old one, conn errors). new
  rule: port file exists → use it; absent + applied flake present
  → legacy, persist port_hash without probing; absent + no applied
  flake → fresh spawn, probe forward.

- SO_REUSEADDR on both the dashboard and per-agent web UI binds
  via tokio::net::TcpSocket. operator hit 12 retries failing on
  manager :8000 — REUSEADDR handles the TIME_WAIT case cleanly
  without a new dep; retry still covers the genuine
  process-still-alive overlap.

todo: drops the model-override entry (shipped); adds two new
items — model persistence (optional, future), and custom
per-agent MCP tools (groundwork for moving bitburner-agent into
hyperhive).
2026-05-15 20:59:45 +02:00
müde
7d93dd9db4 no nap tool — recv with long wait_seconds replaces it; max raised to 180s
recv-with-timeout is strictly better than a fixed sleep because it
wakes instantly on incoming messages. drop the half-written nap MCP
tool, raise the recv wait_seconds cap from 60s to 180s on both
agent and manager sockets.

prompts updated: agent.md + manager.md now spell out the pattern —
when there's nothing else useful to do, call recv with
wait_seconds=180 to park the turn; do NOT use Bash sleep for the
same purpose. todo drops the nap entry and the napping-state-badge
follow-up; both replaced by 'just use a long recv'.
2026-05-15 20:53:15 +02:00
müde
f65ee88269 recv: optional wait_seconds parameter, capped at 60s
AgentRequest::Recv and ManagerRequest::Recv grow an optional
wait_seconds field (default None → 30s, capped at 60s server-side).
agent_server / manager_server clamp via recv_timeout(). MCP tool
schemas advertise the param so claude can pick its own poll window
— useful when an agent wants to throttle wakes without entering a
distinct nap state.

both harness loops still pass None, keeping the existing 30s
default behaviour for system-level Recvs.
2026-05-15 20:49:33 +02:00
müde
637085644d server-side TurnState in the harness, exposed via /api/state
new TurnState { Idle, Thinking, Compacting } on hive_ag3nt::events::Bus
with set_state + state_snapshot. the turn loops in hive-ag3nt and
hive-m1nd flip Thinking before drive_turn and Idle after; the
web_ui's /api/compact handler flips Compacting around compact_session.

per-agent /api/state grows turn_state + turn_state_since (unix
seconds). frontend prefers the server-reported state over the
client-derived one — setStateAbs takes the absolute since-time so
the 'last turn' chip reads the actual server-side duration instead
of the client's perceived gap between SSE events. SSE turn_start /
turn_end still drive state instantly between renders; /api/state
re-anchors on each turn_end refresh.

new compacting state gets its own purple badge with pulse
animation (mirrors thinking's amber). napping will slot in the
same way once the nap tool lands.
2026-05-15 20:46:38 +02:00
müde
754db7830e ask_operator: ttl_seconds auto-cancel + remaining-time chip
manager can pass ttl_seconds to ask_operator. on submit, host
stores deadline_at = now + ttl in operator_questions (new column,
migrated via existing pragma_table_info pattern), spawns a tokio
task that sleeps until the deadline then resolves the question with
answer '[expired]' and fires the same OperatorAnswered helper event.
already-resolved races no-op silently.

dashboard renders a ' MM:SS' chip on the question row when
deadline_at is set. format collapses seconds → s, < 1h → m s, ≥ 1h
→ h m. heartbeat refresh (5s) keeps the chip current; the operator
sees it tick down.

manager prompt + mcp tool description updated. journald viewer per
container queued in todo (separate task).
2026-05-15 20:38:02 +02:00
müde
2146e47770 web ui: retry binding on AddrInUse during restart races
operator hit 'Address already in use (os error 98)' on a harness
restart — the new harness raced the old socket's release. add a
bind_with_retry helper that backs off (250ms doubling, capped at
2s, 12 tries ≈ 22s total) on AddrInUse before giving up. applied
to both the per-agent web UI and the hive-c0re dashboard.

proper fix would be SO_REUSEADDR via socket2 but retry covers the
TIME_WAIT case fine and keeps the dep count down. Other bind errors
still fail immediately (port permission, fd exhaustion).
2026-05-15 20:33:51 +02:00
müde
538e0446d7 agent page: inbox view of last 30 messages addressed to this agent
new wire request AgentRequest::Recent { limit } / ManagerRequest::Recent
(plus matching responses with Vec<InboxRow>). InboxRow moved to
hive-sh4re so it lives on both surfaces without an internal-to-wire
conversion. host-side dispatch in agent_server / manager_server
calls broker.recent_for(name, limit).

per-agent web_ui /api/state grew an inbox: Vec<InboxRow> populated
via the same per-agent socket (best-effort; transport failure
returns empty). frontend renders as a collapsible <details> section
between the state row and the terminal — fmt timestamp / from /
body in a tight grid, capped at 16em scrollable. only visible when
there are rows.
2026-05-15 20:32:19 +02:00
müde
bd7d2d4860 agent page: dashboard back-link + last-turn timing chip
title bar grows a '↑ DASHB04RD' link next to the rebuild button —
opens the host dashboard in a new tab so the operator can pivot
between agents without losing the live tail. uses the dashboardPort
already plumbed via /api/state.

state row picks up a 'last turn 12.3s' chip that fills in when
state transitions away from thinking. format: ms / s.s / m s.
hidden until the first turn completes.
2026-05-15 20:27:09 +02:00
müde
bc87ff80d2 agent terminal: inline +/- diffs on Write and Edit tool calls
Write and Edit tool_use rows used to render as the bare file path. now
they're collapsed <details> blocks with the actual change inside —
Write shows every content line prefixed '+', Edit shows old_string as
'-' lines then new_string as '+' lines. summary carries the file path
+ counts ('→  Edit /foo · -3 +5'). lines colored via diff-add /
diff-del / diff-ctx; click to expand the full body.

renderFileWriteEdit returns null for any other tool so the existing
flat-row path (fmtToolUse) is untouched.
2026-05-15 20:23:22 +02:00
müde
7b4adea325 dashboard: lifecycle_action helper collapses start/stop/restart/rebuild
five POST handlers (post_kill / post_restart / post_start / post_rebuild)
were all repeating the same boilerplate: strip prefix, set_transient,
call lifecycle::X, clear_transient, match the result. extract one
helper that takes the transient kind, error-message verb, the work
body, and an optional 'on success' tail (used by kill to also
unregister + emit HelperEvent::Killed). each handler shrinks to a
single lifecycle_action(..) call. zero behavior change.
2026-05-15 20:12:03 +02:00
müde
89ccc5e6c5 events.sqlite vacuum moves host-side
retention is a host concern — agents have no business doing their
own cleanup, and a misbehaving harness could skip it. drop
spawn_events_vacuum from both hive-ag3nt and hive-m1nd, drop the
matching Bus::vacuum + EventStore::vacuum methods. new
hive_c0re::events_vacuum module sweeps every existing
agents/<name>/state/hyperhive-events.sqlite on the same hourly
cadence as the broker vacuum. same two-stage delete (older than 7
days, trim to 2000 newest). called from main alongside broker
vacuum.

also: server-side state badge entered into todo.md (today's badge
is derived client-side from sse, fine for idle/thinking but a
state machine that grows compacting/napping wants authoritative
status from the harness).
2026-05-15 20:10:34 +02:00
müde
c9647f4106 operator control: /compact slash command + endpoint
new POST /api/compact on the per-agent web UI: spawns
turn::compact_session in the background so the http handler returns
immediately. claude runs '/compact' over the persistent --continue
session; output streams into the live panel like any other turn.

slash command /compact wired to the new endpoint. SLASH_COMMANDS list
now lists all four (/help /clear /cancel /compact). postCancelTurn +
postCompact share a postSimple() helper.

deliberately not gated against an in-flight turn — claude's own
session lock will reject a concurrent compact and the failure
surfaces as a Note in the live panel.
2026-05-15 19:56:53 +02:00
müde
8344dd9ab7 ask_operator: multi-select + free-text fallback
ask_operator now accepts a multi: bool. when true and options is
non-empty, the dashboard renders the choices as checkboxes — operator
picks any subset, answer comes back as a ', '-joined string. when
false (default), options are radio buttons.

independent of multi, a free-text input ('or type your own…') is
always rendered alongside options so the operator is never trapped
by an incomplete list. submit merges checked options + free text into
the single 'answer' field.

schema migration: operator_questions grows a multi INTEGER column
with a one-shot ALTER TABLE on open. backward compatible — old rows
default to 0 (not multi).

prompt + mcp tool description updated; existing dashboard css for
.qform was rewritten around the new vertical layout.
2026-05-15 19:52:44 +02:00
müde
300be8afa9 operator control: /cancel slash command + cancel button
new POST /api/cancel on the per-agent web UI: shells out
pkill -INT claude (procps added to harness-base.nix). emits a Note
on the bus so the operator sees the cancel landed; state goes back
to idle when run_claude wakes and emits TurnEnd as usual.

frontend:
- /cancel slash command in the terminal input
- ■ cancel turn button in the state row, visible only while
  state === 'thinking' (driven from the same SSE-based state
  machine). disabled briefly during the POST.

claude gets SIGINT (not TERM) so it flushes anything in-flight and
emits a final result row before exiting.
2026-05-15 19:45:37 +02:00
müde
de09503b59 events: persist to sqlite, survive harness restart
hive_ag3nt::events::Bus replaces its in-memory VecDeque with a sqlite-
backed store at /state/hyperhive-events.sqlite (overridable via
HYPERHIVE_EVENTS_DB). emit() inserts a row; history() reads back the
most recent 2000 events. survives harness restart now — operator reload
mid-investigation no longer wipes the trail.

vacuum runs hourly (immediate first sweep): drop rows older than 7
days, then trim to 2000 newest. two-stage so a quiet agent keeps a
useful tail and a chatty one stays bounded. wired into both
hive-ag3nt and hive-m1nd via spawn_events_vacuum.

if the db open fails (e.g. no /state mount in dev), Bus runs in
no-store mode — events still broadcast, just nothing persisted.
2026-05-15 19:42:57 +02:00
müde
211599c589 agent state badge: idle / thinking / offline + age timer
new badge between the status line and the terminal. shows current
state with a glyph + label + age suffix (e.g. '🧠 thinking · 12s').
state transitions are driven from existing SSE turn_start/turn_end —
no harness changes needed. on page load, history backfill detects an
in-flight turn (turn_start without matching turn_end) and starts in
thinking. state-just-changed flash kicks in on each transition. age
timer ticks client-side every 1s.

compacting/napping states will be added when /compact and nap land —
their slots are reserved in the state enum, just unused for now.
2026-05-15 19:36:29 +02:00
müde
0cc25d33d8 drop debug-only cli subcommands from hive-ag3nt + hive-m1nd
drop the one-shot send/recv/kill/start/restart/request-spawn/request-
apply-commit subcommands from both in-container binaries. they were
debug-only — the host admin socket (`hive-c0re ...`) exposes the
same verbs and the manager mcp surface covers the rest from inside
claude. now each binary's --help shows just `serve` and `mcp`,
which are the only commands either is meant to be started with.

removes the `one_shot` helper and the `render` / `check` glue.
2026-05-15 19:34:58 +02:00
müde
08f2ec5232 agent terminal: sticky-bottom auto-scroll with new-row pill
new rows no longer yank the view if the operator is scrolled up.
threshold for 'near bottom' is 48px. when not near bottom, an amber
'↓ N new' pill appears in the bottom-right of the terminal-wrap;
clicking it jumps to bottom. scrolling back near bottom clears the
counter. backfilled (history-replay) rows always scroll to bottom
since the operator hasn't started reading yet.
2026-05-15 19:30:34 +02:00
müde
875a8f5be4 agent terminal: take up real screen space
terminal height is now min(72vh, 60em) instead of a 32em strip — on a
1080p screen that's ~3x more visible lines. body max-width raised
to 110em so a wide window doesn't waste the available width on the
margin.
2026-05-15 19:29:36 +02:00
müde
8d3df656de agent terminal: slash commands /help and /clear, tab-completion
intercept any line starting with / before sending it to the agent inbox.
two commands today:
- /help — render command list locally
- /clear — wipe the local terminal view (server-side event history kept,
  so a page reload restores it)

unknown /xxx surfaces an error row instead of being silently sent. tab
on a /prefix cycles through matching command names. submit-hint
mentions /help so the operator can discover it.

scaffolding for the bigger commands (/compact /cancel /model) is in
place — adding them later is a switch arm plus harness work.
2026-05-15 19:22:14 +02:00
müde
85e1f1a8f4 agent terminal: multi-line textarea input
swap the single-line <input> for an auto-growing <textarea>. enter
submits, shift+enter newlines, ime composition respected (skip submit
while isComposing). height caps at ~12em then scrolls. submit-hint
updates to '↵ send · ⇧↵ newline'. async-form handler now also clears
textareas on success.
2026-05-15 19:21:00 +02:00
müde
fd39226883 visuals: frosted-glass terminal/msgflow, row fade-in, badge pulses
agent terminal-wrap + dashboard msgflow get a translucent bg with
backdrop-filter blur+saturate so page-bg glow softens behind them.
new rows in the live panel and the dashboard message flow fade in
with a 4px slide-up. unread badge pulses; pending-operator-questions
section pulses its glow. history-backfilled rows skip the animation
(.no-anim) so the page doesn't stagger 100 fade-ins on load.
2026-05-15 19:20:15 +02:00
müde
ac1b5fde8e manager: start/restart at will, no approval; refuse self
new manager tools mcp__hyperhive__{start,restart} that delegate to the
existing lifecycle::start / lifecycle::restart on the host. kill was
already at the manager's discretion; rounding out start + restart for
parity so day-to-day container care doesn't have to round-trip through
the operator.

guard: refuse self-targeting on kill/start/restart — the manager would
just be cutting its own legs. spawn (request_spawn) and config changes
(request_apply_commit) still go through the approval queue, since those
are the actual gate. prompt + claude.md updated to make the boundary
explicit. kill now also emits HelperEvent::Killed (it didn't before).
2026-05-15 18:57:25 +02:00
müde
d943bddd9e agent ui: input lives in terminal section, banner shimmer on activity
agent page restructure:
- send form moves into the terminal panel as a prompt-style row beneath
  the live tail (status line stays above so it still reads as a header).
- live panel + prompt share a single bordered 'terminal-wrap' box.
- harness-alive / login-state status lines drop their decorative ascii
  bookends; just a leading dot/glyph remains.
- banner gradient is now a real css gradient with a shimmer animation
  toggled by an .active class. turn_start adds it, turn_end removes it.
  dashboard side mirrors this: each broker sse event nudges a 4s
  shimmer window.
- dashboard container rows drop their static ▓█▓▒░ / ▒░▒░░ glyph
  prefixes; the role chips already disambiguate m1nd vs ag3nt.
- empty-state placeholders drop the ▓ bookends.

terminal pre-fill: hive-ag3nt::events::Bus grows a 500-event ring
buffer; new GET /events/history endpoint returns it. The agent JS
fetches history before opening the SSE stream so opening the page mid-
turn shows the last N events instead of a blank panel. The replay
walks turn_start/turn_end pairs to seed the banner-active state
correctly if a turn was still open.
2026-05-15 18:54:19 +02:00
müde
2770630f33 ask_operator tool: non-blocking; operator answer arrives as helper event
new mcp tool on the manager surface that queues a question on the
dashboard and returns the question id immediately. operator submits an
answer via /answer-question/<id>; the dashboard fires
HelperEvent::OperatorAnswered { id, question, answer } into the manager
inbox so the next turn picks it up.

also: fix async-form button stuck on spinner after successful submit
(refreshState skipped re-rendering, so the button was never re-enabled).
2026-05-15 18:44:42 +02:00
müde
ace13cd785 agent ui: terminal-themed live panel; pretty tool calls; collapsed results
- tool_use renders per-tool (Read /path, Bash $ cmd, send → operator: ...)
- tool_result with >120 chars collapses into <details>; short ones inline
- session_init / result / rate_limit dropped from the panel
- thinking content shown inline if present, fallback indicator otherwise
- TurnStart carries unread count → header badge "· 3 unread"
- per-tool [status] line dropped from envelope; lives in wake prompt + UI
- send form moved below the live panel
- live panel themed as a terminal (crust bg, inset shadow, monospace)
2026-05-15 18:20:58 +02:00
müde
d8807b8e8c claude: pass --settings as a file path (avoid argv length limit) 2026-05-15 18:12:07 +02:00
müde
cf8f1e64b1 claude settings: extract to prompts/claude-settings.json (formatted) 2026-05-15 18:09:05 +02:00
müde
71bf8bf47e claude: pin effortLevel=medium in inline settings 2026-05-15 18:07:53 +02:00
müde
6e75d8e6db manager: don't trust agents on config asks; sketch ask_operator tool in TODO 2026-05-15 18:06:01 +02:00
müde
ac4a978846 prompts: nudge agents to keep messages short, drop big payloads in /state 2026-05-15 18:01:36 +02:00
müde
ff8f8c7c56 per-agent /state dir for durable notes; manager sees them via /agents 2026-05-15 18:00:08 +02:00
müde
7be64c5e66 theme: bring back the vibec0re glow on catppuccin mocha 2026-05-15 17:51:36 +02:00
müde
f33fc3dd50 theme: catppuccin mocha across dashboard + agent UI 2026-05-15 17:49:54 +02:00
müde
68fe66c0ef claude: static role/tools moved to --system-prompt-file 2026-05-15 17:44:15 +02:00
müde
37c6504462 manager events: Spawned/Rebuilt/Killed/Destroyed + start button 2026-05-15 17:38:41 +02:00
müde
06ea0cf283 operator inbox view on dashboard; agent ui doesn't clobber typing 2026-05-15 17:23:53 +02:00
müde
124fd97288 agent ui: SPA shell — static index.html + app.js, /api/state JSON 2026-05-15 17:15:28 +02:00
müde
4f91dfef99 module: thread hyperhive package directly — operators don't apply overlays 2026-05-15 16:51:18 +02:00
müde
970f645461 docs: README + TODO split; trim CLAUDE.md; fix async form 415 2026-05-15 16:41:15 +02:00
müde
392a448656 mcp: SocketReply + format_{ack,recv,status} helpers — dedupe tool wrappers 2026-05-15 16:35:18 +02:00
müde
edf42b7e93 extract dashboard + agent CSS/JS to assets/ (include_str!) 2026-05-15 16:32:35 +02:00
müde
e9b213690e dedupe: lift drive_turn/emit_turn_end/wait_for_login into hive_ag3nt::turn 2026-05-15 16:27:51 +02:00
müde
e1289a3e4c nix templates: factor harness-base.nix (shared scaffolding incl. gitconfig) 2026-05-15 16:10:55 +02:00
müde
0a24946c1e send form: submit via fetch so live view stays put 2026-05-15 16:05:51 +02:00
müde
f83c0aa717 agent prompt: tell sub-agents they can ask manager for config changes 2026-05-15 16:01:58 +02:00
müde
f1fd787f17 rebuild button on agent UI (cross-origin POST to dashboard /rebuild) 2026-05-15 15:57:11 +02:00
müde
edc1de3197 tools: drop NotebookEdit from the agent whitelist 2026-05-15 15:47:58 +02:00
müde
9716f20f81 live panel: pretty rendering (per-event rows, turn blocks, color-coded) 2026-05-15 15:41:24 +02:00