hyperhive

Author	SHA1	Message	Date
müde	7d6d8e96c1	per-agent extra MCP servers via hyperhive.extraMcpServers new NixOS option in harness-base.nix: hyperhive.extraMcpServers.<key> = { command = "/path/to/server"; args = [ ... ]; env = { KEY = "value"; }; allowedTools = [ "send_message" "join_room" ]; # or ["*"] }; declared as attrsOf submodule so agents stack arbitrarily many. the module writes the whole map as JSON to /etc/hyperhive/extra-mcp.json at activation; the harness reads that file in mcp::render_claude_config and merges each entry into the rendered --mcp-config under its own mcpServers.<key> block. allowed_mcp_tools(flavor) extends the --allowedTools arg with mcp__<key>__<pattern> for every entry — "*" (the default) becomes mcp__<key>__* so every tool from that server is auto-approved, or pass a concrete list to tighten. collision guard: an extra server keyed "hyperhive" is dropped with a warn-log so user config can't shadow the built-in surface. malformed JSON / missing file fall back to "no extras" silently. prompt note added: agents see "(some agents only) extra MCP tools surfaced as mcp__<server>__<tool>" and learn they're declared via agent.nix. retires the matching TODO under Per-agent extension. matrix-chat agents + bitburner-agent migration unblocked.	2026-05-16 02:10:11 +02:00
müde	6b3ef4549c	manager_server: reject proposals that modify flake.nix submit_apply_commit now diffs the freshly-tagged proposal/<id> against applied/main and refuses if flake.nix is in the changeset. flake.nix is fixed boilerplate the meta flake depends on (it exports nixosModules.default = import ./agent.nix); silent edits there would break the nixosConfiguration in subtle ways. the manager prompt already says don't touch it; this is the host-side belt — clear error to the manager on submit, row marked failed in sqlite, no orphan pending approval to chase. diff-failure is logged + ignored: the build path surfaces concrete errors if flake.nix is actually broken.	2026-05-16 01:42:11 +02:00
müde	68ef6ab433	todo: stream nixos-container output so slow != stuck surfaced by a real hang investigation today — lifecycle::run uses .output() which buffers stdout/stderr until exit, so a multi-minute nix build through nixos-container update looks identical to a wedged daemon. line-buffered streaming into tracing (and ideally the per-agent live event bus when the agent is known) makes 'still building, just slow' visible without strace gymnastics.	2026-05-16 01:38:02 +02:00
müde	65bdde898e	todo: tag retention, flake.nix tamper-check, sync_agents nix call three things surfaced by the meta-flake overhaul + the nix CLI deprecation we just fixed worth tracking explicitly. extend the web-UI-for-config-repos entry to also cover the /meta deploy log now that meta's git history is the swarm-wide audit trail.	2026-05-16 01:21:27 +02:00
müde	df9da4d6e1	todo: recv default should not sleep, agent opts into wait	2026-05-15 23:00:25 +02:00
müde	497cd15137	docs: tag-driven config-apply plan + migration story scratchpad in claude.md marks this as in-flight; docs/approvals.md gets the new tag state machine (proposal/approved/building/deployed/failed/denied) and the manager applied.git read-only mount. todo picks up the unprivileged-containers git-identity caveat and a web ui for config repos as a downstream follow-up.	2026-05-15 22:43:47 +02:00
müde	75e7faff0c	docs: full sync ahead of compaction + config-management overhaul readme: manager mcp surface picks up update; operator-surface recap mentions /model + last-turn + model chip + the three collapsibles (inbox / journald / agent.nix). web-ui.md: details-restore-key story under shape; port-conflict banner mention on containers; agent.nix viewer alongside journald; notifications use per-event tags + console.debug log on block/show; deny endpoint takes note=<reason>; data-prompt / data-prompt-field generalisation noted. conventions.md: data-prompt and snapshot/restoreOpenDetails added to the async-forms section. persistence.md: operator_questions row picks up deadline_at (ttl) column with a migration note. todo.md: new 'Bugs' section captures the manager-question not-rendering issue with three suspect paths to chase. claude.md scratchpad rewritten as a clean handoff for the compaction + the upcoming config-git overhaul. flags the two-repo (proposed/ + applied/) split as the thing to reconsider.	2026-05-15 22:12:40 +02:00
müde	91c78d626f	dashboard: per-container applied agent.nix viewer new GET /api/agent-config/{name} returns the contents of /var/lib/hyperhive/applied/<name>/agent.nix — the file the container actually builds against. validated against the live container list to avoid arbitrary filesystem reads. frontend mirrors the journald viewer: collapsed <details> on each container row, lazy-fetches on expand, refresh button re-fetches. restore-keyed (agent-config:<name>) so it survives the dashboard heartbeat refresh. read-only — mutating the applied config goes through the existing request_apply_commit + operator approval flow.	2026-05-15 21:46:25 +02:00
müde	80229c6af9	manager: needs_login / logged_in / needs_update events + update tool crash_watch grows two more state-axes alongside running/stopped: - logged-in (claude session dir populated for the agent) - up-to-date (recorded flake rev matches current) per-tick transitions emit HelperEvent::NeedsLogin / LoggedIn / NeedsUpdate. seed-on-first-tick semantics retained — nothing fires on harness boot for agents that were already in their state. only needs_update fires the 'stale appeared' direction; the resolved direction is already covered by Rebuilt. new mcp__hyperhive__update(name) on the manager surface: idempotent rebuild via auto_update::rebuild_agent. transient-aware (Rebuilding) so the dashboard shows the spinner. login intentionally has NO tool — it's interactive OAuth, only the operator can complete it. prompts + approvals doc + turn-loop doc updated. todo grows a 'show per-agent applied config in dashboard' entry (separate follow-up).	2026-05-15 21:42:13 +02:00
müde	62d1a74929	docs sync + revert auto-unfree removal revert the earlier 'operator must set allowUnfree' move: per-agent containers evaluate their own nixpkgs and the operator's host-level allowUnfree doesn't propagate in. restoring the scoped allowUnfreePredicate inside both the claude-unstable overlay and harness-base.nix; documented in README + gotchas as 'nothing to set on the operator side'. docs: - claude.md file map adds crash_watch.rs, kick_agent on coordinator, /api/model + journald viewer + bind-with-retry references. - scratchpad rewritten to reflect the recent run. - web-ui.md: notification row + browser notifications section, state row (badge + model chip + last-turn chip + cancel button), per-agent inbox, /model slash, /cancel-question + journald endpoints, focus-preservation on refresh. - turn-loop.md: --model is read from Bus::model() per turn (runtime override via /model); recv(wait_seconds) up to 180s with the rationale; ask_operator gains ttl_seconds; new TurnState section; kick_agent inbox-on-startup hint. - approvals.md: ttl/cancel resolution paths for operator questions. - persistence.md: /state/hyperhive-model file. - gotchas.md: web UI port collision policy (rename, don't probe); bind retry + SO_REUSEADDR shape; auto-unfree restored. - todo.md: cleaned up empty sections and stale entries; /model shipped, dropped from the list.	2026-05-15 21:26:13 +02:00
müde	237b215c55	dashboard: browser notifications for operator-bound events three signals fire OS notifications: - new approval lands in the queue (per id, via /api/state delta) - new ask_operator question queued (per id) - broker message sent to operator (live via SSE) first /api/state render after page load seeds the 'seen' sets without firing — only items that arrive while the page is open count. controls in a row under the banner: 🔔 enable notifications (calls requestPermission, hides on grant), 🔕 mute / 🔔 unmute toggle (localStorage-backed so operator can silence without revoking the permission), inline status text when blocked or unsupported. notification tag='hyperhive' collapses rapid bursts; onclick focuses the dashboard tab. requires secure context (HTTPS or localhost) — on other origins the API is unavailable and the controls hide themselves. todo: entry dropped.	2026-05-15 21:10:20 +02:00
müde	a67aada7c9	todo: browser notifications for approvals / questions / operator msgs pure frontend — Notification API + existing /api/state and /messages/stream signals. Caveats: secure-context requirement (HTTPS or localhost), per-browser permission grant. Includes a sketch of the implementation: request-permission button, count deltas on refreshState, SSE hook on operator-bound sends, localStorage 'muted' toggle.	2026-05-15 21:07:21 +02:00
müde	8b9f7d21b7	model persisted to /state; stop auto-allowing claude-code unfree model persistence: /model <name> now writes to /state/hyperhive-model (in-container), Bus::new reads it on init. operator override survives harness restart and container rebuild; gone on --purge like every other piece of agent state. path overridable via HYPERHIVE_MODEL_FILE for tests. failure to persist is a warn, not fatal — runtime override still applies, just won't survive a restart. unfree opt-in: drop the auto-allowUnfreePredicate from harness-base.nix and the claude-unstable overlay. operator now has to set nixpkgs.config.allowUnfree (or a predicate listing claude-code) in their own host config. silent unfree bypass was sketchy; this is honest. readme + gotchas updated to spell out the snippet. todo: drops model-persistence + container-crash + journald (all shipped); adds per-agent send allow-list (constrain who an agent can message).	2026-05-15 21:05:40 +02:00
müde	58c3cd853b	container crash watcher → HelperEvent::ContainerCrash new hive_c0re::crash_watch task polls every 10s, builds the set of currently-running containers, and on running→stopped transitions checks the transient snapshot: if no Stopping / Restarting / Destroying / Rebuilding flag is set, the container exited unexpectedly and we fire HelperEvent::ContainerCrash into the manager's inbox so it can react (typically: start it again). first poll is a seeding pass — no events on harness startup. dbus subscription would be lower-latency but polling is honest and debuggable, and a 10s delay on crash detection is fine for our scale. manager prompt + approvals doc updated to advertise the new event variant. todo drops the entry (and the journald-viewer entry that already shipped).	2026-05-15 21:02:05 +02:00
müde	6db38cf70c	model: runtime override via /model slash; fixes for port + bind - runtime model override: Bus::{model,set_model} + POST /api/model (form-encoded {model: name}). turn.rs reads bus.model() per turn so a flip lands on the next claude invocation. /api/state grows a model field; agent page shows a 'model · <name>' chip in the state row. '/model <name>' slash command POSTs to the endpoint and refreshes state. - port regression fix: agent_web_port no longer probes forward for existing agents (the previous fix shifted ports for any agent without a port file, including legacy ones whose container was already bound to the bare hashed port — dashboard rendered the new port, container was still on the old one, conn errors). new rule: port file exists → use it; absent + applied flake present → legacy, persist port_hash without probing; absent + no applied flake → fresh spawn, probe forward. - SO_REUSEADDR on both the dashboard and per-agent web UI binds via tokio::net::TcpSocket. operator hit 12 retries failing on manager :8000 — REUSEADDR handles the TIME_WAIT case cleanly without a new dep; retry still covers the genuine process-still-alive overlap. todo: drops the model-override entry (shipped); adds two new items — model persistence (optional, future), and custom per-agent MCP tools (groundwork for moving bitburner-agent into hyperhive).	2026-05-15 20:59:45 +02:00
müde	7d93dd9db4	no nap tool — recv with long wait_seconds replaces it; max raised to 180s recv-with-timeout is strictly better than a fixed sleep because it wakes instantly on incoming messages. drop the half-written nap MCP tool, raise the recv wait_seconds cap from 60s to 180s on both agent and manager sockets. prompts updated: agent.md + manager.md now spell out the pattern — when there's nothing else useful to do, call recv with wait_seconds=180 to park the turn; do NOT use Bash sleep for the same purpose. todo drops the nap entry and the napping-state-badge follow-up; both replaced by 'just use a long recv'.	2026-05-15 20:53:15 +02:00
müde	637085644d	server-side TurnState in the harness, exposed via /api/state new TurnState { Idle, Thinking, Compacting } on hive_ag3nt::events::Bus with set_state + state_snapshot. the turn loops in hive-ag3nt and hive-m1nd flip Thinking before drive_turn and Idle after; the web_ui's /api/compact handler flips Compacting around compact_session. per-agent /api/state grows turn_state + turn_state_since (unix seconds). frontend prefers the server-reported state over the client-derived one — setStateAbs takes the absolute since-time so the 'last turn' chip reads the actual server-side duration instead of the client's perceived gap between SSE events. SSE turn_start / turn_end still drive state instantly between renders; /api/state re-anchors on each turn_end refresh. new compacting state gets its own purple badge with pulse animation (mirrors thinking's amber). napping will slot in the same way once the nap tool lands.	2026-05-15 20:46:38 +02:00
müde	0385d96bf3	dashboard: per-container journald viewer new GET /api/journal/{name}?unit=&lines= shells out journalctl -M <container> -b --no-pager --output=short-iso --lines=<N> (cap 5000). optional unit filter, restricted to hive-ag3nt.service / hive-m1nd.service so the shell-out can't be coerced into reading unrelated units. validates the container name against the live list before invoking journalctl. frontend renders a collapsed '↳ logs · <container>' details block on each container row. expanding triggers a lazy fetch; refresh button re-fetches; unit dropdown switches between the harness service (default) and the full machine journal. output sits in a 24em-tall monospace pre, auto-scrolled to the bottom on fresh fetch. hive-c0re's systemd unit already runs as root, so journalctl has the access it needs.	2026-05-15 20:42:56 +02:00
müde	754db7830e	ask_operator: ttl_seconds auto-cancel + remaining-time chip manager can pass ttl_seconds to ask_operator. on submit, host stores deadline_at = now + ttl in operator_questions (new column, migrated via existing pragma_table_info pattern), spawns a tokio task that sleeps until the deadline then resolves the question with answer '[expired]' and fires the same OperatorAnswered helper event. already-resolved races no-op silently. dashboard renders a '⏳ MM:SS' chip on the question row when deadline_at is set. format collapses seconds → s, < 1h → m s, ≥ 1h → h m. heartbeat refresh (5s) keeps the chip current; the operator sees it tick down. manager prompt + mcp tool description updated. journald viewer per container queued in todo (separate task).	2026-05-15 20:38:02 +02:00
müde	538e0446d7	agent page: inbox view of last 30 messages addressed to this agent new wire request AgentRequest::Recent { limit } / ManagerRequest::Recent (plus matching responses with Vec<InboxRow>). InboxRow moved to hive-sh4re so it lives on both surfaces without an internal-to-wire conversion. host-side dispatch in agent_server / manager_server calls broker.recent_for(name, limit). per-agent web_ui /api/state grew an inbox: Vec<InboxRow> populated via the same per-agent socket (best-effort; transport failure returns empty). frontend renders as a collapsible <details> section between the state row and the terminal — fmt timestamp / from / body in a tight grid, capped at 16em scrollable. only visible when there are rows.	2026-05-15 20:32:19 +02:00
müde	bd7d2d4860	agent page: dashboard back-link + last-turn timing chip title bar grows a '↑ DASHB04RD' link next to the rebuild button — opens the host dashboard in a new tab so the operator can pivot between agents without losing the live tail. uses the dashboardPort already plumbed via /api/state. state row picks up a 'last turn 12.3s' chip that fills in when state transitions away from thinking. format: ms / s.s / m s. hidden until the first turn completes.	2026-05-15 20:27:09 +02:00
müde	ee5b85716d	ask_operator: operator-side ✗ CANC3L on pending questions new POST /cancel-question/{id} resolves a pending operator question with the sentinel answer '[cancelled]' and fires the usual HelperEvent::OperatorAnswered so the manager sees a terminal state and can fall back. uses the same OperatorQuestions::answer path — no special handling, the manager already has to deal with arbitrary answer strings. dashboard renders the cancel as a separate <form> below the main qform so the answer-merge submit handler on the main form doesn't inadvertently fire when the operator clicks cancel. confirm dialog spells out what the manager will see. ttl-based auto-cancel is still on the todo (would spawn a tokio task per submitted question).	2026-05-15 20:25:11 +02:00
müde	bc87ff80d2	agent terminal: inline +/- diffs on Write and Edit tool calls Write and Edit tool_use rows used to render as the bare file path. now they're collapsed <details> blocks with the actual change inside — Write shows every content line prefixed '+', Edit shows old_string as '-' lines then new_string as '+' lines. summary carries the file path + counts ('→ Edit /foo · -3 +5'). lines colored via diff-add / diff-del / diff-ctx; click to expand the full body. renderFileWriteEdit returns null for any other tool so the existing flat-row path (fmtToolUse) is untouched.	2026-05-15 20:23:22 +02:00
müde	89ccc5e6c5	events.sqlite vacuum moves host-side retention is a host concern — agents have no business doing their own cleanup, and a misbehaving harness could skip it. drop spawn_events_vacuum from both hive-ag3nt and hive-m1nd, drop the matching Bus::vacuum + EventStore::vacuum methods. new hive_c0re::events_vacuum module sweeps every existing agents/<name>/state/hyperhive-events.sqlite on the same hourly cadence as the broker vacuum. same two-stage delete (older than 7 days, trim to 2000 newest). called from main alongside broker vacuum. also: server-side state badge entered into todo.md (today's badge is derived client-side from sse, fine for idle/thinking but a state machine that grows compacting/napping wants authoritative status from the harness).	2026-05-15 20:10:34 +02:00
müde	897e7c07ae	dashboard: spawn form moves under approvals; docs synced submitting R3QU3ST SP4WN immediately queues an approval that lands in the very next list. the form belonged with that list, not at the top of containers — the agent doesn't exist yet at form time anyway. docs: claude.md grows operator_questions.rs / events.rs sqlite / broker vacuum to the file map; web-ui shape lists the actual current endpoint set (per-agent cancel/compact/history, dashboard tombstone purge/answer/spawn); live-view section now describes the state badge, sticky-bottom scroll, history backfill, and the terminal-embedded prompt with its slash commands; dashboard-action-surface rewritten around the new six-section page (containers / kept-state / questions / inbox / approvals / message-flow) and the two-line container row. new 'persistence + retention' section documenting both sqlite databases and their vacuum cadences. readme picks up the new mgr mcp surface (start/restart/ask_operator) + operator-side features list + ask_operator answer flow. todo trimmed of shipped items (bigger terminal / sticky scroll / cancel button / /compact trigger / /cancel command). new entry for the two-step spawn-with-preconfig flow.	2026-05-15 20:02:54 +02:00
müde	de09503b59	events: persist to sqlite, survive harness restart hive_ag3nt::events::Bus replaces its in-memory VecDeque with a sqlite-backed store at /state/hyperhive-events.sqlite (overridable via HYPERHIVE_EVENTS_DB). emit() inserts a row; history() reads back the most recent 2000 events. survives harness restart now — operator reload mid-investigation no longer wipes the trail. vacuum runs hourly (immediate first sweep): drop rows older than 7 days, then trim to 2000 newest. two-stage so a quiet agent keeps a useful tail and a chatty one stays bounded. wired into both hive-ag3nt and hive-m1nd via spawn_events_vacuum. if the db open fails (e.g. no /state mount in dev), Bus runs in no-store mode — events still broadcast, just nothing persisted.	2026-05-15 19:42:57 +02:00
müde	6d52f67292	broker: hourly vacuum of delivered messages older than 30 days undelivered rows are always kept regardless of age (still in flight). sweep runs immediately on serve start then every hour. logs row count when non-zero. keep_secs is hard-coded for now (30 days); can be config-driven later if a host wants to retain more / less for audit.	2026-05-15 19:40:38 +02:00
müde	a9ed33d94f	todo: trim state-badge entry to what's left (compacting/napping)	2026-05-15 19:36:42 +02:00
müde	0cc25d33d8	drop debug-only cli subcommands from hive-ag3nt + hive-m1nd drop the one-shot send/recv/kill/start/restart/request-spawn/request-apply-commit subcommands from both in-container binaries. they were debug-only — the host admin socket (`hive-c0re ...`) exposes the same verbs and the manager mcp surface covers the rest from inside claude. now each binary's --help shows just `serve` and `mcp`, which are the only commands either is meant to be started with. removes the `one_shot` helper and the `render` / `check` glue.	2026-05-15 19:34:58 +02:00
müde	48ebfefd1a	destroy --purge: also wipe agent state dirs new --purge flag on the destroy verb (cli + admin socket + dashboard). default destroy still keeps /var/lib/hyperhive/{agents,applied}/<name>/ so recreating with the same name reuses prior config + creds. with --purge, both dirs go too (config history, claude creds, /state/ notes). no undo. dashboard adds a separate PURG3 button with an explicit confirmation copy; the existing DESTR0Y button keeps the soft semantics. claude.md dashboard-action-surface section updated; todo entry dropped.	2026-05-15 19:29:14 +02:00
müde	3f2aba4adc	todo: parity gaps vs bitburner-agent — state badge, slash cmds, stats, nap, viz polish, persistent event history	2026-05-15 19:14:35 +02:00
müde	2770630f33	ask_operator tool: non-blocking; operator answer arrives as helper event new mcp tool on the manager surface that queues a question on the dashboard and returns the question id immediately. operator submits an answer via /answer-question/<id>; the dashboard fires HelperEvent::OperatorAnswered { id, question, answer } into the manager inbox so the next turn picks it up. also: fix async-form button stuck on spinner after successful submit (refreshState skipped re-rendering, so the button was never re-enabled).	2026-05-15 18:44:42 +02:00
müde	abfd2cce4b	docs: refresh CLAUDE.md for system-prompt-file, helper events, dashboard buttons, ui shape; TODO.md drop operator-inbox (done)	2026-05-15 18:25:14 +02:00
müde	6e75d8e6db	manager: don't trust agents on config asks; sketch ask_operator tool in TODO	2026-05-15 18:06:01 +02:00
müde	ff8f8c7c56	per-agent /state dir for durable notes; manager sees them via /agents	2026-05-15 18:00:08 +02:00
müde	070b237d03	docs: SPA pattern noted, todo cleared; harness-base git config mkDefault programs.git.config.user.{name,email} in harness-base.nix now mkDefault so the per-agent applied flake's override merges without mkForce.	2026-05-15 17:17:48 +02:00
müde	970f645461	docs: README + TODO split; trim CLAUDE.md; fix async form 415	2026-05-15 16:41:15 +02:00
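
note on 7d6d8e96c1 (extra MCP servers): the allow-list expansion and the collision guard described in that message can be sketched roughly as below. this is an illustration only, not the harness code: ExtraServer, BUILTIN_KEY and expand_allowed_tools are hypothetical names, and the real allowed_mcp_tools(flavor) reads /etc/hyperhive/extra-mcp.json rather than taking a map. a pattern of `*` passes through unchanged; treating an empty pattern as the wildcard too is a defensive choice in this sketch.

```rust
use std::collections::BTreeMap;

/// Hypothetical, trimmed-down view of one entry in the extra-servers map.
/// Only the field needed for the allow-list expansion is modelled.
struct ExtraServer {
    allowed_tools: Vec<String>,
}

/// Key reserved for the built-in surface; extras using it are dropped.
const BUILTIN_KEY: &str = "hyperhive";

/// Expand every (server key, pattern) pair into `mcp__<key>__<pattern>`.
/// An empty pattern is mapped to `*` (assumption of this sketch).
fn expand_allowed_tools(extras: &BTreeMap<String, ExtraServer>) -> Vec<String> {
    let mut out = Vec::new();
    for (key, server) in extras {
        // Collision guard: user config must not shadow the built-in server.
        if key.as_str() == BUILTIN_KEY {
            eprintln!("warn: dropping extra MCP server keyed {key:?}");
            continue;
        }
        for pattern in &server.allowed_tools {
            let pattern = if pattern.is_empty() { "*" } else { pattern.as_str() };
            out.push(format!("mcp__{key}__{pattern}"));
        }
    }
    out
}
```

with a map holding a server keyed matrix that allows send_message and join_room, the result is mcp__matrix__send_message and mcp__matrix__join_room; an entry keyed hyperhive contributes nothing.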

37 commits
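
note on de09503b59 / 89ccc5e6c5 (events vacuum): the two-stage retention (drop rows older than 7 days, then trim to the 2000 newest) can be sketched in memory as below. this is a sketch under assumptions: the real sweep runs as deletes against hyperhive-events.sqlite, and EventRow, MAX_AGE_SECS, KEEP_NEWEST and vacuum are hypothetical names chosen for illustration.

```rust
/// Hypothetical in-memory stand-in for one persisted event row.
#[derive(Debug, Clone, PartialEq)]
struct EventRow {
    id: u64,
    ts: u64, // unix seconds
}

/// Stage-1 cutoff: 7 days, as stated in the commit message.
const MAX_AGE_SECS: u64 = 7 * 24 * 60 * 60;
/// Stage-2 bound: keep at most this many of the newest rows.
const KEEP_NEWEST: usize = 2000;

/// Apply both retention stages in place and return how many rows were removed.
fn vacuum(rows: &mut Vec<EventRow>, now: u64) -> usize {
    let before = rows.len();

    // Stage 1: drop everything older than the age cutoff.
    let cutoff = now.saturating_sub(MAX_AGE_SECS);
    rows.retain(|r| r.ts >= cutoff);

    // Stage 2: order oldest-first, then cut the oldest excess rows.
    rows.sort_by_key(|r| (r.ts, r.id));
    if rows.len() > KEEP_NEWEST {
        let excess = rows.len() - KEEP_NEWEST;
        rows.drain(..excess);
    }

    before - rows.len()
}
```

running the age cutoff first means the count bound only ever applies to recent rows, so a chatty agent stays capped at 2000 while an agent with a handful of recent events loses nothing.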