hyperhive

Author	SHA1	Message	Date
müde	e3b5837378	todo: security section — privsep + state-file hardening	2026-05-17 22:13:18 +02:00
müde	b3970c439c	todo: drop landed-dashboard comments	2026-05-17 22:10:32 +02:00
müde	1db6b8ffed	dashboard: queued reminders surface new 'qu3u3d r3m1nd3rs' section between approvals and operator inbox. lists every pending reminder with agent, due-relative timestamp, body, payload path (path-linkified), and a cancel button. drives off a new /api/reminders endpoint and a POST /cancel-reminder/{id} that hard-deletes the row. failure surface (last_error / attempt_count + retry) deferred — needs a sqlite migration; tracked in TODO.md.	2026-05-17 22:10:02 +02:00
müde	cb71a07300	dashboard: clickable file-path previews agents constantly emit pointer strings to /agents/<n>/state/foo.md since broker bodies cap at 1 KiB. now those tokens linkify in the message flow, question bodies, answer text, and operator inbox; clicking expands an inline <details> that lazy-fetches via the new /api/state-file?path=... endpoint. endpoint allow-list: per-agent state dirs + shared docs, both in their container-mount form (/agents/<n>/state, /shared) and host form (/var/lib/hyperhive/...). 1 MiB read cap; canonicalises before the prefix check so `..` / symlinks can't escape. legacy bare `/state/...` is deliberately not matched — ambiguous from the host's perspective (we'd need to know which agent the message references to translate). agents should use the qualified form going forward.	2026-05-17 22:08:15 +02:00
müde	a15fafb5de	dashboard: surface peer questions + operator override questions pane now shows both operator-targeted threads (target IS NULL) and agent-to-agent threads (target = some agent). filter chips above the list: all / @operator / @peer / per-participant. peer rows get a mauve left rule + a 0V3RR1D3 button that POSTs the same /answer-question endpoint (OperatorQuestions::answer already permits the operator as answerer on any target). wire changes: OperatorQuestions gains pending_all + recent_answered_all; QuestionAdded + QuestionResolved events carry target: Option<String>; emit sites drop their target.is_none() guard. answered-history rows show the answerer prefix so override answers are auditable at a glance.	2026-05-17 22:06:53 +02:00
müde	e7ce35c503	phase 6: container events + drop the 5s /api/state poll new DashboardEvent::ContainerStateChanged + ContainerRemoved close the last refetch loop on the dashboard. Coordinator's rescan_containers_and_emit diffs a fresh container_view::build_all against a cached last_containers map and fires per-row events. called from actions::approve (post-spawn), actions::destroy, the lifecycle_action wrapper, auto_update::rebuild_agent, and the existing 10s crash_watch poll. ContainerView extracted to its own module so coordinator and dashboard can both build it. dashboard endpoints flip to 200; container-lifecycle forms carry data-no-refresh. client drops the periodic poll entirely — initial cold load + SSE for everything afterwards. pending overlay reads from the existing transientsState since the new event payload doesn't carry it. PURG3 + meta-update keep the post-submit refetch since tombstones + meta_inputs aren't event-derived yet; tracked in TODO.md.	2026-05-17 22:01:15 +02:00
damocles	c423ce9e39	todo: lock down get_open_threads scope (asker + target questions)	2026-05-17 14:43:08 +02:00
müde	e4438d1a6e	todo: phase 6 event-covered redirects converted	2026-05-17 14:27:03 +02:00
müde	62784d4933	todo: prune resolved items	2026-05-17 14:22:47 +02:00
müde	88a1f4c146	todo: mark phase 5b done; note remaining phase 6 conversions now unblocked	2026-05-17 14:21:12 +02:00
damocles	291f1fce42	todo: clickable file paths in dashboard message bodies	2026-05-17 13:20:33 +02:00
damocles	82b0877c47	ask: rename ask_operator → ask + optional 'to' for agent-to-agent Q&A	2026-05-17 13:20:32 +02:00
müde	87f8f8a123	todo: phase 5b — mutation events for approvals/questions/transients	2026-05-17 13:15:32 +02:00
müde	b60774a66c	events: LiveEvent::Note becomes struct variant so serde can actually serialize it	2026-05-17 13:14:09 +02:00
müde	d48cee7c2d	approvals: ship raw diff text instead of pre-rendered html; client classifies per-line	2026-05-17 12:30:45 +02:00
damocles	1770b51845	manager mcp: expose 'remind' tool sharing storage helper with agent surface	2026-05-17 11:43:14 +02:00
damocles	3da6912e73	todo: open-threads list also rendered on the per-agent web ui	2026-05-17 11:20:01 +02:00
damocles	0c606fd2dd	todo: post-rebuild missed-wake bug + ask rename + open-threads tracker	2026-05-17 11:20:01 +02:00
damocles	6ce85bd6f2	reminder: file_path delivery + extract scheduler into own module	2026-05-17 11:05:29 +02:00
damocles	f2484b5e78	agent mcp: expose 'remind' tool for self-scheduled wakes	2026-05-17 10:54:36 +02:00
damocles	271c524e66	agent_server: reminder body size cap + extract Remind/AskOperator handlers	2026-05-17 02:59:51 +02:00
damocles	dba3badeae	todo: mark orphan-reminder + unbounded-batch items as fixed	2026-05-17 02:59:51 +02:00
damocles	e45d161cb8	todo: mark recv_blocking race bug as fixed	2026-05-17 02:59:51 +02:00
müde	9fc7cae132	prompts: tell agents + manager about the code forge; todo: shared docs repo system prompts now describe the hyperhive Forgejo at localhost:3000, the per-agent user, the pre-configured tea CLI, and the REST API fallback with /state/forge-token. todo gains the shared docs/skills RO-repo follow-up (org-shared + per-agent read membership).	2026-05-16 23:36:05 +02:00
damocles	4ceae6cf67	todo: add bug - pending message wake-up issue	2026-05-16 13:43:41 +02:00
damocles	3642ae1a61	todo: add dashboard ui for pending reminders	2026-05-16 13:40:15 +02:00
damocles	a57e500f48	todo: add multi-agent restart coordination item	2026-05-16 13:14:17 +02:00
damocles	3b8cdc7e20	todo: add broadcast messaging feature	2026-05-16 13:07:31 +02:00
damocles	24eec69418	fix reminder tool issues: error on time overflow, optimize scheduler query	2026-05-16 13:00:56 +02:00
damocles	bc27113967	docs: add hyperhive feature TODOs	2026-05-16 12:52:08 +02:00
müde	67e4242b9f	per-agent send allow-list via hyperhive.allowedRecipients new NixOS option in harness-base.nix: hyperhive.allowedRecipients = [ 'alice' 'manager' ]; # whitelist hyperhive.allowedRecipients = [ ]; # default = unrestricted module writes the list as JSON to /etc/hyperhive/send-allow.json at activation. AgentServer::send reads the file before issuing the broker request; if the list is non-empty and `to` isn't on it, the tool returns a claude-readable refusal string without touching the broker. the manager is always implicitly permitted regardless of the list — otherwise a misconfigured allow-list could strand a sub-agent without an escalation path. enforcement is in the in-container MCP server (not on the host's per-agent socket) because the agent's nix config is the trust boundary anyway — the operator audits agent.nix at deploy time, the activation-time /etc/hyperhive/send-allow.json is r/o under /nix/store, so the agent can't tamper at runtime without going through a new approval. agent prompt mentions the option + tells claude to route through the manager when refused. retires the matching TODO under Permissions / policy.	2026-05-16 03:59:28 +02:00
müde	d1c69b134a	dashboard: reorder sections into grouped sequence after reverting the 3-column attempt (`74ba8a6`), keep the single-column layout but put related sections adjacent: swarm: containers → kept-state → meta-inputs decisions: questions → approvals messages: operator-inbox → message-flow + compose this is a free improvement — the operator scrolls through one logical group at a time instead of bouncing between swarm / decisions / messages mid-page. follow-up improvements (collapsing rarely-active sections, multi-column at wide viewports done less aggressively) captured in TODO under 'Dashboard layout overhaul'.	2026-05-16 03:54:53 +02:00
müde	40938d8b54	dashboard: surface silent unwrap_or_default in api_state every snapshot source backing /api/state used .unwrap_or_default() — sqlite errors, broker errors, nixos-container list failures, operator_questions decode crashes all degraded to empty lists without a log line. the 'pending question doesn't render' bug we've been chasing was likely a row-decode panic in OperatorQuestions::pending() being swallowed this way. new log_default(what, result) replaces each call site: same default value on Err but emits target=api_state warn with the source name + dbg error first. five sources covered: nixos-container list, approvals.pending, approvals.recent_resolved, broker.recent_for(operator), questions.pending. next time the question goes missing the journal will say which source failed and how. todo updated — pending-question entry now points at the new log instead of three suspect paths.	2026-05-16 03:49:49 +02:00
müde	06af23c8a4	recv: None = peek, positive value = opt-in long-poll old behavior: omitted wait_seconds fell through to the 30s RECV_LONG_POLL_DEFAULT — claude calling 'is there anything in my inbox right now?' between actions blocked the turn for half a minute. flip the semantics: None (or 0) returns immediately, positive value parks up to MAX (180s, unchanged). cleaner 'peek vs wait' distinction; tool descriptions + agent/manager prompts updated to point at the new shape. harness's own serve loops in hive-ag3nt + hive-m1nd relied on the old default for their inbox poll. they now explicitly pass wait_seconds: Some(180) to opt into the full park — same effective behavior as before, just spelled out. retires the matching TODO under Turn loop.	2026-05-16 03:22:42 +02:00
müde	96cb9f84c9	dashboard: approval history tab on P3NDING APPR0VALS new tabs above the approvals list: 'pending · N' and 'history · M'. active tab persists in localStorage so the operator can park on history if they prefer. on a fresh dashboard the default is pending (matches the prior shape). history view shows the last 30 resolved approvals — newest first by resolved_at — with one row per approval: status glyph (✓ approved / ✗ denied / ⚠ failed), id, agent, kind, short sha, status label, and a relative time chip. when the row has a note (deny reason or build error), it renders below in a muted block with line wraps preserved. backend: Approvals::recent_resolved(limit) queries by status IN ('approved', 'denied', 'failed') ORDER BY resolved_at DESC. StateSnapshot gets approval_history (a lean ApprovalHistoryView without diff_html — rendering 30 git diffs per state poll would be expensive and the operator already saw the diff at decision time). dashboard's history_view fn projects the sqlite row. retires the matching TODO entry.	2026-05-16 03:07:50 +02:00
müde	c2bf0aa4f1	todo: approval history tab; retire streaming-output entry new entry under UI/UX for an approval history tab on the P3NDING APPR0VALS section — sqlite already has every row + the applied repo's annotated denied/failed tags carry the human-readable reasons, so this is a render-side change. retire the 'stream nixos-container stdout' entry — landed in `6f1b664`. run() now pipes child output line-by-line into tracing so 'slow build' no longer looks like 'wedged daemon'.	2026-05-16 02:59:02 +02:00
müde	7d6d8e96c1	per-agent extra MCP servers via hyperhive.extraMcpServers new NixOS option in harness-base.nix: hyperhive.extraMcpServers.<key> = { command = "/path/to/server"; args = [ ... ]; env = { KEY = "value"; }; allowedTools = [ "send_message" "join_room" ]; # or [""] }; declared as attrsOf submodule so agents stack arbitrarily many. the module writes the whole map as JSON to /etc/hyperhive/extra-mcp.json at activation; the harness reads that file in mcp::render_claude_config and merges each entry into the rendered --mcp-config under its own mcpServers.<key> block. allowed_mcp_tools(flavor) extends the --allowedTools arg with mcp__<key>__<pattern> for every entry — "" (the default) becomes mcp__<key>__* so every tool from that server is auto-approved, or pass a concrete list to tighten. collision guard: an extra server keyed "hyperhive" is dropped with a warn-log so user config can't shadow the built-in surface. malformed JSON / missing file fall back to "no extras" silently. prompt note added: agents see "(some agents only) extra MCP tools surfaced as mcp__<server>__<tool>" and learn they're declared via agent.nix. retires the matching TODO under Per-agent extension. matrix-chat agents + bitburner-agent migration unblocked.	2026-05-16 02:10:11 +02:00
müde	6b3ef4549c	manager_server: reject proposals that modify flake.nix submit_apply_commit now diffs the freshly-tagged proposal/<id> against applied/main and refuses if flake.nix is in the changeset. flake.nix is fixed boilerplate the meta flake depends on (it exports nixosModules.default = import ./agent.nix); silent edits there would break the nixosConfiguration in subtle ways. the manager prompt already says don't touch it; this is the host-side belt — clear error to the manager on submit, row marked failed in sqlite, no orphan pending approval to chase. diff-failure is logged + ignored: the build path surfaces concrete errors if flake.nix is actually broken.	2026-05-16 01:42:11 +02:00
müde	68ef6ab433	todo: stream nixos-container output so slow != stuck surfaced by a real hang investigation today — lifecycle::run uses .output() which buffers stdout/stderr until exit, so a multi-minute nix build through nixos-container update looks identical to a wedged daemon. line-buffered streaming into tracing (and ideally the per-agent live event bus when the agent is known) makes 'still building, just slow' visible without strace gymnastics.	2026-05-16 01:38:02 +02:00
müde	65bdde898e	todo: tag retention, flake.nix tamper-check, sync_agents nix call three things surfaced by the meta-flake overhaul + the nix CLI deprecation we just fixed worth tracking explicitly. extend the web-UI-for-config-repos entry to also cover the /meta deploy log now that meta's git history is the swarm-wide audit trail.	2026-05-16 01:21:27 +02:00
müde	df9da4d6e1	todo: recv default should not sleep, agent opts into wait	2026-05-15 23:00:25 +02:00
müde	497cd15137	docs: tag-driven config-apply plan + migration story scratchpad in claude.md marks this as in-flight; docs/approvals.md gets the new tag state machine (proposal/approved/building/deployed/failed/denied) and the manager applied.git read-only mount. todo picks up the unprivileged-containers git-identity caveat and a web ui for config repos as a downstream follow-up.	2026-05-15 22:43:47 +02:00
müde	75e7faff0c	docs: full sync ahead of compaction + config-management overhaul readme: manager mcp surface picks up update; operator-surface recap mentions /model + last-turn + model chip + the three collapsibles (inbox / journald / agent.nix). web-ui.md: details-restore-key story under shape; port-conflict banner mention on containers; agent.nix viewer alongside journald; notifications use per-event tags + console.debug log on block/show; deny endpoint takes note=<reason>; data-prompt / data-prompt-field generalisation noted. conventions.md: data-prompt and snapshot/restoreOpenDetails added to the async-forms section. persistence.md: operator_questions row picks up deadline_at (ttl) column with a migration note. todo.md: new 'Bugs' section captures the manager-question not-rendering issue with three suspect paths to chase. claude.md scratchpad rewritten as a clean handoff for the compaction + the upcoming config-git overhaul. flags the two-repo (proposed/ + applied/) split as the thing to reconsider.	2026-05-15 22:12:40 +02:00
müde	91c78d626f	dashboard: per-container applied agent.nix viewer new GET /api/agent-config/{name} returns the contents of /var/lib/hyperhive/applied/<name>/agent.nix — the file the container actually builds against. validated against the live container list to avoid arbitrary filesystem reads. frontend mirrors the journald viewer: collapsed <details> on each container row, lazy-fetches on expand, refresh button re-fetches. restore-keyed (agent-config:<name>) so it survives the dashboard heartbeat refresh. read-only — mutating the applied config goes through the existing request_apply_commit + operator approval flow.	2026-05-15 21:46:25 +02:00
müde	80229c6af9	manager: needs_login / logged_in / needs_update events + update tool crash_watch grows two more state-axes alongside running/stopped: - logged-in (claude session dir populated for the agent) - up-to-date (recorded flake rev matches current) per-tick transitions emit HelperEvent::NeedsLogin / LoggedIn / NeedsUpdate. seed-on-first-tick semantics retained — nothing fires on harness boot for agents that were already in their state. only needs_update fires the 'stale appeared' direction; the resolved direction is already covered by Rebuilt. new mcp__hyperhive__update(name) on the manager surface: idempotent rebuild via auto_update::rebuild_agent. transient-aware (Rebuilding) so the dashboard shows the spinner. login intentionally has NO tool — it's interactive OAuth, only the operator can complete it. prompts + approvals doc + turn-loop doc updated. todo grows a 'show per-agent applied config in dashboard' entry (separate follow-up).	2026-05-15 21:42:13 +02:00
müde	62d1a74929	docs sync + revert auto-unfree removal revert the earlier 'operator must set allowUnfree' move: per-agent containers evaluate their own nixpkgs and the operator's host-level allowUnfree doesn't propagate in. restoring the scoped allowUnfreePredicate inside both the claude-unstable overlay and harness-base.nix; documented in README + gotchas as 'nothing to set on the operator side'. docs: - claude.md file map adds crash_watch.rs, kick_agent on coordinator, /api/model + journald viewer + bind-with-retry references. - scratchpad rewritten to reflect the recent run. - web-ui.md: notification row + browser notifications section, state row (badge + model chip + last-turn chip + cancel button), per-agent inbox, /model slash, /cancel-question + journald endpoints, focus-preservation on refresh. - turn-loop.md: --model is read from Bus::model() per turn (runtime override via /model); recv(wait_seconds) up to 180s with the rationale; ask_operator gains ttl_seconds; new TurnState section; kick_agent inbox-on-startup hint. - approvals.md: ttl/cancel resolution paths for operator questions. - persistence.md: /state/hyperhive-model file. - gotchas.md: web UI port collision policy (rename, don't probe); bind retry + SO_REUSEADDR shape; auto-unfree restored. - todo.md: cleaned up empty sections and stale entries; /model shipped, dropped from the list.	2026-05-15 21:26:13 +02:00
müde	237b215c55	dashboard: browser notifications for operator-bound events three signals fire OS notifications: - new approval lands in the queue (per id, via /api/state delta) - new ask_operator question queued (per id) - broker message sent to operator (live via SSE) first /api/state render after page load seeds the 'seen' sets without firing — only items that arrive while the page is open count. controls in a row under the banner: 🔔 enable notifications (calls requestPermission, hides on grant), 🔕 mute / 🔔 unmute toggle (localStorage-backed so operator can silence without revoking the permission), inline status text when blocked or unsupported. notification tag='hyperhive' collapses rapid bursts; onclick focuses the dashboard tab. requires secure context (HTTPS or localhost) — on other origins the API is unavailable and the controls hide themselves. todo: entry dropped.	2026-05-15 21:10:20 +02:00
müde	a67aada7c9	todo: browser notifications for approvals / questions / operator msgs pure frontend — Notification API + existing /api/state and /messages/stream signals. Caveats: secure-context requirement (HTTPS or localhost), per-browser permission grant. Includes a sketch of the implementation: request-permission button, count deltas on refreshState, SSE hook on operator-bound sends, localStorage 'muted' toggle.	2026-05-15 21:07:21 +02:00
müde	8b9f7d21b7	model persisted to /state; stop auto-allowing claude-code unfree model persistence: /model <name> now writes to /state/hyperhive-model (in-container), Bus::new reads it on init. operator override survives harness restart and container rebuild; gone on --purge like every other piece of agent state. path overridable via HYPERHIVE_MODEL_FILE for tests. failure to persist is a warn, not fatal — runtime override still applies, just won't survive a restart. unfree opt-in: drop the auto-allowUnfreePredicate from harness-base.nix and the claude-unstable overlay. operator now has to set nixpkgs.config.allowUnfree (or a predicate listing claude-code) in their own host config. silent unfree bypass was sketchy; this is honest. readme + gotchas updated to spell out the snippet. todo: drops model-persistence + container-crash + journald (all shipped); adds per-agent send allow-list (constrain who an agent can message).	2026-05-15 21:05:40 +02:00
müde	58c3cd853b	container crash watcher → HelperEvent::ContainerCrash new hive_c0re::crash_watch task polls every 10s, builds the set of currently-running containers, and on running→stopped transitions checks the transient snapshot: if no Stopping / Restarting / Destroying / Rebuilding flag is set, the container exited unexpectedly and we fire HelperEvent::ContainerCrash into the manager's inbox so it can react (typically: start it again). first poll is a seeding pass — no events on harness startup. dbus subscription would be lower-latency but polling is honest and debuggable, and a 10s delay on crash detection is fine for our scale. manager prompt + approvals doc updated to advertise the new event variant. todo drops the entry (and the journald-viewer entry that already shipped).	2026-05-15 21:02:05 +02:00
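
The send allow-list rule from commit 67e4242b9f (empty list means unrestricted, the manager is always permitted, anything else must be on the list) can be sketched as a single predicate. This is a minimal illustration, not the code in AgentServer::send: the function name `may_send`, the `MANAGER` constant, and the assumption that the manager's recipient name is the literal string "manager" are all hypothetical.

```rust
/// Recipient name assumed for the manager (hypothetical constant; the
/// commit message only shows 'manager' as an example list entry).
const MANAGER: &str = "manager";

/// Hypothetical predicate for the allow-list rule described in 67e4242b9f.
/// Returns true when a message addressed to `to` may go to the broker.
fn may_send(allowed: &[&str], to: &str) -> bool {
    // An empty list is the default and means "unrestricted".
    allowed.is_empty()
        // The manager stays reachable so a bad list cannot strand an agent.
        || to == MANAGER
        // Otherwise the recipient must be listed explicitly.
        || allowed.contains(&to)
}
```

With `allowed = ["alice"]`, a send to "alice" or "manager" passes and a send to "bob" is refused; with an empty list every recipient passes.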


73 commits
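
The peek-versus-wait semantics from commit 06af23c8a4 (omitted or zero `wait_seconds` returns immediately, a positive value parks up to the 180s maximum) can be sketched as a small mapping. This is an illustration under assumptions: the helper name `park_duration` and the constant name are hypothetical, and clamping values above the maximum (rather than rejecting them) is a guess at the behaviour, not something the commit message states.

```rust
use std::time::Duration;

/// Upper bound on a long-poll park; 180s is the value quoted in 06af23c8a4.
/// The constant name is hypothetical.
const RECV_LONG_POLL_MAX_SECS: u64 = 180;

/// Hypothetical helper: translate the tool's `wait_seconds` argument into
/// an optional park duration. `None` as a result means "peek, do not block".
fn park_duration(wait_seconds: Option<u64>) -> Option<Duration> {
    match wait_seconds {
        // Omitted or zero: return whatever is in the inbox right now.
        None | Some(0) => None,
        // Positive: opt-in long-poll, clamped to the maximum (assumption).
        Some(secs) => Some(Duration::from_secs(secs.min(RECV_LONG_POLL_MAX_SECS))),
    }
}
```

Under this sketch the harness serve loops, which pass `wait_seconds: Some(180)`, would get the full park, while a bare inbox check between actions would not block the turn.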