hyperhive

Author	SHA1	Message	Date
damocles	24b10becc9	get_logs: resolve the broker-logical 'manager' alias to the hm1nd machine	2026-05-20 10:48:24 +02:00
damocles	0a79912b67	get_logs: resolve machine name via container_name like every other verb	2026-05-20 10:48:24 +02:00
müde	5aad2d67e1	forge: mirror applied config repos to a private agent-configs org on startup (and after every applied-repo ref mutation) core pushes each agent's hive-c0re-owned applied repo — main plus every proposal/approved/building/deployed/failed/denied tag — to agent-configs/<name> on the local forge. the org is private and agents are not members, so core is the only principal that can read it. the tokenised push url is passed inline, never stored as a named remote: the applied repo is bind-mounted read-only into the manager, so a token in .git/config would leak the core admin credential to an agent. push_config is best-effort at every site (ensure_all, spawn, approve, deny, submit) — a missing or down forge never blocks a deploy.	2026-05-20 10:24:50 +02:00
damocles	f8795dc029	fix: request_apply_commit resolves sha locally + rejects non-sha refs	2026-05-20 09:48:05 +02:00
damocles	5d27ae3048	recv: fold batch drain into recv(max) — one tool, uniform list response	2026-05-19 01:07:30 +02:00
damocles	77b89bf2c6	broker: recv_batch(max) — drain a bursty inbox in one round-trip	2026-05-19 00:47:21 +02:00
damocles	f9f1346eae	clippy: zero pedantic warnings across the tree	2026-05-18 22:09:34 +02:00
damocles	690cb5ab5b	broker: lease-style delivery — ack_turn + requeue_inflight close the no-drop loop	2026-05-18 22:01:48 +02:00
damocles	6e23d087d2	rename: open_threads → loose_ends + cancel_thread → cancel_loose_end across wire / tools / web ui	2026-05-18 18:24:09 +02:00
damocles	b1d0a62cb9	cancel_thread: new mcp tool — unify reminder + question cancel on both surfaces	2026-05-18 18:24:09 +02:00
damocles	d395bdc945	whoami: drop operator_pronouns (redundant — already in system prompts at boot)	2026-05-18 00:04:58 +02:00
damocles	3c66cb6707	whoami: new mcp tool returning name/role/pronouns/hyperhive_rev on both surfaces	2026-05-18 00:04:58 +02:00
müde	8f5752980f	turn_stats: per-turn analytics sink new sqlite table at /state/hyperhive-turn-stats.sqlite on each agent's state dir. one row per claude turn captures identity (model, wake_from, result_kind), timing (started/ended_at, duration_ms), cost (input/output/cache_read/cache_creation token counts), behaviour (tool_call_count + per-tool breakdown JSON), and post-turn snapshot metrics (open_threads_count, open_reminders_count). wire additions: - AgentRequest/ManagerRequest::CountPendingReminders + Broker::count_pending_reminders_for(agent) - Bus::observe_stream + take_tool_calls — pumps the existing stdout stream-json, picks out tool_use blocks, accumulates per turn. bin loops fold the breakdown into each row. - TurnStats::open_default + TurnStatRow + record() — best-effort inserts; failures log + don't block the harness. both ag3nt and m1nd bins capture started_at + duration via Instant::elapsed, fetch open-thread + reminder counts from hive-c0re via the existing socket (post-turn, best-effort), and record one row at turn_end. record_kind splits ok / failed / prompt_too_long; failures carry the error message in note. todo entries for host-side vacuum sweep + reading the table back into agent/dashboard badges.	2026-05-17 23:00:41 +02:00
damocles	dc1ce1f236	open_threads: new get_open_threads MCP tool on agent + manager surfaces	2026-05-17 22:52:08 +02:00
müde	a15fafb5de	dashboard: surface peer questions + operator override questions pane now shows both operator-targeted threads (target IS NULL) and agent-to-agent threads (target = some agent). filter chips above the list: all / @operator / @peer / per-participant. peer rows get a mauve left rule + a 0V3RR1D3 button that POSTs the same /answer-question endpoint (OperatorQuestions::answer already permits the operator as answerer on any target). wire changes: OperatorQuestions gains pending_all + recent_answered_all; QuestionAdded + QuestionResolved events carry target: Option<String>; emit sites drop their target.is_none() guard. answered-history rows show the answerer prefix so override answers are auditable at a glance.	2026-05-17 22:06:53 +02:00
müde	1879b2f485	dashboard: question_added / question_resolved mutation events + client derived state	2026-05-17 13:33:02 +02:00
müde	56d615b51f	dashboard: approval_added / approval_resolved mutation events + client derived state	2026-05-17 13:30:25 +02:00
damocles	82b0877c47	ask: rename ask_operator → ask + optional 'to' for agent-to-agent Q&A	2026-05-17 13:20:32 +02:00
damocles	1770b51845	manager mcp: expose 'remind' tool sharing storage helper with agent surface	2026-05-17 11:43:14 +02:00
damocles	0e6bac8388	limits: unified 1 KiB cap on send/ask + reminder auto-file on overflow	2026-05-17 11:36:12 +02:00
müde	411cf86632	nix fmt + rustfmt sweep	2026-05-17 01:40:28 +02:00
damocles	1023acf69f	add get_logs tool to manager mcp surface	2026-05-16 20:45:19 +02:00
müde	313121a6e9	fix: transient state leak via RAII guard bare set_transient/clear_transient pairs leak the in-memory transient on task cancellation, panics, or any early return between the two calls — dashboard then shows the agent stuck in 'rebuilding…' forever (coder hit this today). add Coordinator::transient_guard returning a TransientGuard whose Drop clears, and convert every caller (dashboard lifecycle_action, auto_update::rebuild_agent, manager_server Update, actions::destroy, actions Spawn task, migrate phase 4). destroy() now takes &Arc<Coordinator> so it can hold a guard. existing stuck transients clear on next hive-c0re restart since transient state is in-memory only.	2026-05-16 19:47:52 +02:00
damocles	1a36c38a54	fix broadcast send for manager, deduplicate into coordinator.broadcast_send	2026-05-16 19:31:53 +02:00
damocles	4a8a668348	feat: add optional description to request_apply_commit and request_spawn	2026-05-16 15:18:32 +02:00
müde	06af23c8a4	recv: None = peek, positive value = opt-in long-poll old behavior: omitted wait_seconds fell through to the 30s RECV_LONG_POLL_DEFAULT — claude calling 'is there anything in my inbox right now?' between actions blocked the turn for half a minute. flip the semantics: None (or 0) returns immediately, positive value parks up to MAX (180s, unchanged). cleaner 'peek vs wait' distinction; tool descriptions + agent/manager prompts updated to point at the new shape. harness's own serve loops in hive-ag3nt + hive-m1nd relied on the old default for their inbox poll. they now explicitly pass wait_seconds: Some(180) to opt into the full park — same effective behavior as before, just spelled out. retires the matching TODO under Turn loop.	2026-05-16 03:22:42 +02:00
müde	3db33b0fe5	agent flake.nix: forward inputs as flakeInputs module arg new boilerplate wraps agent.nix as a sub-module + passes every flake input (minus self) through to it via _module.args.flakeInputs. manager edits the inputs block of flake.nix to pull in out-of-tree flakes (MCP servers etc.) and references them in agent.nix as flakeInputs.<name>.packages.${pkgs.system}.default — the new input's pinned sha lands in the agent's own flake.lock (already tracked + part of the proposal flow), and transitively rolls up into meta's lock. migrate's MODULE_FLAKE_MARKER swaps to _module.args.flakeInputs so existing agents on the old 'nixosModules.default = import ./agent.nix' template get re-rendered onto the new shape on next hive-c0re start. manager_server's flake.nix tamper-check goes away — the build path's failed/<id> annotated tag already provides the safety net when a manager edit breaks the flake; enforcing 'no flake.nix edits at all' was overly strict (blocks the inputs-addition pattern that's the whole point of this change). manager prompt updated with a worked example for adding an MCP-server flake input + wiring it through agent.nix.	2026-05-16 02:23:43 +02:00
müde	2a6d084718	ask_operator: any agent can call it, answer routes by asker new AgentRequest::AskOperator + AgentResponse::QuestionQueued on the per-agent socket — same shape as the manager flavor, agent gets the same wire surface (still uses the same operator_questions table). agent_server::dispatch wires AskOperator through coord.questions.submit(agent, ...) so the row's asker is the sub-agent name; the ttl watchdog already in manager_server gets shared and spawn_question_watchdog goes pub. answer routing: operator_questions::answer now returns (question, asker). post_answer_question + post_cancel_question + the watchdog fire OperatorAnswered through new coord.notify_agent(asker, event) instead of always notify_manager — the event lands in whichever agent originally asked. notify_manager is now a thin wrapper. agent socket plumbing: agent_server::start takes Arc<Coordinator> instead of Arc<Broker> so dispatch has access to questions + notify path; coordinator::{register_agent,ensure_runtime} take self: &Arc<Self>. mcp::AgentServer grows the ask_operator tool; allowed_mcp_tools(Agent) adds it; prompts/agent.md replaces the 'message the manager to ask the operator' guidance with the direct tool description.	2026-05-16 01:48:10 +02:00
müde	6b3ef4549c	manager_server: reject proposals that modify flake.nix submit_apply_commit now diffs the freshly-tagged proposal/<id> against applied/main and refuses if flake.nix is in the changeset. flake.nix is fixed boilerplate the meta flake depends on (it exports nixosModules.default = import ./agent.nix); silent edits there would break the nixosConfiguration in subtle ways. the manager prompt already says don't touch it; this is the host-side belt — clear error to the manager on submit, row marked failed in sqlite, no orphan pending approval to chase. diff-failure is logged + ignored: the build path surfaces concrete errors if flake.nix is actually broken.	2026-05-16 01:42:11 +02:00
müde	35b0edaf27	manager_server: fetch+tag at request_apply_commit submit submit_apply_commit (1) queues the approval row, (2) git-fetches the manager-supplied sha from proposed into applied, pins it as refs/tags/proposal/<id>, (3) persists the resolved sha on the row via approvals.set_fetched_sha. from this point on the proposal is immutable from the manager's perspective: amends or force-pushes in proposed do not change what hive-c0re will build. fetch failures mark the row failed and surface the error to the manager so a phantom pending entry can't linger.	2026-05-15 22:57:43 +02:00
müde	80229c6af9	manager: needs_login / logged_in / needs_update events + update tool crash_watch grows two more state-axes alongside running/stopped: - logged-in (claude session dir populated for the agent) - up-to-date (recorded flake rev matches current) per-tick transitions emit HelperEvent::NeedsLogin / LoggedIn / NeedsUpdate. seed-on-first-tick semantics retained — nothing fires on harness boot for agents that were already in their state. only needs_update fires the 'stale appeared' direction; the resolved direction is already covered by Rebuilt. new mcp__hyperhive__update(name) on the manager surface: idempotent rebuild via auto_update::rebuild_agent. transient-aware (Rebuilding) so the dashboard shows the spinner. login intentionally has NO tool — it's interactive OAuth, only the operator can complete it. prompts + approvals doc + turn-loop doc updated. todo grows a 'show per-agent applied config in dashboard' entry (separate follow-up).	2026-05-15 21:42:13 +02:00
müde	7d93dd9db4	no nap tool — recv with long wait_seconds replaces it; max raised to 180s recv-with-timeout is strictly better than a fixed sleep because it wakes instantly on incoming messages. drop the half-written nap MCP tool, raise the recv wait_seconds cap from 60s to 180s on both agent and manager sockets. prompts updated: agent.md + manager.md now spell out the pattern — when there's nothing else useful to do, call recv with wait_seconds=180 to park the turn; do NOT use Bash sleep for the same purpose. todo drops the nap entry and the napping-state-badge follow-up; both replaced by 'just use a long recv'.	2026-05-15 20:53:15 +02:00
müde	f65ee88269	recv: optional wait_seconds parameter, capped at 60s AgentRequest::Recv and ManagerRequest::Recv grow an optional wait_seconds field (default None → 30s, capped at 60s server-side). agent_server / manager_server clamp via recv_timeout(). MCP tool schemas advertise the param so claude can pick its own poll window — useful when an agent wants to throttle wakes without entering a distinct nap state. both harness loops still pass None, keeping the existing 30s default behaviour for system-level Recvs.	2026-05-15 20:49:33 +02:00
müde	754db7830e	ask_operator: ttl_seconds auto-cancel + remaining-time chip manager can pass ttl_seconds to ask_operator. on submit, host stores deadline_at = now + ttl in operator_questions (new column, migrated via existing pragma_table_info pattern), spawns a tokio task that sleeps until the deadline then resolves the question with answer '[expired]' and fires the same OperatorAnswered helper event. already-resolved races no-op silently. dashboard renders a '⏳ MM:SS' chip on the question row when deadline_at is set. format collapses seconds → s, < 1h → m s, ≥ 1h → h m. heartbeat refresh (5s) keeps the chip current; the operator sees it tick down. manager prompt + mcp tool description updated. journald viewer per container queued in todo (separate task).	2026-05-15 20:38:02 +02:00
müde	538e0446d7	agent page: inbox view of last 30 messages addressed to this agent new wire request AgentRequest::Recent { limit } / ManagerRequest::Recent (plus matching responses with Vec<InboxRow>). InboxRow moved to hive-sh4re so it lives on both surfaces without an internal-to-wire conversion. host-side dispatch in agent_server / manager_server calls broker.recent_for(name, limit). per-agent web_ui /api/state grew an inbox: Vec<InboxRow> populated via the same per-agent socket (best-effort; transport failure returns empty). frontend renders as a collapsible <details> section between the state row and the terminal — fmt timestamp / from / body in a tight grid, capped at 16em scrollable. only visible when there are rows.	2026-05-15 20:32:19 +02:00
müde	2413d664a1	agents get a kickoff inbox message on start/restart/rebuild new Coordinator::kick_agent(name, reason) drops a system message into the agent's inbox so the next turn picks it up with a 'you were just (re)started, check /state/ for notes, --continue session is intact' hint. wakes the turn loop without any harness-side handling needed — it's just another inbox message with sender = 'system'. wired from: - dashboard /start /restart /rebuild handlers (via lifecycle_action's on-success tail) - manager mcp_hyperhive_start / restart dashboard: pending approvals + tombstones + questions now refresh on a 5s heartbeat when nothing else is happening. previously refresh only fired on async-form submit or on broker traffic addressed to operator — manager-queued approvals went through neither, so the operator had to reload to see them. 5s is the slow-path; 2s remains for in-flight transients.	2026-05-15 20:19:36 +02:00
müde	8344dd9ab7	ask_operator: multi-select + free-text fallback ask_operator now accepts a multi: bool. when true and options is non-empty, the dashboard renders the choices as checkboxes — operator picks any subset, answer comes back as a ', '-joined string. when false (default), options are radio buttons. independent of multi, a free-text input ('or type your own…') is always rendered alongside options so the operator is never trapped by an incomplete list. submit merges checked options + free text into the single 'answer' field. schema migration: operator_questions grows a multi INTEGER column with a one-shot ALTER TABLE on open. backward compatible — old rows default to 0 (not multi). prompt + mcp tool description updated; existing dashboard css for .qform was rewritten around the new vertical layout.	2026-05-15 19:52:44 +02:00
müde	ac1b5fde8e	manager: start/restart at will, no approval; refuse self new manager tools mcp__hyperhive__{start,restart} that delegate to the existing lifecycle::start / lifecycle::restart on the host. kill was already at the manager's discretion; rounding out start + restart for parity so day-to-day container care doesn't have to round-trip through the operator. guard: refuse self-targeting on kill/start/restart — the manager would just be cutting its own legs. spawn (request_spawn) and config changes (request_apply_commit) still go through the approval queue, since those are the actual gate. prompt + claude.md updated to make the boundary explicit. kill now also emits HelperEvent::Killed (it didn't before).	2026-05-15 18:57:25 +02:00
müde	2770630f33	ask_operator tool: non-blocking; operator answer arrives as helper event new mcp tool on the manager surface that queues a question on the dashboard and returns the question id immediately. operator submits an answer via /answer-question/<id>; the dashboard fires HelperEvent::OperatorAnswered { id, question, answer } into the manager inbox so the next turn picks it up. also: fix async-form button stuck on spinner after successful submit (refreshState skipped re-rendering, so the button was never re-enabled).	2026-05-15 18:44:42 +02:00
müde	06ea0cf283	operator inbox view on dashboard; agent ui doesn't clobber typing	2026-05-15 17:23:53 +02:00
müde	dfbcf2b9d1	agents wake on send: broker.recv_blocking + 30s long-poll on Recv	2026-05-15 16:00:31 +02:00
müde	409263f1c9	operator input: per-agent /send form (dashboard T4LK removed)	2026-05-15 15:28:17 +02:00
müde	accb1445e3	claude: pipe prompt via stdin (variadic --allowedTools was eating it); + ManagerRequest::Status	2026-05-15 15:06:09 +02:00
müde	c59fa8541c	phase 8 step 2: approval-gated spawn + dashboard spinner	2026-05-15 12:53:13 +02:00
müde	a42fdb3a5c	phase 8 step 1: per-agent claude creds bind + destroy keeps state	2026-05-15 12:39:22 +02:00
müde	2fd80dbd68	Phase 5c: separate proposed (manager) and applied (hive-c0re) repos; per-agent gitconfig	2026-05-14 23:20:32 +02:00
müde	433c0d212e	Phase 5b: per-agent config flakes; approve validates + advances commit	2026-05-14 23:09:35 +02:00
müde	fef2dee92a	clippy pedantic clean + wired into flake checks	2026-05-14 22:57:47 +02:00
müde	f12837fe32	Phase 5a: approval queue (request_apply_commit, pending/approve/deny)	2026-05-14 22:50:19 +02:00
müde	aa67e5a481	Phase 4: manager socket + manager_server with privileged tool surface	2026-05-14 22:35:08 +02:00

50 commits