hyperhive

Author	SHA1	Message	Date
müde	411cf86632	nix fmt + rustfmt sweep	2026-05-17 01:40:28 +02:00
müde	597351ca4e	harness: declarative claude plugin marketplaces new `hyperhive.claudeMarketplaces` option (list of strings — URL, path, or github:owner/repo). harness boot adds each via `claude plugin marketplace add` before updating + installing the configured plugins, so specs like `foo@some-marketplace` resolve on a fresh container. idempotent: 'already exists' stderr is treated as success.	2026-05-17 01:36:18 +02:00
müde	4a06615c5c	fix /state paths: sub-agents use /agents/<name>/state, not /state sub-agent containers post-refactor bind their state at /agents/<name>/state (manager keeps the legacy /state — see lifecycle.rs:751). agent.md still said /state/forge-token; corrected to /agents/{label}/state/forge-token (template-substituted at boot). tea-login systemd unit now walks both candidates so the same harness module works for the manager and sub-agents.	2026-05-16 23:37:49 +02:00
müde	9fc7cae132	prompts: tell agents + manager about the code forge; todo: shared docs repo system prompts now describe the hyperhive Forgejo at localhost:3000, the per-agent user, the pre-configured tea CLI, and the REST API fallback with /state/forge-token. todo gains the shared docs/skills RO-repo follow-up (org-shared + per-agent read membership).	2026-05-16 23:36:05 +02:00
damocles	824acee134	include agent label in turn failure notification body	2026-05-16 20:45:19 +02:00
damocles	1023acf69f	add get_logs tool to manager mcp surface	2026-05-16 20:45:19 +02:00
damocles	fca480b86e	add turn lock to prevent /compact racing with in-flight turns	2026-05-16 20:45:19 +02:00
damocles	25508d7399	fix manager loop: pending wake + move sleep into Empty arm only	2026-05-16 20:45:19 +02:00
müde	f2a0dc4107	re-apply TodoWrite removal + deny list (lost in subsequent merge)	2026-05-16 19:47:55 +02:00
damocles	f3739d2b8e	update plugin marketplaces before install at harness boot	2026-05-16 18:51:06 +02:00
damocles	dc53615686	fix stale /state refs in agent and manager prompts	2026-05-16 18:50:15 +02:00
müde	772fdd8320	forward plugin install failures to manager from sub-agents install_configured now takes an optional notify recipient. on a non-zero or spawn-failed 'claude plugin install', sub-agents send the spec + stderr to manager via the hyperhive socket; manager passes None so it doesn't message itself. boot still proceeds either way — notification is best-effort.	2026-05-16 17:24:04 +02:00
müde	3e040d5b16	agent: forward unhandled turn failures to manager run_claude now keeps a 20-line stderr ring buffer and bails with it inline (was just 'exit <status>'). agent serve loop, on Failed (not PromptTooLong — that's already absorbed by drive_turn's compaction retry), sends the error body to manager via the normal hyperhive send. swallows transport errors — failure is already in journald and the events sqlite. manager-only harness (hive-m1nd) is unchanged so it doesn't try to notify itself.	2026-05-16 16:04:35 +02:00
müde	7ec658851a	back out bypassPermissions: claude refuses it under root uid claude-code rejects --dangerously-skip-permissions / defaultMode=bypassPermissions when running as root, which all hyperhive containers do. revert to the previous explicit allow-list plumbing (per-flavor list spliced into permissions.allow + --tools enable list), keep TodoWrite out of the built-in allow set, and keep the deny list (TodoWrite, WebFetch, WebSearch, Task) as belt-and-braces in case anything sneaks past the allow gate.	2026-05-16 15:58:41 +02:00
müde	36c7f3d1c7	mirror claude stderr to tracing so journald captures it bus-only note made post-mortems require the web UI / events sqlite; now stderr lines also land in 'journalctl -M <container> -b' alongside the existing LiveEvent::Note for the dashboard.	2026-05-16 15:30:03 +02:00
müde	7d33da3727	retry hive socket up to 5x over 60s, surface retry count to claude socket client now retries connect/IO failures with 2-4-8-16-30s backoffs (60s total budget). transparent for non-tool callers via request(); tool handlers go through request_retried() which also returns the retry count, then annotate_retries() appends a one-line note to the tool result so claude knows the slow round-trip was a c0re flicker, not a content failure — avoids burning tokens on an LLM-level retry.	2026-05-16 15:28:18 +02:00
damocles	4a8a668348	feat: add optional description to request_apply_commit and request_spawn	2026-05-16 15:18:32 +02:00
damocles	a6d1464071	refactor: per-agent state paths (/agents/{label}/state), centralize in paths.rs	2026-05-16 15:18:32 +02:00
damocles	a82009cf8c	docs: update agent prompt to reference /agents/{label}/state and /agents/{label}/claude	2026-05-16 15:18:19 +02:00
müde	6dd17864ac	auto-install claude plugins at harness boot new hyperhive.claudePlugins NixOS option (list of strings) rendered to /etc/hyperhive/claude-plugins.json. both hive-ag3nt and hive-m1nd shell out 'claude plugin install <spec>' for each entry once at startup before the turn loop opens. failures log a warning but don't abort boot.	2026-05-16 15:17:34 +02:00
müde	8e7405db13	bypass-mode perms + deny list, drop allow-list plumbing claude-settings.json now sets permissions.defaultMode=bypassPermissions with a small deny list (WebFetch, WebSearch, Task, TodoWrite). The per-flavor allow list and --tools / --allowedTools CLI flags are gone — anything not denied auto-approves. mcp.rs loses ALLOWED_BUILTIN_TOOLS, builtin_tools_arg, allow_list, allowed_mcp_tools. The extraMcpServers allowedTools field is parsed for back-compat but no longer wired anywhere; restrict via permissions.deny instead.	2026-05-16 15:17:30 +02:00
damocles	3d2a7ffec7	fix: auto-wake after turn if pending messages exist, don't block on recv	2026-05-16 13:50:11 +02:00
damocles	d99e0812d0	fix: move sleep to only occur when recv returns empty, avoid message delivery delay	2026-05-16 13:48:04 +02:00
damocles	ab0df71068	docs: update system prompts to document /shared directory	2026-05-16 13:43:05 +02:00
damocles	abcf7a0c41	implement broadcast messaging: send to '*' reaches all agents with hint	2026-05-16 13:16:13 +02:00
damocles	286da8980e	Revert "mcp: wire extra server allowedTools into --allowedTools arg" This reverts commit e0b18ff3c2ec5a7f771ab9a1a247ff4a24a8c475.	2026-05-16 12:49:59 +02:00
damocles	caa495aeda	mcp: wire extra server allowedTools into --allowedTools arg	2026-05-16 12:49:59 +02:00
müde	67e4242b9f	per-agent send allow-list via hyperhive.allowedRecipients new NixOS option in harness-base.nix: hyperhive.allowedRecipients = [ 'alice' 'manager' ]; # whitelist hyperhive.allowedRecipients = [ ]; # default = unrestricted module writes the list as JSON to /etc/hyperhive/send-allow.json at activation. AgentServer::send reads the file before issuing the broker request; if the list is non-empty and `to` isn't on it, the tool returns a claude-readable refusal string without touching the broker. the manager is always implicitly permitted regardless of the list — otherwise a misconfigured allow-list could strand a sub-agent without an escalation path. enforcement is in the in-container MCP server (not on the host's per-agent socket) because the agent's nix config is the trust boundary anyway — the operator audits agent.nix at deploy time, the activation-time /etc/hyperhive/send-allow.json is r/o under /nix/store, so the agent can't tamper at runtime without going through a new approval. agent prompt mentions the option + tells claude to route through the manager when refused. retires the matching TODO under Permissions / policy.	2026-05-16 03:59:28 +02:00
müde	06af23c8a4	recv: None = peek, positive value = opt-in long-poll old behavior: omitted wait_seconds fell through to the 30s RECV_LONG_POLL_DEFAULT — claude calling 'is there anything in my inbox right now?' between actions blocked the turn for half a minute. flip the semantics: None (or 0) returns immediately, positive value parks up to MAX (180s, unchanged). cleaner 'peek vs wait' distinction; tool descriptions + agent/manager prompts updated to point at the new shape. harness's own serve loops in hive-ag3nt + hive-m1nd relied on the old default for their inbox poll. they now explicitly pass wait_seconds: Some(180) to opt into the full park — same effective behavior as before, just spelled out. retires the matching TODO under Turn loop.	2026-05-16 03:22:42 +02:00
müde	90df2106bf	agent socket: external wake-up path for in-container MCP servers new AgentRequest::Wake { from, body } drops a message into this agent's inbox via the per-agent socket. matrix-style MCP servers can use it when they receive an external event (matrix message, webhook, scrape result) to nudge claude into running a turn. broker.send wakes whatever Recv is currently long-polling, the harness picks the message up, formats a wake prompt with the caller's chosen from label ('matrix: new dm', 'webhook: deploy succeeded', etc.). new `hive-ag3nt wake --from <label> --body <text>` subcommand on the harness binary so MCP servers can shell out instead of implementing the line-JSON protocol themselves; body=='-' reads from stdin for multi-line / quoting-friendly payloads. identity = socket: anything that can connect to /run/hive/mcp.sock is implicitly trusted to inject. that's fine because the bind-mount is the agent's own container; no new auth surface opens up. docs/turn-loop.md gets a new 'Waking the agent from inside the container' section pointing at both paths (CLI + raw JSON).	2026-05-16 03:15:58 +02:00
müde	3db33b0fe5	agent flake.nix: forward inputs as flakeInputs module arg new boilerplate wraps agent.nix as a sub-module + passes every flake input (minus self) through to it via _module.args.flakeInputs. manager edits the inputs block of flake.nix to pull in out-of-tree flakes (MCP servers etc.) and references them in agent.nix as flakeInputs.<name>.packages.${pkgs.system}.default — the new input's pinned sha lands in the agent's own flake.lock (already tracked + part of the proposal flow), and transitively rolls up into meta's lock. migrate's MODULE_FLAKE_MARKER swaps to _module.args.flakeInputs so existing agents on the old 'nixosModules.default = import ./agent.nix' template get re-rendered onto the new shape on next hive-c0re start. manager_server's flake.nix tamper-check goes away — the build path's failed/<id> annotated tag already provides the safety net when a manager edit breaks the flake; enforcing 'no flake.nix edits at all' was overly strict (blocks the inputs-addition pattern that's the whole point of this change). manager prompt updated with a worked example for adding an MCP-server flake input + wiring it through agent.nix.	2026-05-16 02:23:43 +02:00
müde	7d6d8e96c1	per-agent extra MCP servers via hyperhive.extraMcpServers new NixOS option in harness-base.nix: hyperhive.extraMcpServers.<key> = { command = "/path/to/server"; args = [ ... ]; env = { KEY = "value"; }; allowedTools = [ "send_message" "join_room" ]; # or [""] }; declared as attrsOf submodule so agents stack arbitrarily many. the module writes the whole map as JSON to /etc/hyperhive/extra-mcp.json at activation; the harness reads that file in mcp::render_claude_config and merges each entry into the rendered --mcp-config under its own mcpServers.<key> block. allowed_mcp_tools(flavor) extends the --allowedTools arg with mcp__<key>__<pattern> for every entry — "" (the default) becomes mcp__<key>__* so every tool from that server is auto-approved, or pass a concrete list to tighten. collision guard: an extra server keyed "hyperhive" is dropped with a warn-log so user config can't shadow the built-in surface. malformed JSON / missing file fall back to "no extras" silently. prompt note added: agents see "(some agents only) extra MCP tools surfaced as mcp__<server>__<tool>" and learn they're declared via agent.nix. retires the matching TODO under Per-agent extension. matrix-chat agents + bitburner-agent migration unblocked.	2026-05-16 02:10:11 +02:00
müde	50ef806266	operator pronouns: configurable free-text, threaded into prompts new NixOS module option services.hive-c0re.operatorPronouns (free text, default 'she/her', example 'they/them'). hive-c0re takes it as a CLI flag (--operator-pronouns, lib.escapeShellArg'd in the systemd unit), stores it on Coordinator, threads it into the meta flake's mkAgent so each agent's systemd service gets HIVE_OPERATOR_PRONOUNS set. the harness reads the env at boot and substitutes {operator_pronouns} into the agent / manager system prompt alongside {label}. nix string is escaped against backslash + double-quote so non-ascii / quoted values round-trip safely. prompt addendum: both agent.md and manager.md mention the operator's pronouns up front so claude uses them naturally in third-person reference. propagates on next ↻ R3BU1LD (meta lock bump, no per-agent approval).	2026-05-16 02:05:22 +02:00
müde	2a6d084718	ask_operator: any agent can call it, answer routes by asker new AgentRequest::AskOperator + AgentResponse::QuestionQueued on the per-agent socket — same shape as the manager flavor, agent gets the same wire surface (still uses the same operator_questions table). agent_server::dispatch wires AskOperator through coord.questions.submit(agent, ...) so the row's asker is the sub-agent name; the ttl watchdog already in manager_server gets shared and spawn_question_watchdog goes pub. answer routing: operator_questions::answer now returns (question, asker). post_answer_question + post_cancel_question + the watchdog fire OperatorAnswered through new coord.notify_agent(asker, event) instead of always notify_manager — the event lands in whichever agent originally asked. notify_manager is now a thin wrapper. agent socket plumbing: agent_server::start takes Arc<Coordinator> instead of Arc<Broker> so dispatch has access to questions + notify path; coordinator::{register_agent,ensure_runtime} take self: &Arc<Self>. mcp::AgentServer grows the ask_operator tool; allowed_mcp_tools(Agent) adds it; prompts/agent.md replaces the 'message the manager to ask the operator' guidance with the direct tool description.	2026-05-16 01:48:10 +02:00
müde	d94712bde8	turn: unify run_turn / compact_session via TurnFiles new TurnFiles bundle (mcp_config + settings + system_prompt + flavor) materialised once per harness boot, passed to drive_turn and compact_session alike. operator-initiated /compact now uses the exact same session shape as a normal turn — same MCP surface, same allowed tools, same role prompt — only the stdin payload differs (/compact vs the wake-up body). web_ui's AppState carries the TurnFiles instead of (label + socket + flavor + ad-hoc file writes per click). bin/hive-ag3nt and bin/hive-m1nd prepare TurnFiles before spawning the web UI and pass them to both surfaces. web_ui::Flavor folds into a type alias for mcp::Flavor — no two-stage enum mapping. removes ClaudeMode + the run_claude variant fork (system prompt was Option, mcp args were skipped on Compact). dead 'mode' plumbing gone.	2026-05-16 00:57:58 +02:00
müde	02139efbb1	turn: spawn claude with cwd = /state every claude invocation now runs with current_dir set to /state — relative paths in tool calls (Read notes.md, Bash ls, Write blob) land in the agent's durable bind-mounted dir instead of the harness's systemd cwd. /state is RW + survives destroy/recreate so this matches where the agent is told to keep notes anyway. dev/test setups without the bind silently fall back to the parent cwd.	2026-05-16 00:46:19 +02:00
müde	034b4fde10	force fresh session: ↻ new session button + /new-session bus carries a one-shot AtomicBool armed by POST /api/new-session (or the /new-session slash command). next turn drops --continue, starting a fresh claude session; the flag clears automatically so subsequent turns resume normal behavior. /compact still always uses --continue — compacting a non-existent session is a no-op anyway. per-agent page grows an ↻ new session button next to the cancel-turn one (always visible, amber, confirms before posting since dropping --continue context isn't reversible). slash-command surface picks up /new-session for parity with the button. note row emitted on the live feed both at arm-time and again when the turn actually consumes the flag, so the operator can confirm it landed.	2026-05-16 00:44:45 +02:00
müde	691057d2d3	manager prompt: meta-flake era agent.nix becomes a plain NixOS module function — flake.nix is fixed boilerplate the manager mustn't edit; meta flake at /meta/ owns the wrapper. proposed repos ship with an 'applied' remote pre-wired, so 'git fetch applied' / 'git log applied/main' / 'git show applied/refs/tags/deployed/<id>' all just work without constructing paths by hand. /meta/ exposes the system-wide deploy log (git log /meta) + flake.lock for cross-agent sha introspection.	2026-05-16 00:35:30 +02:00
müde	edb0108ae7	docs+prompt: tag-driven flow + /applied RO mount manager prompt: explain that arbitrary files now travel with the proposal, document the /applied/<n>/.git RO mount and the tag scheme (git show applied/deployed/<id> etc.), call out that applied/main only advances on deployed so a failed build isn't terminal. approvals.md: drop the old per-agent applied.git phrasing in favour of the single /applied RO bind, mention both manager binds together. claude.md scratchpad flips from in-flight to just-landed.	2026-05-15 23:03:48 +02:00
müde	80229c6af9	manager: needs_login / logged_in / needs_update events + update tool crash_watch grows two more state-axes alongside running/stopped: - logged-in (claude session dir populated for the agent) - up-to-date (recorded flake rev matches current) per-tick transitions emit HelperEvent::NeedsLogin / LoggedIn / NeedsUpdate. seed-on-first-tick semantics retained — nothing fires on harness boot for agents that were already in their state. only needs_update fires the 'stale appeared' direction; the resolved direction is already covered by Rebuilt. new mcp__hyperhive__update(name) on the manager surface: idempotent rebuild via auto_update::rebuild_agent. transient-aware (Rebuilding) so the dashboard shows the spinner. login intentionally has NO tool — it's interactive OAuth, only the operator can complete it. prompts + approvals doc + turn-loop doc updated. todo grows a 'show per-agent applied config in dashboard' entry (separate follow-up).	2026-05-15 21:42:13 +02:00
müde	fd0e493bf5	agent terminal: show full body for send tool calls send was truncating to 80 chars in the tool_use row, hiding anything past the first sentence. now renders as a collapsed <details> like Write/Edit — summary still shows the recipient + headline (so the operator can scan), expanding reveals the full body unchanged. recv side was already covered: the wake prompt shows the full incoming body, and explicit recv() tool_result rows expand to the full text via the existing collapsed-results path.	2026-05-15 21:35:48 +02:00
müde	8b9f7d21b7	model persisted to /state; stop auto-allowing claude-code unfree model persistence: /model <name> now writes to /state/hyperhive-model (in-container), Bus::new reads it on init. operator override survives harness restart and container rebuild; gone on --purge like every other piece of agent state. path overridable via HYPERHIVE_MODEL_FILE for tests. failure to persist is a warn, not fatal — runtime override still applies, just won't survive a restart. unfree opt-in: drop the auto-allowUnfreePredicate from harness-base.nix and the claude-unstable overlay. operator now has to set nixpkgs.config.allowUnfree (or a predicate listing claude-code) in their own host config. silent unfree bypass was sketchy; this is honest. readme + gotchas updated to spell out the snippet. todo: drops model-persistence + container-crash + journald (all shipped); adds per-agent send allow-list (constrain who an agent can message).	2026-05-15 21:05:40 +02:00
müde	58c3cd853b	container crash watcher → HelperEvent::ContainerCrash new hive_c0re::crash_watch task polls every 10s, builds the set of currently-running containers, and on running→stopped transitions checks the transient snapshot: if no Stopping / Restarting / Destroying / Rebuilding flag is set, the container exited unexpectedly and we fire HelperEvent::ContainerCrash into the manager's inbox so it can react (typically: start it again). first poll is a seeding pass — no events on harness startup. dbus subscription would be lower-latency but polling is honest and debuggable, and a 10s delay on crash detection is fine for our scale. manager prompt + approvals doc updated to advertise the new event variant. todo drops the entry (and the journald-viewer entry that already shipped).	2026-05-15 21:02:05 +02:00
müde	6db38cf70c	model: runtime override via /model slash; fixes for port + bind - runtime model override: Bus::{model,set_model} + POST /api/model (form-encoded {model: name}). turn.rs reads bus.model() per turn so a flip lands on the next claude invocation. /api/state grows a model field; agent page shows a 'model · <name>' chip in the state row. '/model <name>' slash command POSTs to the endpoint and refreshes state. - port regression fix: agent_web_port no longer probes forward for existing agents (the previous fix shifted ports for any agent without a port file, including legacy ones whose container was already bound to the bare hashed port — dashboard rendered the new port, container was still on the old one, conn errors). new rule: port file exists → use it; absent + applied flake present → legacy, persist port_hash without probing; absent + no applied flake → fresh spawn, probe forward. - SO_REUSEADDR on both the dashboard and per-agent web UI binds via tokio::net::TcpSocket. operator hit 12 retries failing on manager :8000 — REUSEADDR handles the TIME_WAIT case cleanly without a new dep; retry still covers the genuine process-still-alive overlap. todo: drops the model-override entry (shipped); adds two new items — model persistence (optional, future), and custom per-agent MCP tools (groundwork for moving bitburner-agent into hyperhive).	2026-05-15 20:59:45 +02:00
müde	7d93dd9db4	no nap tool — recv with long wait_seconds replaces it; max raised to 180s recv-with-timeout is strictly better than a fixed sleep because it wakes instantly on incoming messages. drop the half-written nap MCP tool, raise the recv wait_seconds cap from 60s to 180s on both agent and manager sockets. prompts updated: agent.md + manager.md now spell out the pattern — when there's nothing else useful to do, call recv with wait_seconds=180 to park the turn; do NOT use Bash sleep for the same purpose. todo drops the nap entry and the napping-state-badge follow-up; both replaced by 'just use a long recv'.	2026-05-15 20:53:15 +02:00
müde	f65ee88269	recv: optional wait_seconds parameter, capped at 60s AgentRequest::Recv and ManagerRequest::Recv grow an optional wait_seconds field (default None → 30s, capped at 60s server-side). agent_server / manager_server clamp via recv_timeout(). MCP tool schemas advertise the param so claude can pick its own poll window — useful when an agent wants to throttle wakes without entering a distinct nap state. both harness loops still pass None, keeping the existing 30s default behaviour for system-level Recvs.	2026-05-15 20:49:33 +02:00
müde	637085644d	server-side TurnState in the harness, exposed via /api/state new TurnState { Idle, Thinking, Compacting } on hive_ag3nt::events::Bus with set_state + state_snapshot. the turn loops in hive-ag3nt and hive-m1nd flip Thinking before drive_turn and Idle after; the web_ui's /api/compact handler flips Compacting around compact_session. per-agent /api/state grows turn_state + turn_state_since (unix seconds). frontend prefers the server-reported state over the client-derived one — setStateAbs takes the absolute since-time so the 'last turn' chip reads the actual server-side duration instead of the client's perceived gap between SSE events. SSE turn_start / turn_end still drive state instantly between renders; /api/state re-anchors on each turn_end refresh. new compacting state gets its own purple badge with pulse animation (mirrors thinking's amber). napping will slot in the same way once the nap tool lands.	2026-05-15 20:46:38 +02:00
müde	754db7830e	ask_operator: ttl_seconds auto-cancel + remaining-time chip manager can pass ttl_seconds to ask_operator. on submit, host stores deadline_at = now + ttl in operator_questions (new column, migrated via existing pragma_table_info pattern), spawns a tokio task that sleeps until the deadline then resolves the question with answer '[expired]' and fires the same OperatorAnswered helper event. already-resolved races no-op silently. dashboard renders a '⏳ MM:SS' chip on the question row when deadline_at is set. format collapses seconds → s, < 1h → m s, ≥ 1h → h m. heartbeat refresh (5s) keeps the chip current; the operator sees it tick down. manager prompt + mcp tool description updated. journald viewer per container queued in todo (separate task).	2026-05-15 20:38:02 +02:00
müde	2146e47770	web ui: retry binding on AddrInUse during restart races operator hit 'Address already in use (os error 98)' on a harness restart — the new harness raced the old socket's release. add a bind_with_retry helper that backs off (250ms doubling, capped at 2s, 12 tries ≈ 22s total) on AddrInUse before giving up. applied to both the per-agent web UI and the hive-c0re dashboard. proper fix would be SO_REUSEADDR via socket2 but retry covers the TIME_WAIT case fine and keeps the dep count down. Other bind errors still fail immediately (port permission, fd exhaustion).	2026-05-15 20:33:51 +02:00
müde	538e0446d7	agent page: inbox view of last 30 messages addressed to this agent new wire request AgentRequest::Recent { limit } / ManagerRequest::Recent (plus matching responses with Vec<InboxRow>). InboxRow moved to hive-sh4re so it lives on both surfaces without an internal-to-wire conversion. host-side dispatch in agent_server / manager_server calls broker.recent_for(name, limit). per-agent web_ui /api/state grew an inbox: Vec<InboxRow> populated via the same per-agent socket (best-effort; transport failure returns empty). frontend renders as a collapsible <details> section between the state row and the terminal — fmt timestamp / from / body in a tight grid, capped at 16em scrollable. only visible when there are rows.	2026-05-15 20:32:19 +02:00
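
note on 06af23c8a4 (recv: None = peek, positive value = opt-in long-poll): the peek-vs-wait rule described there can be sketched as below. this is a minimal illustration, not the project's code. the commit log confirms only that a recv_timeout() clamp exists and that the cap is 180s; the signature, the Option<Duration> return shape and the constant name RECV_LONG_POLL_MAX are assumptions made for this sketch.

```rust
use std::time::Duration;

/// Cap on an opt-in long-poll. The 180s value comes from the commit log;
/// the constant name is an assumption for this sketch.
const RECV_LONG_POLL_MAX: Duration = Duration::from_secs(180);

/// Hypothetical signature: maps the optional `wait_seconds` field to a park
/// duration. `None` in the return value means "peek": answer immediately.
fn recv_timeout(wait_seconds: Option<u64>) -> Option<Duration> {
    match wait_seconds {
        // Omitted or zero: return whatever is in the inbox right now.
        None | Some(0) => None,
        // Positive: park for the requested time, clamped to the cap.
        Some(secs) => Some(Duration::from_secs(secs).min(RECV_LONG_POLL_MAX)),
    }
}
```

under these assumptions the harness serve loops, which pass wait_seconds: Some(180), get the full park, while a bare recv from claude returns at once.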

122 commits
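
note on 67e4242b9f (per-agent send allow-list): the decision rule from that commit message (empty list = unrestricted, manager always permitted, anything else must be on the list) might look roughly like this. the function name send_permitted and its signature are hypothetical; the real check lives in AgentServer::send and reads the list from a JSON file, which this sketch leaves out.

```rust
/// Hypothetical helper mirroring the rule stated in 67e4242b9f.
/// `allowed` stands in for the list parsed from the activation-time JSON file.
fn send_permitted(allowed: &[String], to: &str) -> bool {
    // The manager is always reachable, so a misconfigured list
    // cannot strand a sub-agent without an escalation path.
    if to == "manager" {
        return true;
    }
    // An empty list is the default and means "unrestricted";
    // otherwise the recipient has to appear on the list.
    allowed.is_empty() || allowed.iter().any(|name| name == to)
}
```

per the commit message, a refused send returns a claude-readable refusal string without touching the broker; a caller would build that string when this check returns false.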