hyperhive

Author	SHA1	Message	Date
müde	3d14ddeb7d	lifecycle: bind /meta RO into manager set_nspawn_flags now adds a third manager-only bind alongside /agents (RW) and /applied (RO): --bind-ro=/var/lib/hyperhive/meta:/meta. manager can git log /meta to see every deploy across the swarm and cat /meta/flake.lock to introspect which sha each agent is currently pinned at. defensive create_dir_all on the host side so a cold start with no agents (meta repo not yet seeded) doesn't trip systemd-nspawn's missing-bind-source check before the migration plants the dir.	2026-05-16 00:24:39 +02:00
müde	92822efe16	meta: new hive-c0re module owns /var/lib/hyperhive/meta/ leaf module with no runtime callers yet (every public item is #[allow(dead_code)] until lifecycle / actions / auto_update rewire to use it). API surface: - sync_agents — idempotent: render flake.nix for the given agent set, git-init on first call, nix flake lock, commit if anything changed. - prepare_deploy / finalize_deploy / abort_deploy — two-phase for the request_apply_commit path. prepare runs nix flake lock --update-input agent-<n> without committing; finalize commits with a 'deploy <n> deployed/<id> <sha12>' message; abort git-restores the lock so a failed build leaves no orphan commit. - lock_update_hyperhive — one-shot for the auto-update path. flake.nix template defines mkAgent that pulls each agent's nixosModules.default from its input and wraps with the identity / HIVE_PORT / HIVE_LABEL / HIVE_DASHBOARD_PORT module — what setup_applied used to generate inline. nix invocations carry --extra-experimental-features as a belt in case flakes aren't enabled in nix.conf.	2026-05-16 00:22:37 +02:00
müde	5b5a93e0c6	lifecycle: module-only agent flake.nix, tracked in proposed setup_proposed now seeds both agent.nix (a regular NixOS module function) and flake.nix (boilerplate exporting nixosModules.default = import ./agent.nix) into the manager-editable proposed repo, committed together. setup_applied's hyperhive_flake + dashboard port wrapper generation is deleted entirely — the meta flake at /var/lib/hyperhive/meta/ now owns the wrapper module. setup_applied just fetches proposed's main on first spawn and tags deployed/0; subsequent rebuilds touch nothing in applied that the manager didn't author. spawn + rebuild keep their old param list with the now-unused hyperhive_flake + dashboard_port underscored — call sites get cleaned up after the meta module lands and consumes them.	2026-05-16 00:10:06 +02:00
müde	a1cfb60fd0	docs: pre-load meta-flake design scratchpad in claude.md and an in-flight callout at the top of docs/approvals.md describe the upcoming overhaul so subsequent commits can cite the design. covers: module-only agent flake shape, /var/lib/hyperhive/meta/ as a hive-c0re-owned single repo, applied remote pre-wired in proposed for manager git plumbing, /meta RO bind for the system-wide deploy log, auto-migration on hive-c0re startup with HIVE_SKIP_META_MIGRATION kill-switch.	2026-05-16 00:06:42 +02:00
müde	e26143a412	dashboard: diff against applied/proposal/<id>, prefer fetched_sha approval_diff now runs git diff refs/heads/main..refs/tags/proposal/<id> against the applied repo instead of cobbling a single-file diff from proposed. consequences: multi-file proposals show every change, manager amendments in proposed cannot lie about what'll be deployed, no-op proposals render an explicit '(proposal matches currently-deployed tree)'. displayed sha prefers fetched_sha (hive-c0re-vouched) and falls back to commit_ref only for the brief pre-fetch window. unified_diff helper + similar dep dropped — git diff is the source of truth now. dead-code allows on the lifecycle git helpers + approvals.set_fetched_sha come off since all are wired up. readme picks up the tag flow + /applied RO mount.	2026-05-15 23:18:17 +02:00
müde	fc61cb9310	fmt: clippy doc_markdown backticks	2026-05-15 23:11:10 +02:00
müde	edb0108ae7	docs+prompt: tag-driven flow + /applied RO mount manager prompt: explain that arbitrary files now travel with the proposal, document the /applied/<n>/.git RO mount and the tag scheme (git show applied/deployed/<id> etc.), call out that applied/main only advances on deployed so a failed build isn't terminal. approvals.md: drop the old per-agent applied.git phrasing in favour of the single /applied RO bind, mention both manager binds together. claude.md scratchpad flips from in-flight to just-landed.	2026-05-15 23:03:48 +02:00
müde	4a8204f035	lifecycle: bind /applied into manager read-only set_nspawn_flags now adds --bind-ro=/var/lib/hyperhive/applied:/applied for the manager container alongside the existing /agents RW mount. manager can git-fetch deployed/failed/denied tags out of /applied/<n>/.git to mirror them into its proposed clones; the read-only bind means git plumbing inside the container cannot corrupt the authoritative repos. picked up by the next rebuild of hm1nd (no spawn-time change needed since set_nspawn_flags runs on every spawn + rebuild).	2026-05-15 23:02:31 +02:00
müde	6cf66e23dc	actions: deny plants annotated denied/<id> tag apply-commit denials now leave a git object behind: tag denied/<id> annotated with the operator's note (or empty body if they didn't supply one) at proposal/<id> inside the applied repo. rejected configs become first-class git history — git show denied/<id> in the manager's applied.git mount yields the tree the operator rejected plus the reason. helper event carries the tag for parity with deployed/failed. spawn denials fall through unannotated since they have no proposal commit. deny becomes async (single git plumbing call); dashboard + admin-socket callers grow .await.	2026-05-15 23:01:22 +02:00
müde	df9da4d6e1	todo: recv default should not sleep, agent opts into wait	2026-05-15 23:00:25 +02:00
müde	315d4289c7	actions: tag-driven approve(ApplyCommit) flow run_apply_commit walks the approval through the tag state machine in applied: approved/<id> + building/<id> stamped before the build, then git read-tree --reset to proposal/<id> populates the working dir without moving HEAD. on rebuild success deployed/<id> is planted and refs/heads/main fast-forwards to the proposal. on failure failed/<id> is annotated with the build error and the working tree resets back to main so the agent stays evaluable. helper events Rebuilt + ApprovalResolved both carry the terminal tag so the manager can git-show the exact tree (and read the failure note from an annotated tag) against its read-only applied.git mount. finish_approval grows a terminal_tag param; spawn path passes None. lifecycle::apply_commit deleted.	2026-05-15 23:00:01 +02:00
müde	35b0edaf27	manager_server: fetch+tag at request_apply_commit submit submit_apply_commit (1) queues the approval row, (2) git-fetches the manager-supplied sha from proposed into applied, pins it as refs/tags/proposal/<id>, (3) persists the resolved sha on the row via approvals.set_fetched_sha. from this point on the proposal is immutable from the manager's perspective: amends or force-pushes in proposed do not change what hive-c0re will build. fetch failures mark the row failed and surface the error to the manager so a phantom pending entry can't linger.	2026-05-15 22:57:43 +02:00
müde	8cb8fcedad	lifecycle: setup_applied seeds via fetch + tags deployed/0 new shape: applied is git-init'd at first spawn, fetches proposed's initial commit into its main, tags deployed/0 there. the wrapper flake.nix is regenerated on every spawn/rebuild but no longer tracked — apply churn vanishes, manager-authored files in the proposal flow now survive untouched. setup_applied gains an Option<&Path> for proposed (None on rebuild paths that just refresh the flake). pre-overhaul applied dirs are detected via the missing deployed/0 tag and bail loudly with the destroy --purge migration hint. apply_commit is stubbed with a clear error until the tag-driven approve flow lands.	2026-05-15 22:56:58 +02:00
müde	63ef69674b	lifecycle: git helpers for tag-driven applied repo new plumbing for the upcoming flow: git_fetch_to_tag (pulls a sha from proposed into applied and pins it as a tag in one shot), git_rev_parse (normalises shas + reads back tag targets), git_tag / git_tag_annotated (lightweight vs body-carrying for failed/denied), git_read_tree_reset (replace working tree without moving HEAD — lets main stay on last known-good across an in-flight build), git_update_ref (ff main on deploy). annotated tag bodies go via stdin to avoid escape games. all dead-code-allowed; callers land in subsequent commits.	2026-05-15 22:52:23 +02:00
müde	b32c3d4f98	approvals: persist fetched_sha alongside the queue new column fetched_sha records the canonical sha hive-c0re plans to fetch from the proposed repo into applied at submit time. distinct from commit_ref (manager-supplied, may be amended out from under the queue). set_fetched_sha is unused until manager_server wires the fetch step next commit.	2026-05-15 22:49:04 +02:00
müde	871e7bf3fa	wire types: add sha + tag to Approval and HelperEvent approval grows fetched_sha (canonical hive-c0re-vouched sha, distinct from manager-supplied commit_ref). helperevent {approvalresolved,spawned,rebuilt} grow optional sha + tag so the manager can git-show the exact tree it's hearing about (against the upcoming /agents/<n>/applied.git RO mount) and know which terminal tag landed. all serde-defaulted; existing construction sites pass none until the tag-driven flow lands.	2026-05-15 22:47:39 +02:00
müde	497cd15137	docs: tag-driven config-apply plan + migration story scratchpad in claude.md marks this as in-flight; docs/approvals.md gets the new tag state machine (proposal/approved/building/deployed/failed/denied) and the manager applied.git read-only mount. todo picks up the unprivileged-containers git-identity caveat and a web ui for config repos as a downstream follow-up.	2026-05-15 22:43:47 +02:00
müde	75e7faff0c	docs: full sync ahead of compaction + config-management overhaul readme: manager mcp surface picks up update; operator-surface recap mentions /model + last-turn + model chip + the three collapsibles (inbox / journald / agent.nix). web-ui.md: details-restore-key story under shape; port-conflict banner mention on containers; agent.nix viewer alongside journald; notifications use per-event tags + console.debug log on block/show; deny endpoint takes note=<reason>; data-prompt / data-prompt-field generalisation noted. conventions.md: data-prompt and snapshot/restoreOpenDetails added to the async-forms section. persistence.md: operator_questions row picks up deadline_at (ttl) column with a migration note. todo.md: new 'Bugs' section captures the manager-question not-rendering issue with three suspect paths to chase. claude.md scratchpad rewritten as a clean handoff for the compaction + the upcoming config-git overhaul. flags the two-repo (proposed/ + applied/) split as the thing to reconsider.	2026-05-15 22:12:40 +02:00
müde	6a2ffd521b	surface agent-vs-agent port collisions (manager:8000 can't collide) manager is fixed at 8000, sub-agents are 8100-8999, so collisions are strictly between two sub-agents hashing to the same value. the colliding container's harness restart-loops on AddrInUse — which the user just hit on :8945. previously the only sign was a buried journalctl warn line. now surfaced two ways: - lifecycle::spawn / rebuild preflight: walks the live container list, computes each agent's hashed port, refuses with 'port N already taken by <other> — rename one of them' if any running sub-agent shares the new agent's port. so the operator sees an actionable error in the dashboard's transient pill / approve-result instead of waiting for the harness to die. - /api/state grows a port_conflicts: [{port, agents: [...]}] array; dashboard renders a pulsing red banner above the containers list listing each cluster. matches the questions panel pulse so it's hard to miss.	2026-05-15 22:08:19 +02:00
müde	2029840671	deny: operator can attach a reason that reaches the manager clicking DENY on the dashboard now prompts for an optional reason ('reason for denying (optional, sent to manager):'). the value rides along as a hidden 'note' form field; backend chain: POST /deny/{id} { note } → actions::deny(coord, id, Some(note)) → Approvals::mark_denied writes it to the row → HelperEvent::ApprovalResolved { ..., note: Some("...") } manager already had note: Option<String> on the event, just never populated for denials before. host admin socket (hive-c0re deny) still passes None. generalized the prompt-on-submit pattern: any form with a data-prompt attribute pops a window.prompt() before the POST and stashes the answer in a hidden input named by data-prompt-field (default 'note'). reusable for future opt-in note fields.	2026-05-15 21:58:42 +02:00
müde	91c78d626f	dashboard: per-container applied agent.nix viewer new GET /api/agent-config/{name} returns the contents of /var/lib/hyperhive/applied/<name>/agent.nix — the file the container actually builds against. validated against the live container list to avoid arbitrary filesystem reads. frontend mirrors the journald viewer: collapsed <details> on each container row, lazy-fetches on expand, refresh button re-fetches. restore-keyed (agent-config:<name>) so it survives the dashboard heartbeat refresh. read-only — mutating the applied config goes through the existing request_apply_commit + operator approval flow.	2026-05-15 21:46:25 +02:00
müde	80229c6af9	manager: needs_login / logged_in / needs_update events + update tool crash_watch grows two more state-axes alongside running/stopped: - logged-in (claude session dir populated for the agent) - up-to-date (recorded flake rev matches current) per-tick transitions emit HelperEvent::NeedsLogin / LoggedIn / NeedsUpdate. seed-on-first-tick semantics retained — nothing fires on harness boot for agents that were already in their state. only needs_update fires the 'stale appeared' direction; the resolved direction is already covered by Rebuilt. new mcp__hyperhive__update(name) on the manager surface: idempotent rebuild via auto_update::rebuild_agent. transient-aware (Rebuilding) so the dashboard shows the spinner. login intentionally has NO tool — it's interactive OAuth, only the operator can complete it. prompts + approvals doc + turn-loop doc updated. todo grows a 'show per-agent applied config in dashboard' entry (separate follow-up).	2026-05-15 21:42:13 +02:00
müde	b374f39b0d	dashboard: preserve <details open> across refresh via data-restore-key generalises the focus-preservation pattern to expanded details sections (journald viewer was collapsing on every 5s refresh; same issue for approval diff blocks). before re-render we snapshot which <details data-restore-key=...> are open; after render we re-apply. setting .open = true programmatically also fires the toggle event, so journald's lazy-fetch listener re-runs cleanly. tagged: journal:<container>, approval-diff:<id>. anything else that should survive a refresh just needs a stable data-restore-key attribute.	2026-05-15 21:37:17 +02:00
müde	fd0e493bf5	agent terminal: show full body for send tool calls send was truncating to 80 chars in the tool_use row, hiding anything past the first sentence. now renders as a collapsed <details> like Write/Edit — summary still shows the recipient + headline (so the operator can scan), expanding reveals the full body unchanged. recv side was already covered: the wake prompt shows the full incoming body, and explicit recv() tool_result rows expand to the full text via the existing collapsed-results path.	2026-05-15 21:35:48 +02:00
müde	3b532753b3	notifications: per-event tags + debug logs bug: all notifications used tag='hyperhive', so each new fire replaced the previous — operator only ever saw one at a time and might miss the fact that a second arrived. now per-event tags (hyperhive:approval:<id>, hyperhive:question:<id>, hyperhive:msg:<at>:<rand>) so distinct events stack in the OS notification center. dropped the bogus icon (was pointing at dashboard.css) — some browsers refuse to display a notification with an invalid icon. added console.debug at every block point (not supported, permission not granted, muted) and a 'shown' log on success, so the operator can see in the browser console exactly why a notification didn't fire. note for the operator: most browsers also suppress notifications while the originating tab is FOCUSED. that's a browser-level decision, not ours.	2026-05-15 21:34:21 +02:00
müde	62d1a74929	docs sync + revert auto-unfree removal revert the earlier 'operator must set allowUnfree' move: per-agent containers evaluate their own nixpkgs and the operator's host-level allowUnfree doesn't propagate in. restoring the scoped allowUnfreePredicate inside both the claude-unstable overlay and harness-base.nix; documented in README + gotchas as 'nothing to set on the operator side'. docs: - claude.md file map adds crash_watch.rs, kick_agent on coordinator, /api/model + journald viewer + bind-with-retry references. - scratchpad rewritten to reflect the recent run. - web-ui.md: notification row + browser notifications section, state row (badge + model chip + last-turn chip + cancel button), per-agent inbox, /model slash, /cancel-question + journald endpoints, focus-preservation on refresh. - turn-loop.md: --model is read from Bus::model() per turn (runtime override via /model); recv(wait_seconds) up to 180s with the rationale; ask_operator gains ttl_seconds; new TurnState section; kick_agent inbox-on-startup hint. - approvals.md: ttl/cancel resolution paths for operator questions. - persistence.md: /state/hyperhive-model file. - gotchas.md: web UI port collision policy (rename, don't probe); bind retry + SO_REUSEADDR shape; auto-unfree restored. - todo.md: cleaned up empty sections and stale entries; /model shipped, dropped from the list.	2026-05-15 21:26:13 +02:00
müde	d275b50177	dashboard: don't yank the form away while operator is typing every refreshState tick does root.innerHTML = '' across the managed sections, which destroys any focused input. detect the case before re-rendering: if document.activeElement is an INPUT / TEXTAREA / SELECT inside one of the managed sections, skip this tick and try again in 2s. eventually the operator blurs and the refresh lands. managed section ids: containers / tombstones / questions / inbox / approvals. msgflow + message-flow SSE rows don't have inputs so they're not affected.	2026-05-15 21:19:01 +02:00
müde	acaa0eb895	agent_web_port: back to pure hash, drop port-file dance operator's call: probing-forward + state-file machinery is more brittle than the bug it tried to fix. revert to the original deterministic FNV-1a hash mod 900. collisions are real but rare; operator resolves by renaming (different name → different hash) and rebuilding. no per-agent port file, no scan, no migration path, nothing to drift out of sync with the running container. existing port files on disk are silently ignored — operator rebuilds affected agents to regenerate flakes from the deterministic hash.	2026-05-15 21:17:31 +02:00
müde	c35f566d15	agent_web_port: actually resolve legacy collisions previous attempt was wrong: the legacy branch returned port_hash unconditionally, so two legacies hashing to the same port both wrote that port and the collision persisted (test still trying to bind coder's port). new rule: always probe forward from port_hash, with scan_taken_ports parameterised by include_implicit_hashes: - legacy migration (applied dir exists, no port file): pass false. scan only counts other agents' port files. first-queried legacy claims its hash; subsequent colliders see the first's port file and probe forward. we don't know which legacy originally won the bind race, so first-write-wins; the loser was already crash-looping anyway and gets a fresh port to rebuild to. - fresh spawn (no applied dir): pass true. counts port files AND implicit hashes for not-yet-migrated legacies, so a new spawn doesn't race with an unmigrated peer. migration note for affected users: agents whose port file says something other than their hashed port may have been corrupted by the previous fix. Hit ↻ R3BU1LD on the offender to regenerate the flake (uses the current port file) and the container will bind the right port on restart.	2026-05-15 21:13:17 +02:00
müde	237b215c55	dashboard: browser notifications for operator-bound events three signals fire OS notifications: - new approval lands in the queue (per id, via /api/state delta) - new ask_operator question queued (per id) - broker message sent to operator (live via SSE) first /api/state render after page load seeds the 'seen' sets without firing — only items that arrive while the page is open count. controls in a row under the banner: 🔔 enable notifications (calls requestPermission, hides on grant), 🔕 mute / 🔔 unmute toggle (localStorage-backed so operator can silence without revoking the permission), inline status text when blocked or unsupported. notification tag='hyperhive' collapses rapid bursts; onclick focuses the dashboard tab. requires secure context (HTTPS or localhost) — on other origins the API is unavailable and the controls hide themselves. todo: entry dropped.	2026-05-15 21:10:20 +02:00
müde	a67aada7c9	todo: browser notifications for approvals / questions / operator msgs pure frontend — Notification API + existing /api/state and /messages/stream signals. Caveats: secure-context requirement (HTTPS or localhost), per-browser permission grant. Includes a sketch of the implementation: request-permission button, count deltas on refreshState, SSE hook on operator-bound sends, localStorage 'muted' toggle.	2026-05-15 21:07:21 +02:00
müde	8b9f7d21b7	model persisted to /state; stop auto-allowing claude-code unfree model persistence: /model <name> now writes to /state/hyperhive-model (in-container), Bus::new reads it on init. operator override survives harness restart and container rebuild; gone on --purge like every other piece of agent state. path overridable via HYPERHIVE_MODEL_FILE for tests. failure to persist is a warn, not fatal — runtime override still applies, just won't survive a restart. unfree opt-in: drop the auto-allowUnfreePredicate from harness-base.nix and the claude-unstable overlay. operator now has to set nixpkgs.config.allowUnfree (or a predicate listing claude-code) in their own host config. silent unfree bypass was sketchy; this is honest. readme + gotchas updated to spell out the snippet. todo: drops model-persistence + container-crash + journald (all shipped); adds per-agent send allow-list (constrain who an agent can message).	2026-05-15 21:05:40 +02:00
müde	58c3cd853b	container crash watcher → HelperEvent::ContainerCrash new hive_c0re::crash_watch task polls every 10s, builds the set of currently-running containers, and on running→stopped transitions checks the transient snapshot: if no Stopping / Restarting / Destroying / Rebuilding flag is set, the container exited unexpectedly and we fire HelperEvent::ContainerCrash into the manager's inbox so it can react (typically: start it again). first poll is a seeding pass — no events on harness startup. dbus subscription would be lower-latency but polling is honest and debuggable, and a 10s delay on crash detection is fine for our scale. manager prompt + approvals doc updated to advertise the new event variant. todo drops the entry (and the journald-viewer entry that already shipped).	2026-05-15 21:02:05 +02:00
müde	6db38cf70c	model: runtime override via /model slash; fixes for port + bind - runtime model override: Bus::{model,set_model} + POST /api/model (form-encoded {model: name}). turn.rs reads bus.model() per turn so a flip lands on the next claude invocation. /api/state grows a model field; agent page shows a 'model · <name>' chip in the state row. '/model <name>' slash command POSTs to the endpoint and refreshes state. - port regression fix: agent_web_port no longer probes forward for existing agents (the previous fix shifted ports for any agent without a port file, including legacy ones whose container was already bound to the bare hashed port — dashboard rendered the new port, container was still on the old one, conn errors). new rule: port file exists → use it; absent + applied flake present → legacy, persist port_hash without probing; absent + no applied flake → fresh spawn, probe forward. - SO_REUSEADDR on both the dashboard and per-agent web UI binds via tokio::net::TcpSocket. operator hit 12 retries failing on manager :8000 — REUSEADDR handles the TIME_WAIT case cleanly without a new dep; retry still covers the genuine process-still-alive overlap. todo: drops the model-override entry (shipped); adds two new items — model persistence (optional, future), and custom per-agent MCP tools (groundwork for moving bitburner-agent into hyperhive).	2026-05-15 20:59:45 +02:00
müde	7d93dd9db4	no nap tool — recv with long wait_seconds replaces it; max raised to 180s recv-with-timeout is strictly better than a fixed sleep because it wakes instantly on incoming messages. drop the half-written nap MCP tool, raise the recv wait_seconds cap from 60s to 180s on both agent and manager sockets. prompts updated: agent.md + manager.md now spell out the pattern — when there's nothing else useful to do, call recv with wait_seconds=180 to park the turn; do NOT use Bash sleep for the same purpose. todo drops the nap entry and the napping-state-badge follow-up; both replaced by 'just use a long recv'.	2026-05-15 20:53:15 +02:00
müde	f65ee88269	recv: optional wait_seconds parameter, capped at 60s AgentRequest::Recv and ManagerRequest::Recv grow an optional wait_seconds field (default None → 30s, capped at 60s server-side). agent_server / manager_server clamp via recv_timeout(). MCP tool schemas advertise the param so claude can pick its own poll window — useful when an agent wants to throttle wakes without entering a distinct nap state. both harness loops still pass None, keeping the existing 30s default behaviour for system-level Recvs.	2026-05-15 20:49:33 +02:00
müde	637085644d	server-side TurnState in the harness, exposed via /api/state new TurnState { Idle, Thinking, Compacting } on hive_ag3nt::events::Bus with set_state + state_snapshot. the turn loops in hive-ag3nt and hive-m1nd flip Thinking before drive_turn and Idle after; the web_ui's /api/compact handler flips Compacting around compact_session. per-agent /api/state grows turn_state + turn_state_since (unix seconds). frontend prefers the server-reported state over the client-derived one — setStateAbs takes the absolute since-time so the 'last turn' chip reads the actual server-side duration instead of the client's perceived gap between SSE events. SSE turn_start / turn_end still drive state instantly between renders; /api/state re-anchors on each turn_end refresh. new compacting state gets its own purple badge with pulse animation (mirrors thinking's amber). napping will slot in the same way once the nap tool lands.	2026-05-15 20:46:38 +02:00
müde	0385d96bf3	dashboard: per-container journald viewer new GET /api/journal/{name}?unit=&lines= shells out journalctl -M <container> -b --no-pager --output=short-iso --lines=<N> (cap 5000). optional unit filter, restricted to hive-ag3nt.service / hive-m1nd.service so the shell-out can't be coerced into reading unrelated units. validates the container name against the live list before invoking journalctl. frontend renders a collapsed '↳ logs · <container>' details block on each container row. expanding triggers a lazy fetch; refresh button re-fetches; unit dropdown switches between the harness service (default) and the full machine journal. output sits in a 24em-tall monospace pre, auto-scrolled to the bottom on fresh fetch. hive-c0re's systemd unit already runs as root, so journalctl has the access it needs.	2026-05-15 20:42:56 +02:00
müde	79a46f359a	agent_web_port: collision-aware sticky allocation operator hit 'coder' and 'test' colliding on the same hashed port — fnv-1a mod 900 has ~0.1% collision probability per pair and clearly that's not enough. agent_web_port goes stateful: - per-agent port persisted to /var/lib/hyperhive/agents/<name>/port - on first call, look up the file; if absent, hash, then probe forward through the allocated range skipping any port other agents already claim, then write the chosen value back - subsequent calls return the persisted port (sticky) other agents' ports come from their port file if present, else the fallback is the hashed value — that handles existing deployments without forcing a rebuild-all just to migrate. rebuilding the colliding agent re-runs agent_web_port, sees its peer's implicit hash port as taken, picks the next free slot, persists. range exhaustion (very unlikely — 900 slots) logs a warning and returns the hash; the bind-with-retry on the harness will surface the failure honestly rather than silently looping.	2026-05-15 20:41:18 +02:00
müde	754db7830e	ask_operator: ttl_seconds auto-cancel + remaining-time chip manager can pass ttl_seconds to ask_operator. on submit, host stores deadline_at = now + ttl in operator_questions (new column, migrated via existing pragma_table_info pattern), spawns a tokio task that sleeps until the deadline then resolves the question with answer '[expired]' and fires the same OperatorAnswered helper event. already-resolved races no-op silently. dashboard renders a '⏳ MM:SS' chip on the question row when deadline_at is set. format collapses seconds → s, < 1h → m s, ≥ 1h → h m. heartbeat refresh (5s) keeps the chip current; the operator sees it tick down. manager prompt + mcp tool description updated. journald viewer per container queued in todo (separate task).	2026-05-15 20:38:02 +02:00
müde	2146e47770	web ui: retry binding on AddrInUse during restart races operator hit 'Address already in use (os error 98)' on a harness restart — the new harness raced the old socket's release. add a bind_with_retry helper that backs off (250ms doubling, capped at 2s, 12 tries ≈ 22s total) on AddrInUse before giving up. applied to both the per-agent web UI and the hive-c0re dashboard. proper fix would be SO_REUSEADDR via socket2 but retry covers the TIME_WAIT case fine and keeps the dep count down. Other bind errors still fail immediately (port permission, fd exhaustion).	2026-05-15 20:33:51 +02:00
müde	538e0446d7	agent page: inbox view of last 30 messages addressed to this agent new wire request AgentRequest::Recent { limit } / ManagerRequest::Recent (plus matching responses with Vec<InboxRow>). InboxRow moved to hive-sh4re so it lives on both surfaces without an internal-to-wire conversion. host-side dispatch in agent_server / manager_server calls broker.recent_for(name, limit). per-agent web_ui /api/state grew an inbox: Vec<InboxRow> populated via the same per-agent socket (best-effort; transport failure returns empty). frontend renders as a collapsible <details> section between the state row and the terminal — fmt timestamp / from / body in a tight grid, capped at 16em scrollable. only visible when there are rows.	2026-05-15 20:32:19 +02:00
müde	bd7d2d4860	agent page: dashboard back-link + last-turn timing chip title bar grows a '↑ DASHB04RD' link next to the rebuild button — opens the host dashboard in a new tab so the operator can pivot between agents without losing the live tail. uses the dashboardPort already plumbed via /api/state. state row picks up a 'last turn 12.3s' chip that fills in when state transitions away from thinking. format: ms / s.s / m s. hidden until the first turn completes.	2026-05-15 20:27:09 +02:00
müde	ee5b85716d	ask_operator: operator-side ✗ CANC3L on pending questions new POST /cancel-question/{id} resolves a pending operator question with the sentinel answer '[cancelled]' and fires the usual HelperEvent::OperatorAnswered so the manager sees a terminal state and can fall back. uses the same OperatorQuestions::answer path — no special handling, the manager already has to deal with arbitrary answer strings. dashboard renders the cancel as a separate <form> below the main qform so the answer-merge submit handler on the main form doesn't inadvertently fire when the operator clicks cancel. confirm dialog spells out what the manager will see. ttl-based auto-cancel is still on the todo (would spawn a tokio task per submitted question).	2026-05-15 20:25:11 +02:00
müde	bc87ff80d2	agent terminal: inline +/- diffs on Write and Edit tool calls. Write and Edit tool_use rows used to render as the bare file path. now they're collapsed <details> blocks with the actual change inside: Write shows every content line prefixed '+', Edit shows old_string as '-' lines then new_string as '+' lines. summary carries the file path + counts ('→ Edit /foo · -3 +5'). lines colored via diff-add / diff-del / diff-ctx; click to expand the full body. renderFileWriteEdit returns null for any other tool so the existing flat-row path (fmtToolUse) is untouched.	2026-05-15 20:23:22 +02:00
müde	2413d664a1	agents get a kickoff inbox message on start/restart/rebuild. new Coordinator::kick_agent(name, reason) drops a system message into the agent's inbox so the next turn picks it up with a 'you were just (re)started, check /state/ for notes, --continue session is intact' hint. wakes the turn loop without any harness-side handling needed; it's just another inbox message with sender = 'system'. wired from: (1) dashboard /start /restart /rebuild handlers (via lifecycle_action's on-success tail); (2) manager mcp_hyperhive_start / restart. dashboard: pending approvals + tombstones + questions now refresh on a 5s heartbeat when nothing else is happening. previously refresh only fired on async-form submit or on broker traffic addressed to operator; manager-queued approvals went through neither, so the operator had to reload to see them. 5s is the slow-path; 2s remains for in-flight transients.	2026-05-15 20:19:36 +02:00
müde	8b10731aa4	split claude.md into docs/: per-topic, human-readable. claude.md was eating 400 lines of subsystem detail that's useful when you're working on that subsystem and noise the rest of the time. split into: docs/conventions.md (naming, identity, async forms, commit style); docs/gotchas.md (nspawn / nixos-container quirks); docs/web-ui.md (dashboard + per-agent layouts and endpoints); docs/turn-loop.md (claude invocation, wake prompt, mcp surface); docs/approvals.md (approval flow, manager policy, helper events); docs/persistence.md (sqlite dbs, retention, state dir layout). claude.md is now the entry point: file map, reading paths ("pick the doc that matches your task"), quick reminders that fit on one screen, and a small scratchpad section for in-flight context. it references the docs; the docs don't reference claude.md. no content was lost; the docs/ files cover everything the old claude.md did, plus things i wrote up better while extracting.	2026-05-15 20:17:11 +02:00
müde	c27111ac32	dashboard: split api_state into per-section builders. drops the #[allow(clippy::too_many_lines)] on api_state by extracting four pure helpers: build_container_views (live containers + any_stale flag); build_transient_views (agents in pre-creation Spawning state only); build_approval_views (pending approvals with diff html); build_tombstone_views (destroyed-but-kept state dirs). api_state itself is now ~30 lines of orchestration. zero behavior change. each helper is independently readable + testable.	2026-05-15 20:13:08 +02:00
müde	7b4adea325	dashboard: lifecycle_action helper collapses start/stop/restart/rebuild. five POST handlers (post_kill / post_restart / post_start / post_rebuild) were all repeating the same boilerplate: strip prefix, set_transient, call lifecycle::X, clear_transient, match the result. extract one helper that takes the transient kind, error-message verb, the work body, and an optional 'on success' tail (used by kill to also unregister + emit HelperEvent::Killed). each handler shrinks to a single lifecycle_action(..) call. zero behavior change.	2026-05-15 20:12:03 +02:00
müde	89ccc5e6c5	events.sqlite vacuum moves host-side. retention is a host concern; agents have no business doing their own cleanup, and a misbehaving harness could skip it. drop spawn_events_vacuum from both hive-ag3nt and hive-m1nd, drop the matching Bus::vacuum + EventStore::vacuum methods. new hive_c0re::events_vacuum module sweeps every existing agents/<name>/state/hyperhive-events.sqlite on the same hourly cadence as the broker vacuum. same two-stage delete (older than 7 days, trim to 2000 newest). called from main alongside broker vacuum. also: server-side state badge entered into todo.md (today's badge is derived client-side from sse, fine for idle/thinking but a state machine that grows compacting/napping wants authoritative status from the harness).	2026-05-15 20:10:34 +02:00
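
The two-stage delete described in 89ccc5e6c5 (older than 7 days, then trim to the 2000 newest) can be sketched roughly as follows. This is a minimal in-memory illustration, not the actual hive_c0re::events_vacuum code: `EventRow`, `sweep`, `MAX_AGE` and `MAX_ROWS` are hypothetical names, and a plain `Vec` stands in for the sqlite table.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical stand-in for one row of an events table.
#[derive(Debug, Clone, PartialEq)]
pub struct EventRow {
    pub id: u64,
    pub recorded_at: SystemTime,
}

/// Retention limits taken from the commit message.
pub const MAX_AGE: Duration = Duration::from_secs(7 * 24 * 60 * 60);
pub const MAX_ROWS: usize = 2000;

/// Two-stage sweep: drop rows older than `max_age`, then keep only the
/// `max_rows` newest of what remains. Returns how many rows were removed.
pub fn sweep(
    rows: &mut Vec<EventRow>,
    now: SystemTime,
    max_age: Duration,
    max_rows: usize,
) -> usize {
    let before = rows.len();

    // Stage 1: age cutoff. Rows stamped in the future are kept
    // (an assumption made for this sketch).
    rows.retain(|r| match now.duration_since(r.recorded_at) {
        Ok(age) => age <= max_age,
        Err(_) => true,
    });

    // Stage 2: order newest first, then cap the row count.
    rows.sort_by(|a, b| b.recorded_at.cmp(&a.recorded_at));
    rows.truncate(max_rows);

    before - rows.len()
}
```

Running the age cutoff first means the count cap only ever trims rows that are still inside the retention window, so a quiet agent keeps up to a week of history while a chatty one stays bounded at `max_rows`.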

213 commits