hyperhive

Author	SHA1	Message	Date
müde	3db33b0fe5	agent flake.nix: forward inputs as flakeInputs module arg new boilerplate wraps agent.nix as a sub-module + passes every flake input (minus self) through to it via _module.args.flakeInputs. manager edits the inputs block of flake.nix to pull in out-of-tree flakes (MCP servers etc.) and references them in agent.nix as flakeInputs.<name>.packages.${pkgs.system}.default; the new input's pinned sha lands in the agent's own flake.lock (already tracked + part of the proposal flow), and transitively rolls up into meta's lock. migrate's MODULE_FLAKE_MARKER swaps to _module.args.flakeInputs so existing agents on the old 'nixosModules.default = import ./agent.nix' template get re-rendered onto the new shape on next hive-c0re start. manager_server's flake.nix tamper-check goes away: the build path's failed/<id> annotated tag already provides the safety net when a manager edit breaks the flake; enforcing 'no flake.nix edits at all' was overly strict (it blocks the inputs-addition pattern that's the whole point of this change). manager prompt updated with a worked example for adding an MCP-server flake input + wiring it through agent.nix.	2026-05-16 02:23:43 +02:00
müde	50ef806266	operator pronouns: configurable free-text, threaded into prompts new NixOS module option services.hive-c0re.operatorPronouns (free text, default 'she/her', example 'they/them'). hive-c0re takes it as a CLI flag (--operator-pronouns, lib.escapeShellArg'd in the systemd unit), stores it on Coordinator, threads it into the meta flake's mkAgent so each agent's systemd service gets HIVE_OPERATOR_PRONOUNS set. the harness reads the env at boot and substitutes {operator_pronouns} into the agent / manager system prompt alongside {label}. nix string is escaped against backslash + double-quote so non-ascii / quoted values round-trip safely. prompt addendum: both agent.md and manager.md mention the operator's pronouns up front so claude uses them naturally in third-person reference. propagates on next ↻ R3BU1LD (meta lock bump, no per-agent approval).	2026-05-16 02:05:22 +02:00
müde	5208b0112a	dashboard: terminal compose with @-mention sticky recipient new section under MESS4GE FL0W. msgflow already tails only broker traffic (sent + delivered), which is exactly the 'messages through core' view the operator wants; no per-agent thinking leaks through. compose box below: - a prompt span shows the sticky recipient ('@coder>') outside the textarea so it can't be edited inadvertently. on submit the recipient is persisted to localStorage so it survives reload. - start the input with '@name body' to redirect: the parser splits at the first whitespace and the new recipient becomes sticky. - typing '@' at the start opens a completion dropdown over the textarea, populated from window.__hyperhive_state.containers; arrow keys cycle, tab/enter selects, escape closes. clicking works too. - manager swap: agents flagged is_manager are surfaced as '@manager' (the broker's recipient string) instead of '@hm1nd' (the container name), so the message actually routes to the manager's inbox. backend: new POST /op-send accepts {to, body} and drops a broker.send({from:'operator', to, body}), the same shape as the per-agent web UI's OperatorMsg, but lets the operator choose the recipient explicitly from the main dashboard.	2026-05-16 01:55:00 +02:00
müde	2a6d084718	ask_operator: any agent can call it, answer routes by asker new AgentRequest::AskOperator + AgentResponse::QuestionQueued on the per-agent socket, same shape as the manager flavor: agents get the same wire surface (still backed by the same operator_questions table). agent_server::dispatch wires AskOperator through coord.questions.submit(agent, ...) so the row's asker is the sub-agent name; the ttl watchdog already in manager_server is now shared and spawn_question_watchdog goes pub. answer routing: operator_questions::answer now returns (question, asker). post_answer_question + post_cancel_question + the watchdog fire OperatorAnswered through new coord.notify_agent(asker, event) instead of always notify_manager, so the event lands in whichever agent originally asked. notify_manager is now a thin wrapper. agent socket plumbing: agent_server::start takes Arc<Coordinator> instead of Arc<Broker> so dispatch has access to questions + notify path; coordinator::{register_agent,ensure_runtime} take self: &Arc<Self>. mcp::AgentServer grows the ask_operator tool; allowed_mcp_tools(Agent) adds it; prompts/agent.md replaces the 'message the manager to ask the operator' guidance with the direct tool description.	2026-05-16 01:48:10 +02:00
müde	6b3ef4549c	manager_server: reject proposals that modify flake.nix submit_apply_commit now diffs the freshly-tagged proposal/<id> against applied/main and refuses if flake.nix is in the changeset. flake.nix is fixed boilerplate the meta flake depends on (it exports nixosModules.default = import ./agent.nix); silent edits there would break the nixosConfiguration in subtle ways. the manager prompt already says don't touch it; this is the host-side belt: clear error to the manager on submit, row marked failed in sqlite, no orphan pending approval to chase. diff failure is logged + ignored: the build path surfaces concrete errors if flake.nix is actually broken.	2026-05-16 01:42:11 +02:00
müde	d202f3785c	suppress crash_watch during background rebuilds + meta repoint crash_watch fires ContainerCrash whenever it sees a previously-running container in a non-running state without a transient flag set. dashboard rebuilds already set Rebuilding via lifecycle_action; the two other rebuild paths didn't: - migrate::repoint_container: phase 4 walks every container, and each nixos-container update activation briefly takes the systemd unit down. previously this fired ContainerCrash for every agent during the migration; the manager would then spuriously call start() on agents that were already coming back up. - auto_update::rebuild_agent: startup scan + admin-socket caller bypass lifecycle_action. both paths now set the Rebuilding transient around the rebuild + clear it after, matching what the dashboard does.	2026-05-16 01:12:48 +02:00
müde	63e8a98df2	meta: stage before lock, single commit per change git+file://'s dirty-tree fetcher reads tracked + staged content from the index (not the working tree, not untracked files). so staging is enough to make a new flake.nix or flake.lock visible to nix without committing first. sync_agents now stages flake.nix, runs lock, stages the resulting flake.lock, then commits both together in a single 'regenerate meta flake' (or 'seed meta from N agents') commit; no more two-commit churn. prepare_deploy applies the same trick to the two-phase deploy: runs nix flake update, stages flake.lock so nixos-container update sees it, doesn't commit yet. finalize_deploy commits with the deployed/<id> message on build success; abort_deploy git-restores the staged lock back to HEAD on failure. meta history continues to record only successful deploys (and now one commit per success instead of one + amend).	2026-05-16 01:02:47 +02:00
müde	220e9b4af6	meta: commit before lock; git+file:// only sees tracked files runtime error on first deploy attempt: 'source tree referenced by git+file:///var/lib/hyperhive/meta does not contain /flake.nix'. cause: sync_agents wrote flake.nix then ran 'nix flake lock' against a directory nix had just discovered as a git repo (auto-upgraded to git+file://), which only sees TRACKED content. fresh flake.nix was untracked, so nix saw an empty source tree. fix: commit flake.nix before locking. sync_agents now does write → init (if first) → git add + commit → nix flake lock → commit lock if changed. two commits per change ('regenerate meta flake' and 'lock update') instead of one combined; cleaner history. same git+file:// gotcha bit the two-phase deploy: prepare_deploy used to write the lock without committing, expecting nixos-container update to read the working tree. it doesn't; it reads the tracked commit. prepare_deploy now commits with a placeholder 'deploy <n> (building)' message; finalize_deploy amends to 'deploy <n> deployed/<id> <sha12>' on success; abort_deploy drops it with git reset --hard HEAD~1 on failure. meta history still records only successful deploys.	2026-05-16 00:59:35 +02:00
müde	87c7b05b05	meta: use 'nix flake update <input>' instead of removed --update-input current nix CLI removed 'nix flake lock --update-input X' in favour of 'nix flake update X'. switch all three call sites (prepare_deploy, lock_update_for_rebuild, lock_update_hyperhive). 'nix flake lock' with no args still works for the seed path in sync_agents — it resolves missing inputs without bumping existing ones.	2026-05-16 00:49:22 +02:00
müde	14aa7c7acc	final docs + cleanup sync for meta-flake era claude.md flips 'in flight' → 'just landed' for the meta overhaul + extends the file map with meta.rs and migrate.rs. docs/approvals.md replaces the in-flight callout with a proper 'Meta flake' section (two-phase deploy walkthrough, sync_agents semantics, single-phase variants), updates the two-repo box diagram to include the /var/lib/hyperhive/meta/ tree and tracks flake.nix in applied, rewrites the container --flake reference to meta#<name>, replaces the 'Manager view of applied' section with a unified '/agents + /applied + /meta' inventory listing every useful git incantation, and explains the in-place no-state-loss migration that now runs on hive-c0re startup. docs/persistence.md grows entries for the meta repo + the .meta-migration-done marker. readme box diagram picks up the /meta RO bind; approval-flow paragraph rewritten end to end to describe the meta lock dance. lifecycle::flake_base deleted — the meta render hardcodes the manager vs agent-base choice as nix expression.	2026-05-16 00:40:06 +02:00
müde	2f6ecc4dc0	dashboard: deployed sha chip per container ContainerView grows deployed_sha (first 12 chars of the rev that /var/lib/hyperhive/meta/flake.lock currently has locked for agent-<name>). renderContainers appends a 'deployed:<sha12>' chip next to the container name + port — title attribute explains it's the meta-lock sha. degrades gracefully when the meta repo isn't seeded yet (missing / unparsable lock = empty map = no chip). new read_meta_locked_revs helper does the JSON parsing without unwraps.	2026-05-16 00:36:52 +02:00
müde	59a89314f0	startup auto-migration from pre-meta layout new migrate module runs before auto_update on hive-c0re boot. four idempotent phases: 1. for every applied/<n>/ whose flake.nix isn't already the module-only boilerplate, rewrite + commit + relocate deployed/0 to HEAD so setup_applied's existence check passes 2. for every proposed/<n>/config without an 'applied' remote, wire it (delegates to setup_proposed which is now idempotent and adds the remote itself) 3. meta::sync_agents over the current container list — inits the meta repo on first call, rerender + relock if drifted 4. nixos-container update <c> --flake meta#<name> for every container, guarded by /var/lib/hyperhive/.meta-migration-done so phase 4's expensive eval only runs once across restarts env kill-switch HIVE_SKIP_META_MIGRATION=1 defers the whole thing. each agent's failure is logged + skipped so one broken agent doesn't block the rest. runs ahead of ensure_manager so the manager auto-spawn comes up against meta from the first attempt.	2026-05-16 00:34:58 +02:00
müde	87016cd567	auto_update: bump meta hyperhive input before per-agent rebuilds auto_update::run now calls meta::lock_update_hyperhive once up-front so the per-agent rebuilds it kicks off rebuild against the new base. lifecycle::rebuild already drives sync_agents + lock_update_for_rebuild per agent, so the rev-marker shortcut keeps its meaning ('we've ack'd this rev for this agent') without further plumbing. failures of the hyperhive lock bump log + continue — individual rebuilds will surface concrete errors if anything's really wrong.	2026-05-16 00:32:55 +02:00
müde	06fdbac1ac	actions::run_apply_commit through meta two-phase approval-driven deploys now walk the meta flake via prepare_deploy / finalize_deploy / abort_deploy so a failed build leaves no commit in meta's deploy log: 1. capture applied/main sha for rollback; 2. tag approved/<id> + building/<id>; 3. ff applied/main to proposal/<id>, read-tree sync working tree; 4. meta::prepare_deploy(name): nix flake lock --update-input agent-<n> without committing; 5. lifecycle::rebuild_no_meta: container-level only (new extracted helper; public lifecycle::rebuild still wraps it with single-phase meta sync + commit for dashboard / auto_update callers that don't care about rollback); 6a. on success: tag deployed/<id>, meta::finalize_deploy commits the staged lock with 'deploy <n> deployed/<id> <sha12>'; 6b. on failure: tag failed/<id> annotated with the build error, git_update_ref applied/main back to prev sha, read-tree to main, meta::abort_deploy git-restores flake.lock. meta's git log now records only successful deploys; failures + denials still live in applied as annotated tags.	2026-05-16 00:32:16 +02:00
müde	22f35def8f	actions::destroy syncs meta after lifecycle once nixos-container destroy lands + per-agent state cleanup is done, rerender the meta flake from the remaining containers so the destroyed agent's input + nixosConfiguration drop off and its flake.lock entry vanishes. log + keep going on meta-sync failure — the destroy already succeeded at the lifecycle level, so meta drift here is just bookkeeping. new public lifecycle::agents_for_meta_listing exposes the agent enumeration for callers outside the module.	2026-05-16 00:29:26 +02:00
müde	4cb529351e	lifecycle::rebuild through meta rebuild now does sync_agents (idempotent — no-op when the rendered flake matches disk; recovers from a divergent meta repo on the side) followed by lock_update_for_rebuild which relocks just this agent's input and commits the lock change if any. flake ref for nixos-container update flips from applied/<n>#default to meta#<name>. new helper meta::lock_update_for_rebuild is single-phase (no separate finalize): rebuild has no failure-revert semantics — it always wants the latest applied/<n>/main. spawn already syncs meta before container create; rebuild now picks up the meta side on every manual ↻ R3BU1LD.	2026-05-16 00:28:26 +02:00
müde	8f94e4379a	lifecycle::spawn through meta after setup_proposed + setup_applied, spawn now syncs the meta flake (one input + one nixosConfiguration per agent) so `--flake /var/lib/hyperhive/meta#<name>` resolves before nixos-container create runs. flake ref switches from applied/<n>#default to meta#<name>; the wrapper modules (identity, HIVE_PORT, HIVE_LABEL, HIVE_DASHBOARD_PORT) now live in the meta flake's mkAgent. new helper agents_for_meta builds the AgentSpec list by enumerating containers + optionally appending a not-yet-present name for the spawn case. spawn keeps its caller signature; rebuild + auto_update get wired up in follow-up commits.	2026-05-16 00:27:12 +02:00
müde	c42ad1330c	lifecycle: pre-wire applied remote in proposed setup_proposed now lands a git remote named 'applied' on every proposed/<n>/config pointing at /applied/<n>/.git, the path as seen from inside the manager container, where the RO bind in set_nspawn_flags makes the URL resolve. From the manager, `git fetch applied`, `git log applied/main`, `git show applied/refs/tags/deployed/<id>`, `git diff applied/main HEAD` and `git rebase applied/main` all work without manually constructing the path each time. The RO bind blocks push at the kernel level so the remote can only fetch. Idempotent: also applied to pre-existing proposed repos (no-op if the remote is already correct, set-url if drifted) so the startup migration picks up the wiring on existing agents.	2026-05-16 00:25:43 +02:00
müde	3d14ddeb7d	lifecycle: bind /meta RO into manager set_nspawn_flags now adds a third manager-only bind alongside /agents (RW) and /applied (RO): --bind-ro=/var/lib/hyperhive/meta:/meta. manager can git log /meta to see every deploy across the swarm and cat /meta/flake.lock to introspect which sha each agent is currently pinned at. defensive create_dir_all on the host side so a cold start with no agents (meta repo not yet seeded) doesn't trip systemd-nspawn's missing-bind-source check before the migration plants the dir.	2026-05-16 00:24:39 +02:00
müde	92822efe16	meta: new hive-c0re module owns /var/lib/hyperhive/meta/ leaf module with no runtime callers yet (every public item is #[allow(dead_code)] until lifecycle / actions / auto_update rewire to use it). API surface: - sync_agents — idempotent: render flake.nix for the given agent set, git-init on first call, nix flake lock, commit if anything changed. - prepare_deploy / finalize_deploy / abort_deploy — two-phase for the request_apply_commit path. prepare runs nix flake lock --update-input agent-<n> without committing; finalize commits with a 'deploy <n> deployed/<id> <sha12>' message; abort git-restores the lock so a failed build leaves no orphan commit. - lock_update_hyperhive — one-shot for the auto-update path. flake.nix template defines mkAgent that pulls each agent's nixosModules.default from its input and wraps with the identity / HIVE_PORT / HIVE_LABEL / HIVE_DASHBOARD_PORT module — what setup_applied used to generate inline. nix invocations carry --extra-experimental-features as a belt in case flakes aren't enabled in nix.conf.	2026-05-16 00:22:37 +02:00
müde	5b5a93e0c6	lifecycle: module-only agent flake.nix, tracked in proposed setup_proposed now seeds both agent.nix (a regular NixOS module function) and flake.nix (boilerplate exporting nixosModules.default = import ./agent.nix) into the manager-editable proposed repo, committed together. setup_applied's hyperhive_flake + dashboard port wrapper generation is deleted entirely; the meta flake at /var/lib/hyperhive/meta/ now owns the wrapper module. setup_applied just fetches proposed's main on first spawn and tags deployed/0; subsequent rebuilds touch nothing in applied that the manager didn't author. spawn + rebuild keep their old param list with the now-unused hyperhive_flake + dashboard_port underscored; call sites get cleaned up after the meta module lands and consumes them.	2026-05-16 00:10:06 +02:00
müde	e26143a412	dashboard: diff against applied/proposal/<id>, prefer fetched_sha approval_diff now runs git diff refs/heads/main..refs/tags/proposal/<id> against the applied repo instead of cobbling a single-file diff from proposed. consequences: multi-file proposals show every change, manager amendments in proposed cannot lie about what'll be deployed, no-op proposals render an explicit '(proposal matches currently-deployed tree)'. displayed sha prefers fetched_sha (hive-c0re-vouched) and falls back to commit_ref only for the brief pre-fetch window. unified_diff helper + `similar` dep dropped; git diff is the source of truth now. dead-code allows on the lifecycle git helpers + approvals.set_fetched_sha come off since all are wired up. readme picks up the tag flow + /applied RO mount.	2026-05-15 23:18:17 +02:00
müde	fc61cb9310	fmt: clippy doc_markdown backticks	2026-05-15 23:11:10 +02:00
müde	4a8204f035	lifecycle: bind /applied into manager read-only set_nspawn_flags now adds --bind-ro=/var/lib/hyperhive/applied:/applied for the manager container alongside the existing /agents RW mount. manager can git-fetch deployed/failed/denied tags out of /applied/<n>/.git to mirror them into its proposed clones; the read-only bind means git plumbing inside the container cannot corrupt the authoritative repos. picked up by the next rebuild of hm1nd (no spawn-time change needed since set_nspawn_flags runs on every spawn + rebuild).	2026-05-15 23:02:31 +02:00
müde	6cf66e23dc	actions: deny plants annotated denied/<id> tag apply-commit denials now leave a git object behind: tag denied/<id> annotated with the operator's note (or empty body if they didn't supply one) at proposal/<id> inside the applied repo. rejected configs become first-class git history — git show denied/<id> in the manager's applied.git mount yields the tree the operator rejected plus the reason. helper event carries the tag for parity with deployed/failed. spawn denials fall through unannotated since they have no proposal commit. deny becomes async (single git plumbing call); dashboard + admin-socket callers grow .await.	2026-05-15 23:01:22 +02:00
müde	315d4289c7	actions: tag-driven approve(ApplyCommit) flow run_apply_commit walks the approval through the tag state machine in applied: approved/<id> + building/<id> stamped before the build, then git read-tree --reset to proposal/<id> populates the working dir without moving HEAD. on rebuild success deployed/<id> is planted and refs/heads/main fast-forwards to the proposal. on failure failed/<id> is annotated with the build error and the working tree resets back to main so the agent stays evaluable. helper events Rebuilt + ApprovalResolved both carry the terminal tag so the manager can git-show the exact tree (and read the failure note from an annotated tag) against its read-only applied.git mount. finish_approval grows a terminal_tag param; spawn path passes None. lifecycle::apply_commit deleted.	2026-05-15 23:00:01 +02:00
müde	35b0edaf27	manager_server: fetch+tag at request_apply_commit submit submit_apply_commit (1) queues the approval row, (2) git-fetches the manager-supplied sha from proposed into applied, pins it as refs/tags/proposal/<id>, (3) persists the resolved sha on the row via approvals.set_fetched_sha. from this point on the proposal is immutable from the manager's perspective: amends or force-pushes in proposed do not change what hive-c0re will build. fetch failures mark the row failed and surface the error to the manager so a phantom pending entry can't linger.	2026-05-15 22:57:43 +02:00
müde	8cb8fcedad	lifecycle: setup_applied seeds via fetch + tags deployed/0 new shape: applied is git-init'd at first spawn, fetches proposed's initial commit into its main, tags deployed/0 there. the wrapper flake.nix is regenerated on every spawn/rebuild but no longer tracked — apply churn vanishes, manager-authored files in the proposal flow now survive untouched. setup_applied gains an Option<&Path> for proposed (None on rebuild paths that just refresh the flake). pre-overhaul applied dirs are detected via the missing deployed/0 tag and bail loudly with the destroy --purge migration hint. apply_commit is stubbed with a clear error until the tag-driven approve flow lands.	2026-05-15 22:56:58 +02:00
müde	63ef69674b	lifecycle: git helpers for tag-driven applied repo new plumbing for the upcoming flow: git_fetch_to_tag (pulls a sha from proposed into applied and pins it as a tag in one shot), git_rev_parse (normalises shas + reads back tag targets), git_tag / git_tag_annotated (lightweight vs body-carrying for failed/denied), git_read_tree_reset (replace working tree without moving HEAD, so main stays on the last known-good commit across an in-flight build), git_update_ref (ff main on deploy). annotated tag bodies go via stdin to avoid escape games. all dead-code-allowed; callers land in subsequent commits.	2026-05-15 22:52:23 +02:00
müde	b32c3d4f98	approvals: persist fetched_sha alongside the queue new column fetched_sha records the canonical sha hive-c0re plans to fetch from the proposed repo into applied at submit time. distinct from commit_ref (manager-supplied, may be amended out from under the queue). set_fetched_sha is unused until manager_server wires the fetch step next commit.	2026-05-15 22:49:04 +02:00
müde	871e7bf3fa	wire types: add sha + tag to Approval and HelperEvent Approval grows fetched_sha (canonical hive-c0re-vouched sha, distinct from manager-supplied commit_ref). HelperEvent::{ApprovalResolved, Spawned, Rebuilt} grow optional sha + tag so the manager can git-show the exact tree it's hearing about (against the upcoming /agents/<n>/applied.git RO mount) and know which terminal tag landed. all serde-defaulted; existing construction sites pass None until the tag-driven flow lands.	2026-05-15 22:47:39 +02:00
müde	6a2ffd521b	surface agent-vs-agent port collisions (manager:8000 can't collide) manager is fixed at 8000, sub-agents are 8100-8999, so collisions are strictly between two sub-agents hashing to the same value. the colliding container's harness restart-loops on AddrInUse — which the user just hit on :8945. previously the only sign was a buried journalctl warn line. now surfaced two ways: - lifecycle::spawn / rebuild preflight: walks the live container list, computes each agent's hashed port, refuses with 'port N already taken by <other> — rename one of them' if any running sub-agent shares the new agent's port. so the operator sees an actionable error in the dashboard's transient pill / approve-result instead of waiting for the harness to die. - /api/state grows a port_conflicts: [{port, agents: [...]}] array; dashboard renders a pulsing red banner above the containers list listing each cluster. matches the questions panel pulse so it's hard to miss.	2026-05-15 22:08:19 +02:00
müde	2029840671	deny: operator can attach a reason that reaches the manager clicking DENY on the dashboard now prompts for an optional reason ('reason for denying (optional, sent to manager):'). the value rides along as a hidden 'note' form field; backend chain: POST /deny/{id} { note } → actions::deny(coord, id, Some(note)) → Approvals::mark_denied writes it to the row → HelperEvent::ApprovalResolved { ..., note: Some("...") } manager already had note: Option<String> on the event, just never populated for denials before. host admin socket (hive-c0re deny) still passes None. generalized the prompt-on-submit pattern: any form with a data-prompt attribute pops a window.prompt() before the POST and stashes the answer in a hidden input named by data-prompt-field (default 'note'). reusable for future opt-in note fields.	2026-05-15 21:58:42 +02:00
müde	91c78d626f	dashboard: per-container applied agent.nix viewer new GET /api/agent-config/{name} returns the contents of /var/lib/hyperhive/applied/<name>/agent.nix — the file the container actually builds against. validated against the live container list to avoid arbitrary filesystem reads. frontend mirrors the journald viewer: collapsed <details> on each container row, lazy-fetches on expand, refresh button re-fetches. restore-keyed (agent-config:<name>) so it survives the dashboard heartbeat refresh. read-only — mutating the applied config goes through the existing request_apply_commit + operator approval flow.	2026-05-15 21:46:25 +02:00
müde	80229c6af9	manager: needs_login / logged_in / needs_update events + update tool crash_watch grows two more state-axes alongside running/stopped: - logged-in (claude session dir populated for the agent) - up-to-date (recorded flake rev matches current) per-tick transitions emit HelperEvent::NeedsLogin / LoggedIn / NeedsUpdate. seed-on-first-tick semantics retained — nothing fires on harness boot for agents that were already in their state. only needs_update fires the 'stale appeared' direction; the resolved direction is already covered by Rebuilt. new mcp__hyperhive__update(name) on the manager surface: idempotent rebuild via auto_update::rebuild_agent. transient-aware (Rebuilding) so the dashboard shows the spinner. login intentionally has NO tool — it's interactive OAuth, only the operator can complete it. prompts + approvals doc + turn-loop doc updated. todo grows a 'show per-agent applied config in dashboard' entry (separate follow-up).	2026-05-15 21:42:13 +02:00
müde	b374f39b0d	dashboard: preserve <details open> across refresh via data-restore-key generalises the focus-preservation pattern to expanded details sections (journald viewer was collapsing on every 5s refresh; same issue for approval diff blocks). before re-render we snapshot which <details data-restore-key=...> are open; after render we re-apply. setting .open = true programmatically also fires the toggle event, so journald's lazy-fetch listener re-runs cleanly. tagged: journal:<container>, approval-diff:<id>. anything else that should survive a refresh just needs a stable data-restore-key attribute.	2026-05-15 21:37:17 +02:00
müde	3b532753b3	notifications: per-event tags + debug logs bug: all notifications used tag='hyperhive', so each new fire replaced the previous one; the operator only ever saw one at a time and might miss the fact that a second arrived. now per-event tags (hyperhive:approval:<id>, hyperhive:question:<id>, hyperhive:msg:<at>:<rand>) so distinct events stack in the OS notification center. dropped the bogus icon (was pointing at dashboard.css), since some browsers refuse to display a notification with an invalid icon. added console.debug at every block point (not supported, permission not granted, muted) and a 'shown' log on success, so the operator can see in the browser console exactly why a notification didn't fire. note for the operator: most browsers also suppress notifications while the originating tab is FOCUSED. that's a browser-level decision, not ours.	2026-05-15 21:34:21 +02:00
müde	d275b50177	dashboard: don't yank the form away while operator is typing every refreshState tick does root.innerHTML = '' across the managed sections, which destroys any focused input. detect the case before re-rendering: if document.activeElement is an INPUT / TEXTAREA / SELECT inside one of the managed sections, skip this tick and try again in 2s. eventually the operator blurs and the refresh lands. managed section ids: containers / tombstones / questions / inbox / approvals. msgflow + message-flow SSE rows don't have inputs so they're not affected.	2026-05-15 21:19:01 +02:00
müde	acaa0eb895	agent_web_port: back to pure hash, drop port-file dance operator's call: probing-forward + state-file machinery is more brittle than the bug it tried to fix. revert to the original deterministic FNV-1a hash mod 900. collisions are real but rare; operator resolves by renaming (different name → different hash) and rebuilding. no per-agent port file, no scan, no migration path, nothing to drift out of sync with the running container. existing port files on disk are silently ignored — operator rebuilds affected agents to regenerate flakes from the deterministic hash.	2026-05-15 21:17:31 +02:00
müde	c35f566d15	agent_web_port: actually resolve legacy collisions previous attempt was wrong: the legacy branch returned port_hash unconditionally, so two legacies hashing to the same port both wrote that port and the collision persisted (test still trying to bind coder's port). new rule: always probe forward from port_hash, with scan_taken_ports parameterised by include_implicit_hashes: - legacy migration (applied dir exists, no port file): pass false. scan only counts other agents' port files. first-queried legacy claims its hash; subsequent colliders see the first's port file and probe forward. we don't know which legacy originally won the bind race, so first-write-wins; the loser was already crash-looping anyway and gets a fresh port to rebuild to. - fresh spawn (no applied dir): pass true. counts port files AND implicit hashes for not-yet-migrated legacies, so a new spawn doesn't race with an unmigrated peer. migration note for affected users: agents whose port file says something other than their hashed port may have been corrupted by the previous fix. Hit ↻ R3BU1LD on the offender to regenerate the flake (uses the current port file) and the container will bind the right port on restart.	2026-05-15 21:13:17 +02:00
müde	237b215c55	dashboard: browser notifications for operator-bound events three signals fire OS notifications: - new approval lands in the queue (per id, via /api/state delta) - new ask_operator question queued (per id) - broker message sent to operator (live via SSE) first /api/state render after page load seeds the 'seen' sets without firing — only items that arrive while the page is open count. controls in a row under the banner: 🔔 enable notifications (calls requestPermission, hides on grant), 🔕 mute / 🔔 unmute toggle (localStorage-backed so operator can silence without revoking the permission), inline status text when blocked or unsupported. notification tag='hyperhive' collapses rapid bursts; onclick focuses the dashboard tab. requires secure context (HTTPS or localhost) — on other origins the API is unavailable and the controls hide themselves. todo: entry dropped.	2026-05-15 21:10:20 +02:00
müde	58c3cd853b	container crash watcher → HelperEvent::ContainerCrash new hive_c0re::crash_watch task polls every 10s, builds the set of currently-running containers, and on running→stopped transitions checks the transient snapshot: if no Stopping / Restarting / Destroying / Rebuilding flag is set, the container exited unexpectedly and we fire HelperEvent::ContainerCrash into the manager's inbox so it can react (typically: start it again). first poll is a seeding pass — no events on harness startup. dbus subscription would be lower-latency but polling is honest and debuggable, and a 10s delay on crash detection is fine for our scale. manager prompt + approvals doc updated to advertise the new event variant. todo drops the entry (and the journald-viewer entry that already shipped).	2026-05-15 21:02:05 +02:00
müde	6db38cf70c	model: runtime override via /model slash; fixes for port + bind - runtime model override: Bus::{model,set_model} + POST /api/model (form-encoded {model: name}). turn.rs reads bus.model() per turn so a flip lands on the next claude invocation. /api/state grows a model field; agent page shows a 'model · <name>' chip in the state row. '/model <name>' slash command POSTs to the endpoint and refreshes state. - port regression fix: agent_web_port no longer probes forward for existing agents (the previous fix shifted ports for any agent without a port file, including legacy ones whose container was already bound to the bare hashed port — dashboard rendered the new port, container was still on the old one, conn errors). new rule: port file exists → use it; absent + applied flake present → legacy, persist port_hash without probing; absent + no applied flake → fresh spawn, probe forward. - SO_REUSEADDR on both the dashboard and per-agent web UI binds via tokio::net::TcpSocket. operator hit 12 retries failing on manager :8000 — REUSEADDR handles the TIME_WAIT case cleanly without a new dep; retry still covers the genuine process-still-alive overlap. todo: drops the model-override entry (shipped); adds two new items — model persistence (optional, future), and custom per-agent MCP tools (groundwork for moving bitburner-agent into hyperhive).	2026-05-15 20:59:45 +02:00
müde	7d93dd9db4	no nap tool — recv with long wait_seconds replaces it; max raised to 180s recv-with-timeout is strictly better than a fixed sleep because it wakes instantly on incoming messages. drop the half-written nap MCP tool, raise the recv wait_seconds cap from 60s to 180s on both agent and manager sockets. prompts updated: agent.md + manager.md now spell out the pattern — when there's nothing else useful to do, call recv with wait_seconds=180 to park the turn; do NOT use Bash sleep for the same purpose. todo drops the nap entry and the napping-state-badge follow-up; both replaced by 'just use a long recv'.	2026-05-15 20:53:15 +02:00
müde	f65ee88269	recv: optional wait_seconds parameter, capped at 60s AgentRequest::Recv and ManagerRequest::Recv grow an optional wait_seconds field (default None → 30s, capped at 60s server-side). agent_server / manager_server clamp via recv_timeout(). MCP tool schemas advertise the param so claude can pick its own poll window — useful when an agent wants to throttle wakes without entering a distinct nap state. both harness loops still pass None, keeping the existing 30s default behaviour for system-level Recvs.	2026-05-15 20:49:33 +02:00
müde	0385d96bf3	dashboard: per-container journald viewer new GET /api/journal/{name}?unit=&lines= shells out to journalctl -M <container> -b --no-pager --output=short-iso --lines=<N> (cap 5000). optional unit filter, restricted to hive-ag3nt.service / hive-m1nd.service so the shell-out can't be coerced into reading unrelated units. validates the container name against the live list before invoking journalctl. frontend renders a collapsed '↳ logs · <container>' details block on each container row. expanding triggers a lazy fetch; refresh button re-fetches; unit dropdown switches between the harness service (default) and the full machine journal. output sits in a 24em-tall monospace pre, auto-scrolled to the bottom on fresh fetch. hive-c0re's systemd unit already runs as root, so journalctl has the access it needs.	2026-05-15 20:42:56 +02:00
müde	79a46f359a	agent_web_port: collision-aware sticky allocation operator hit 'coder' and 'test' colliding on the same hashed port — fnv-1a mod 900 has ~0.1% collision probability per pair and clearly that's not enough. agent_web_port goes stateful: - per-agent port persisted to /var/lib/hyperhive/agents/<name>/port - on first call, look up the file; if absent, hash, then probe forward through the allocated range skipping any port other agents already claim, then write the chosen value back - subsequent calls return the persisted port (sticky) other agents' ports come from their port file if present, else the fallback is the hashed value — that handles existing deployments without forcing a rebuild-all just to migrate. rebuilding the colliding agent re-runs agent_web_port, sees its peer's implicit hash port as taken, picks the next free slot, persists. range exhaustion (very unlikely — 900 slots) logs a warning and returns the hash; the bind-with-retry on the harness will surface the failure honestly rather than silently looping.	2026-05-15 20:41:18 +02:00
müde	754db7830e	ask_operator: ttl_seconds auto-cancel + remaining-time chip manager can pass ttl_seconds to ask_operator. on submit, host stores deadline_at = now + ttl in operator_questions (new column, migrated via existing pragma_table_info pattern), spawns a tokio task that sleeps until the deadline then resolves the question with answer '[expired]' and fires the same OperatorAnswered helper event. already-resolved races no-op silently. dashboard renders a '⏳ MM:SS' chip on the question row when deadline_at is set. format collapses seconds → s, < 1h → m s, ≥ 1h → h m. heartbeat refresh (5s) keeps the chip current; the operator sees it tick down. manager prompt + mcp tool description updated. journald viewer per container queued in todo (separate task).	2026-05-15 20:38:02 +02:00
müde	2146e47770	web ui: retry binding on AddrInUse during restart races operator hit 'Address already in use (os error 98)' on a harness restart — the new harness raced the old socket's release. add a bind_with_retry helper that backs off (250ms doubling, capped at 2s, 12 tries ≈ 22s total) on AddrInUse before giving up. applied to both the per-agent web UI and the hive-c0re dashboard. proper fix would be SO_REUSEADDR via socket2 but retry covers the TIME_WAIT case fine and keeps the dep count down. Other bind errors still fail immediately (port permission, fd exhaustion).	2026-05-15 20:33:51 +02:00
müde	538e0446d7	agent page: inbox view of last 30 messages addressed to this agent new wire request AgentRequest::Recent { limit } / ManagerRequest::Recent (plus matching responses with Vec<InboxRow>). InboxRow moved to hive-sh4re so it lives on both surfaces without an internal-to-wire conversion. host-side dispatch in agent_server / manager_server calls broker.recent_for(name, limit). per-agent web_ui /api/state grew an inbox: Vec<InboxRow> populated via the same per-agent socket (best-effort; transport failure returns empty). frontend renders as a collapsible <details> section between the state row and the terminal — fmt timestamp / from / body in a tight grid, capped at 16em scrollable. only visible when there are rows.	2026-05-15 20:32:19 +02:00
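A minimal sketch of the sticky port allocation described in the agent_web_port commits: hash the agent name with FNV-1a into a 900-slot range, then probe forward past ports other agents already claim, with scan_taken_ports deciding whether unmigrated peers' implicit hashes count as taken. Assumptions, not taken from hyperhive: the 64-bit FNV-1a variant, BASE_PORT = 8100, and the hypothetical Peer struct, which stands in for reading /var/lib/hyperhive/agents/<name>/port from disk. Writing the chosen port back to that file is left out.

```rust
use std::collections::HashSet;

// Assumed range: the commits only say "mod 900"; the base port is hypothetical.
const BASE_PORT: u16 = 8100;
const SLOTS: u16 = 900;

/// 64-bit FNV-1a (the commits say "fnv-1a" without giving the width).
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// The port an agent gets from hashing alone (its "implicit" port).
fn port_hash(name: &str) -> u16 {
    BASE_PORT + (fnv1a_64(name.as_bytes()) % u64::from(SLOTS)) as u16
}

/// Hypothetical stand-in for another agent's on-disk state.
struct Peer {
    name: String,
    port_file: Option<u16>, // contents of .../agents/<name>/port, if it exists
}

/// Ports claimed by other agents. Explicit port files always count; implicit
/// hashes of not-yet-migrated peers count only when asked (fresh spawns).
fn scan_taken_ports(peers: &[Peer], include_implicit_hashes: bool) -> HashSet<u16> {
    let mut taken = HashSet::new();
    for peer in peers {
        match peer.port_file {
            Some(port) => {
                taken.insert(port);
            }
            None if include_implicit_hashes => {
                taken.insert(port_hash(&peer.name));
            }
            None => {}
        }
    }
    taken
}

/// Probe forward from the hashed slot, wrapping inside the range. On
/// exhaustion, warn and return the hash so the bind surfaces the failure.
fn allocate_port(name: &str, taken: &HashSet<u16>) -> u16 {
    let start = port_hash(name) - BASE_PORT;
    for offset in 0..SLOTS {
        let candidate = BASE_PORT + (start + offset) % SLOTS;
        if !taken.contains(&candidate) {
            return candidate;
        }
    }
    eprintln!("warning: no free port left for {name}");
    port_hash(name)
}
```

Legacy migration would call scan_taken_ports(&peers, false) and a fresh spawn scan_taken_ports(&peers, true), matching the two cases in the later agent_web_port fix.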
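The crash watcher's running→stopped check reduces to a set difference between two polls, filtered by the transient snapshot. This sketch assumes the snapshot can be modelled as a map from container name to an in-flight flag; CrashWatch, Transient and the name field on ContainerCrash are illustrative, and the 10s polling loop and delivery into the manager's inbox are omitted.

```rust
use std::collections::{HashMap, HashSet};

/// In-flight operations that explain a container stopping on purpose.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Transient {
    Stopping,
    Restarting,
    Destroying,
    Rebuilding,
}

/// Only the variant named in the commit; its payload here is assumed.
#[derive(Debug, PartialEq, Eq)]
enum HelperEvent {
    ContainerCrash { name: String },
}

/// Hypothetical watcher state: the running set seen by the previous poll.
struct CrashWatch {
    previous: Option<HashSet<String>>, // None until the seeding poll has run
}

impl CrashWatch {
    fn new() -> Self {
        CrashWatch { previous: None }
    }

    /// One poll. Containers that were running last time, are not running now,
    /// and carry no transient flag stopped unexpectedly.
    fn poll(
        &mut self,
        running: HashSet<String>,
        transient: &HashMap<String, Transient>,
    ) -> Vec<HelperEvent> {
        let mut events = Vec::new();
        if let Some(prev) = &self.previous {
            for name in prev.difference(&running) {
                if !transient.contains_key(name) {
                    events.push(HelperEvent::ContainerCrash { name: name.clone() });
                }
            }
        }
        // The first call only records the set: no events on startup.
        self.previous = Some(running);
        events
    }
}
```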

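A synchronous sketch of the bind_with_retry backoff, using std::net::TcpListener and thread::sleep in place of the tokio socket the harness actually uses. The constants follow the commit (250ms doubling, capped at 2s, 12 tries); how the real helper counts tries, and the backoff_delay helper itself, are assumptions.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener};
use std::thread;
use std::time::Duration;

const INITIAL_DELAY: Duration = Duration::from_millis(250);
const MAX_DELAY: Duration = Duration::from_secs(2);
const MAX_TRIES: u32 = 12;

/// Delay after failed attempt `attempt` (0-based): 250ms, 500ms, 1s, then 2s.
fn backoff_delay(attempt: u32) -> Duration {
    // Cap the shift so large attempt numbers cannot overflow the multiplier.
    let factor = 1u32 << attempt.min(16);
    (INITIAL_DELAY * factor).min(MAX_DELAY)
}

/// Retry only on AddrInUse (the restart race); any other bind error, such as
/// a permission problem, is returned immediately.
fn bind_with_retry(addr: SocketAddr) -> io::Result<TcpListener> {
    let mut attempt = 0;
    loop {
        match TcpListener::bind(addr) {
            Ok(listener) => return Ok(listener),
            Err(e) if e.kind() == io::ErrorKind::AddrInUse && attempt + 1 < MAX_TRIES => {
                thread::sleep(backoff_delay(attempt));
                attempt += 1;
            }
            Err(e) => return Err(e),
        }
    }
}
```

With these constants the sketch sleeps after each of the first 11 failures, about 17.75s in total, before returning the error; the real helper's total depends on how it counts tries.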

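The recv wait rule from the two recv commits (None falls back to 30s, larger requests are clamped, with the cap raised from 60s to 180s), and the reason a long recv beats a fixed sleep, can both be shown with std's mpsc channel, whose recv_timeout returns as soon as a message arrives. The channel stands in for the broker; this recv_timeout function is a guess at the shape of the server-side clamp the commit names, and recv_message is hypothetical.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

const DEFAULT_WAIT_SECS: u64 = 30;
const MAX_WAIT_SECS: u64 = 180; // raised from 60s by the "no nap tool" change

/// Server-side clamp: None means the 30s default, larger requests are capped.
fn recv_timeout(wait_seconds: Option<u64>) -> Duration {
    Duration::from_secs(wait_seconds.unwrap_or(DEFAULT_WAIT_SECS).min(MAX_WAIT_SECS))
}

/// Park until a message arrives or the clamped window runs out. Unlike a
/// fixed sleep, this returns the moment something lands in the inbox.
fn recv_message(inbox: &Receiver<String>, wait_seconds: Option<u64>) -> Option<String> {
    match inbox.recv_timeout(recv_timeout(wait_seconds)) {
        Ok(message) => Some(message),
        Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => None,
    }
}
```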
185 commits
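A sketch of the ask_operator TTL bookkeeping: the remaining-time label follows the collapse rule in the commit (seconds, then minutes and seconds under an hour, then hours and minutes), and expiry writes '[expired]' only when the question is still unanswered, so losing a race to a real answer is a silent no-op. Timestamps are plain Unix seconds, the separator in labels like "2m 5s" is a guess, and the function names are hypothetical; the dashboard's own formatter and the tokio task that sleeps until the deadline are not shown.

```rust
/// Seconds left before `deadline_at` (Unix seconds), zero once it has passed.
fn remaining_secs(deadline_at: u64, now: u64) -> u64 {
    deadline_at.saturating_sub(now)
}

/// Collapse rule from the commit: under a minute show seconds, under an hour
/// show minutes and seconds, otherwise hours and minutes.
fn format_remaining(secs: u64) -> String {
    if secs < 60 {
        format!("{secs}s")
    } else if secs < 3600 {
        format!("{}m {}s", secs / 60, secs % 60)
    } else {
        format!("{}h {}m", secs / 3600, (secs % 3600) / 60)
    }
}

/// Expiry resolves the question only if nobody answered first; returns
/// whether it did anything.
fn expire(answer: &mut Option<String>) -> bool {
    if answer.is_some() {
        return false;
    }
    *answer = Some("[expired]".to_string());
    true
}
```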