hyperhive

Author	SHA1	Message	Date
müde	aed43ce4df	dashboard: tombstones + meta_inputs events; last /api/state refetches drop. new DashboardEvent::TombstonesChanged + MetaInputsChanged carry full snapshots (lists are tiny; snapshot beats diff for race avoidance). Coordinator-side helpers emit_tombstones_snapshot + emit_meta_inputs_snapshot fire from every mutation site: actions::destroy + post_purge_tombstone + actions::approve (spawn finalise consumes tombstone) + run_meta_update + auto_update::rebuild_agent (lock bumps). client adds derived stores + apply* handlers + drops the post-submit refetch on PURG3 (container row + tombstone row) and meta-update. after this commit /api/state is fetched exactly once per page session (cold load); every other change rides the SSE channel.	2026-05-17 23:52:12 +02:00
müde	e7ce35c503	phase 6: container events + drop the 5s /api/state poll. new DashboardEvent::ContainerStateChanged + ContainerRemoved close the last refetch loop on the dashboard. Coordinator's rescan_containers_and_emit diffs a fresh container_view::build_all against a cached last_containers map and fires per-row events. called from actions::approve (post-spawn), actions::destroy, the lifecycle_action wrapper, auto_update::rebuild_agent, and the existing 10s crash_watch poll. ContainerView extracted to its own module so coordinator and dashboard can both build it. dashboard endpoints flip to 200; container-lifecycle forms carry data-no-refresh. client drops the periodic poll entirely: initial cold load + SSE for everything afterwards. pending overlay reads from the existing transientsState since the new event payload doesn't carry it. PURG3 + meta-update keep the post-submit refetch since tombstones + meta_inputs aren't event-derived yet; tracked in TODO.md.	2026-05-17 22:01:15 +02:00
müde	313121a6e9	fix: transient state leak via RAII guard. bare set_transient/clear_transient pairs leak the in-memory transient on task cancellation, panics, or any early return between the two calls; the dashboard then shows the agent stuck in 'rebuilding…' forever (coder hit this today). add Coordinator::transient_guard returning a TransientGuard whose Drop clears, and convert every caller (dashboard lifecycle_action, auto_update::rebuild_agent, manager_server Update, actions::destroy, actions Spawn task, migrate phase 4). destroy() now takes &Arc<Coordinator> so it can hold a guard. existing stuck transients clear on next hive-c0re restart since transient state is in-memory only.	2026-05-16 19:47:52 +02:00
müde	d06b598c56	kick_agent on every rebuild + apply path agents weren't being woken with the 'you were rebuilt — check /state/ for notes, --continue intact' system message after several recent rebuild surfaces: - auto_update::rebuild_agent — used by the dashboard rebuild button, admin-CLI rebuild via lifecycle_action, the startup rev-scan, AND the new meta-input update batch loop. kick moves into rebuild_agent's success arm so all four paths benefit. (the dashboard's lifecycle_action extra closure was already firing kick — now it's a no-op for the rebuild path since rebuild_agent does it.) - actions::run_apply_commit — apply-commit approve flow built + tagged deployed/<id> but never kicked. add kick on success with the more specific 'config update applied' hint. - server.rs::HostRequest::Rebuild — the admin-CLI direct path calls lifecycle::rebuild bypassing rebuild_agent. add kick on success. dashboard's restart / start lifecycle_action extras still kick via their own closures since they don't route through rebuild_agent. stop / kill / destroy intentionally don't kick — there's nothing to wake.	2026-05-16 04:20:01 +02:00
müde	266c2c7a77	dashboard: meta flake inputs UI + sequential rebuild loop new section 'M3T4 1NPUTS' between approvals and message flow: one row per input in meta/flake.lock (hyperhive first, then agent-<n> alphabetically). each row shows the input name, the first 12 chars of the locked sha, a relative timestamp from locked.lastModified, and the original.url when available. checkbox per row; submit button is disabled until at least one box is checked; submitting confirms then POSTs the selected names to /meta-update. backend: - meta::lock_update(inputs: &[String]) — runs 'nix flake update <names>' in the meta dir, commits the lock change with a combined message ('lock update: hyperhive, agent-coder'). preserves the existing META_LOCK serialization. existing lock_update_for_rebuild / lock_update_hyperhive stay for their single-input callers. - POST /meta-update — comma-separated 'inputs' form field (JS joins checkboxes since axum::Form doesn't natively decode repeated keys); spawns a background task that runs the lock update + per-agent rebuild loop. hyperhive selection fans out to all agents; agent-<n> selection only rebuilds <n>. each rebuild fires Rebuilt to the manager exactly like dashboard / admin-CLI / auto-update. rebuild loop is sequential — auto_update::run too (was parallel via tokio::spawn). parallel rebuilds collide on nix-store's sqlite cache ('sqlite db busy, not using cache') and the meta META_LOCK contention. nix-daemon serializes the heavy build steps anyway, so this isn't a throughput loss.	2026-05-16 03:38:07 +02:00
müde	50ef806266	operator pronouns: configurable free-text, threaded into prompts. new NixOS module option services.hive-c0re.operatorPronouns (free text, default 'she/her', example 'they/them'). hive-c0re takes it as a CLI flag (--operator-pronouns, lib.escapeShellArg'd in the systemd unit), stores it on Coordinator, threads it into the meta flake's mkAgent so each agent's systemd service gets HIVE_OPERATOR_PRONOUNS set. the harness reads the env at boot and substitutes {operator_pronouns} into the agent / manager system prompt alongside {label}. nix string is escaped against backslash + double-quote so non-ascii / quoted values round-trip safely. prompt addendum: both agent.md and manager.md mention the operator's pronouns up front so claude uses them naturally in third-person reference. propagates on next ↻ R3BU1LD (meta lock bump, no per-agent approval).	2026-05-16 02:05:22 +02:00
müde	d202f3785c	suppress crash_watch during background rebuilds + meta repoint. crash_watch fires ContainerCrash whenever it sees a previously-running container in a non-running state without a transient flag set. dashboard rebuilds already set Rebuilding via lifecycle_action; the two other rebuild paths didn't: - migrate::repoint_container: phase 4 walks every container, each nixos-container update activation briefly takes the systemd unit down. previously fired ContainerCrash for every agent during the migration; manager would then spuriously call start() on agents that were already coming back up. - auto_update::rebuild_agent: startup scan + admin-socket caller bypass lifecycle_action. both paths now set the Rebuilding transient around the rebuild + clear after. matches what dashboard does.	2026-05-16 01:12:48 +02:00
müde	87016cd567	auto_update: bump meta hyperhive input before per-agent rebuilds. auto_update::run now calls meta::lock_update_hyperhive once up-front so the per-agent rebuilds it kicks off rebuild against the new base. lifecycle::rebuild already drives sync_agents + lock_update_for_rebuild per agent, so the rev-marker shortcut keeps its meaning ('we've ack'd this rev for this agent') without further plumbing. failures of the hyperhive lock bump log + continue; individual rebuilds will surface concrete errors if anything's really wrong.	2026-05-16 00:32:55 +02:00
müde	871e7bf3fa	wire types: add sha + tag to Approval and HelperEvent. approval grows fetched_sha (canonical hive-c0re-vouched sha, distinct from manager-supplied commit_ref). helperevent {approvalresolved,spawned,rebuilt} grow optional sha + tag so the manager can git-show the exact tree it's hearing about (against the upcoming /agents/<n>/applied.git RO mount) and know which terminal tag landed. all serde-defaulted; existing construction sites pass none until the tag-driven flow lands.	2026-05-15 22:47:39 +02:00
müde	ff8f8c7c56	per-agent /state dir for durable notes; manager sees them via /agents	2026-05-15 18:00:08 +02:00
müde	37c6504462	manager events: Spawned/Rebuilt/Killed/Destroyed + start button	2026-05-15 17:38:41 +02:00
müde	e1289a3e4c	nix templates: factor harness-base.nix (shared scaffolding incl. gitconfig)	2026-05-15 16:10:55 +02:00
müde	f1fd787f17	rebuild button on agent UI (cross-origin POST to dashboard /rebuild)	2026-05-15 15:57:11 +02:00
müde	824914807a	ensure_manager: rebuild hm1nd if applied flake missing (migration safety)	2026-05-15 15:53:39 +02:00
müde	f99ed3fe7a	manager: same lifecycle as agents; auto-spawn on hive-c0re start	2026-05-15 13:43:32 +02:00
müde	e777576528	auto-update: surface pending updates in dashboard + include manager	2026-05-15 13:31:33 +02:00
müde	a4e1556f90	auto-update agents on startup when hyperhive rev changes	2026-05-15 13:25:27 +02:00

17 commits
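
note on 313121a6e9 (RAII transient guard): a minimal sketch of the pattern that commit describes, not the repository's actual code. Only the names Coordinator, transient_guard, TransientGuard and the Rebuilding transient come from the commit message; the field layout, the Transient enum, the transient() lookup and the rebuild() helper with its fail flag are assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical transient states; only `Rebuilding` is named in the log.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Transient {
    Rebuilding,
}

// Assumed shape: an in-memory map from agent name to transient state.
#[derive(Default)]
struct Coordinator {
    transients: Mutex<HashMap<String, Transient>>,
}

impl Coordinator {
    // Sets the transient and hands back a guard that clears it on Drop.
    fn transient_guard(self: &Arc<Self>, agent: &str, state: Transient) -> TransientGuard {
        self.transients
            .lock()
            .unwrap()
            .insert(agent.to_string(), state);
        TransientGuard {
            coordinator: Arc::clone(self),
            agent: agent.to_string(),
        }
    }

    // Hypothetical read accessor used to observe the state.
    fn transient(&self, agent: &str) -> Option<Transient> {
        self.transients.lock().unwrap().get(agent).copied()
    }
}

struct TransientGuard {
    coordinator: Arc<Coordinator>,
    agent: String,
}

impl Drop for TransientGuard {
    fn drop(&mut self) {
        // Runs on normal return, early return, and unwinding panics alike.
        // A poisoned lock is recovered so the cleanup still happens.
        let mut map = self
            .coordinator
            .transients
            .lock()
            .unwrap_or_else(|poisoned| poisoned.into_inner());
        map.remove(&self.agent);
    }
}

// Hypothetical caller: the early return cannot leak the transient,
// because `_guard` is dropped on every exit path.
fn rebuild(coordinator: &Arc<Coordinator>, agent: &str, fail: bool) -> Result<(), String> {
    let _guard = coordinator.transient_guard(agent, Transient::Rebuilding);
    if fail {
        return Err(format!("rebuild of {agent} failed"));
    }
    Ok(())
}
```

with a bare set/clear pair, the error branch in rebuild() would have to remember to clear the state itself; with the guard, the state is visible only while the guard is alive and is removed on every exit path, including a failed rebuild.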