hyperhive

Author	SHA1	Message	Date
müde	4cb529351e	lifecycle::rebuild through meta rebuild now does sync_agents (idempotent — no-op when the rendered flake matches disk; recovers from a divergent meta repo on the side) followed by lock_update_for_rebuild which relocks just this agent's input and commits the lock change if any. flake ref for nixos-container update flips from applied/<n>#default to meta#<name>. new helper meta::lock_update_for_rebuild is single-phase (no separate finalize): rebuild has no failure-revert semantics — it always wants the latest applied/<n>/main. spawn already syncs meta before container create; rebuild now picks up the meta side on every manual ↻ R3BU1LD.	2026-05-16 00:28:26 +02:00
müde	8f94e4379a	lifecycle::spawn through meta after setup_proposed + setup_applied, spawn now syncs the meta flake (one input + one nixosConfiguration per agent) so `--flake /var/lib/hyperhive/meta#<name>` resolves before nixos-container create runs. flake ref switches from applied/<n>#default to meta#<name>; the wrapper modules (identity, HIVE_PORT, HIVE_LABEL, HIVE_DASHBOARD_PORT) now live in the meta flake's mkAgent. new helper agents_for_meta builds the AgentSpec list by enumerating containers + optionally appending a not-yet-present name for the spawn case. spawn keeps its caller signature; rebuild + auto_update get wired up in follow-up commits.	2026-05-16 00:27:12 +02:00
müde	c42ad1330c	lifecycle: pre-wire applied remote in proposed setup_proposed now lands a git remote named 'applied' on every proposed/<n>/config pointing at /applied/<n>/.git — the path as seen from inside the manager container, where the RO bind in set_nspawn_flags makes the URL resolve. From the manager, `git fetch applied`, `git log applied/main`, `git show applied/refs/tags/deployed/<id>`, `git diff applied/main HEAD`, and `git rebase applied/main` all work without manually constructing the path each time. The RO bind blocks push at the kernel level so the remote can only fetch. Idempotent — also applied to pre-existing proposed repos (no-op if the remote is already correct, set-url if drifted) so the startup migration picks up the wiring on existing agents.	2026-05-16 00:25:43 +02:00
müde	3d14ddeb7d	lifecycle: bind /meta RO into manager set_nspawn_flags now adds a third manager-only bind alongside /agents (RW) and /applied (RO): --bind-ro=/var/lib/hyperhive/meta:/meta. manager can git log /meta to see every deploy across the swarm and cat /meta/flake.lock to introspect which sha each agent is currently pinned at. defensive create_dir_all on the host side so a cold start with no agents (meta repo not yet seeded) doesn't trip systemd-nspawn's missing-bind-source check before the migration plants the dir.	2026-05-16 00:24:39 +02:00
müde	92822efe16	meta: new hive-c0re module owns /var/lib/hyperhive/meta/ leaf module with no runtime callers yet (every public item is #[allow(dead_code)] until lifecycle / actions / auto_update rewire to use it). API surface: - sync_agents — idempotent: render flake.nix for the given agent set, git-init on first call, nix flake lock, commit if anything changed. - prepare_deploy / finalize_deploy / abort_deploy — two-phase for the request_apply_commit path. prepare runs nix flake lock --update-input agent-<n> without committing; finalize commits with a 'deploy <n> deployed/<id> <sha12>' message; abort git-restores the lock so a failed build leaves no orphan commit. - lock_update_hyperhive — one-shot for the auto-update path. flake.nix template defines mkAgent that pulls each agent's nixosModules.default from its input and wraps with the identity / HIVE_PORT / HIVE_LABEL / HIVE_DASHBOARD_PORT module — what setup_applied used to generate inline. nix invocations carry --extra-experimental-features as a belt in case flakes aren't enabled in nix.conf.	2026-05-16 00:22:37 +02:00
müde	5b5a93e0c6	lifecycle: module-only agent flake.nix, tracked in proposed setup_proposed now seeds both agent.nix (a regular NixOS module function) and flake.nix (boilerplate exporting nixosModules.default = import ./agent.nix) into the manager-editable proposed repo, committed together. setup_applied's hyperhive_flake + dashboard port wrapper generation is deleted entirely — the meta flake at /var/lib/hyperhive/meta/ now owns the wrapper module. setup_applied just fetches proposed's main on first spawn and tags deployed/0; subsequent rebuilds touch nothing in applied that the manager didn't author. spawn + rebuild keep their old param list with the now-unused hyperhive_flake + dashboard_port underscored — call sites get cleaned up after the meta module lands and consumes them.	2026-05-16 00:10:06 +02:00
müde	e26143a412	dashboard: diff against applied/proposal/<id>, prefer fetched_sha approval_diff now runs git diff refs/heads/main..refs/tags/proposal/<id> against the applied repo instead of cobbling a single-file diff from proposed. consequences: multi-file proposals show every change, manager amendments in proposed cannot lie about what'll be deployed, no-op proposals render an explicit '(proposal matches currently-deployed tree)'. displayed sha prefers fetched_sha (hive-c0re-vouched) and falls back to commit_ref only for the brief pre-fetch window. unified_diff helper + similar dep dropped — git diff is the source of truth now. dead-code allows on the lifecycle git helpers + approvals.set_fetched_sha come off since all are wired up. readme picks up the tag flow + /applied RO mount.	2026-05-15 23:18:17 +02:00
müde	fc61cb9310	fmt: clippy doc_markdown backticks	2026-05-15 23:11:10 +02:00
müde	4a8204f035	lifecycle: bind /applied into manager read-only set_nspawn_flags now adds --bind-ro=/var/lib/hyperhive/applied:/applied for the manager container alongside the existing /agents RW mount. manager can git-fetch deployed/failed/denied tags out of /applied/<n>/.git to mirror them into its proposed clones; the read-only bind means git plumbing inside the container cannot corrupt the authoritative repos. picked up by the next rebuild of hm1nd (no spawn-time change needed since set_nspawn_flags runs on every spawn + rebuild).	2026-05-15 23:02:31 +02:00
müde	315d4289c7	actions: tag-driven approve(ApplyCommit) flow run_apply_commit walks the approval through the tag state machine in applied: approved/<id> + building/<id> stamped before the build, then git read-tree --reset to proposal/<id> populates the working dir without moving HEAD. on rebuild success deployed/<id> is planted and refs/heads/main fast-forwards to the proposal. on failure failed/<id> is annotated with the build error and the working tree resets back to main so the agent stays evaluable. helper events Rebuilt + ApprovalResolved both carry the terminal tag so the manager can git-show the exact tree (and read the failure note from an annotated tag) against its read-only applied.git mount. finish_approval grows a terminal_tag param; spawn path passes None. lifecycle::apply_commit deleted.	2026-05-15 23:00:01 +02:00
müde	8cb8fcedad	lifecycle: setup_applied seeds via fetch + tags deployed/0 new shape: applied is git-init'd at first spawn, fetches proposed's initial commit into its main, tags deployed/0 there. the wrapper flake.nix is regenerated on every spawn/rebuild but no longer tracked — apply churn vanishes, manager-authored files in the proposal flow now survive untouched. setup_applied gains an Option<&Path> for proposed (None on rebuild paths that just refresh the flake). pre-overhaul applied dirs are detected via the missing deployed/0 tag and bail loudly with the destroy --purge migration hint. apply_commit is stubbed with a clear error until the tag-driven approve flow lands.	2026-05-15 22:56:58 +02:00
müde	63ef69674b	lifecycle: git helpers for tag-driven applied repo new plumbing for the upcoming flow: git_fetch_to_tag (pulls a sha from proposed into applied and pins it as a tag in one shot), git_rev_parse (normalises shas + reads back tag targets), git_tag / git_tag_annotated (lightweight vs body-carrying for failed/denied), git_read_tree_reset (replace working tree without moving HEAD — lets main stay on last known-good across an in-flight build), git_update_ref (ff main on deploy). annotated tag bodies go via stdin to avoid escape games. all dead-code-allowed; callers land in subsequent commits.	2026-05-15 22:52:23 +02:00
müde	6a2ffd521b	surface agent-vs-agent port collisions (manager:8000 can't collide) manager is fixed at 8000, sub-agents are 8100-8999, so collisions are strictly between two sub-agents hashing to the same value. the colliding container's harness restart-loops on AddrInUse — which the user just hit on :8945. previously the only sign was a buried journalctl warn line. now surfaced two ways: - lifecycle::spawn / rebuild preflight: walks the live container list, computes each agent's hashed port, refuses with 'port N already taken by <other> — rename one of them' if any running sub-agent shares the new agent's port. so the operator sees an actionable error in the dashboard's transient pill / approve-result instead of waiting for the harness to die. - /api/state grows a port_conflicts: [{port, agents: [...]}] array; dashboard renders a pulsing red banner above the containers list listing each cluster. matches the questions panel pulse so it's hard to miss.	2026-05-15 22:08:19 +02:00
müde	acaa0eb895	agent_web_port: back to pure hash, drop port-file dance operator's call: probing-forward + state-file machinery is more brittle than the bug it tried to fix. revert to the original deterministic FNV-1a hash mod 900. collisions are real but rare; operator resolves by renaming (different name → different hash) and rebuilding. no per-agent port file, no scan, no migration path, nothing to drift out of sync with the running container. existing port files on disk are silently ignored — operator rebuilds affected agents to regenerate flakes from the deterministic hash.	2026-05-15 21:17:31 +02:00
müde	c35f566d15	agent_web_port: actually resolve legacy collisions previous attempt was wrong: the legacy branch returned port_hash unconditionally, so two legacies hashing to the same port both wrote that port and the collision persisted (test still trying to bind coder's port). new rule: always probe forward from port_hash, with scan_taken_ports parameterised by include_implicit_hashes: - legacy migration (applied dir exists, no port file): pass false. scan only counts other agents' port files. first-queried legacy claims its hash; subsequent colliders see the first's port file and probe forward. we don't know which legacy originally won the bind race, so first-write-wins; the loser was already crash-looping anyway and gets a fresh port to rebuild to. - fresh spawn (no applied dir): pass true. counts port files AND implicit hashes for not-yet-migrated legacies, so a new spawn doesn't race with an unmigrated peer. migration note for affected users: agents whose port file says something other than their hashed port may have been corrupted by the previous fix. Hit ↻ R3BU1LD on the offender to regenerate the flake (uses the current port file) and the container will bind the right port on restart.	2026-05-15 21:13:17 +02:00
müde	6db38cf70c	model: runtime override via /model slash; fixes for port + bind - runtime model override: Bus::{model,set_model} + POST /api/model (form-encoded {model: name}). turn.rs reads bus.model() per turn so a flip lands on the next claude invocation. /api/state grows a model field; agent page shows a 'model · <name>' chip in the state row. '/model <name>' slash command POSTs to the endpoint and refreshes state. - port regression fix: agent_web_port no longer probes forward for existing agents (the previous fix shifted ports for any agent without a port file, including legacy ones whose container was already bound to the bare hashed port — dashboard rendered the new port, container was still on the old one, conn errors). new rule: port file exists → use it; absent + applied flake present → legacy, persist port_hash without probing; absent + no applied flake → fresh spawn, probe forward. - SO_REUSEADDR on both the dashboard and per-agent web UI binds via tokio::net::TcpSocket. operator hit 12 retries failing on manager :8000 — REUSEADDR handles the TIME_WAIT case cleanly without a new dep; retry still covers the genuine process-still-alive overlap. todo: drops the model-override entry (shipped); adds two new items — model persistence (optional, future), and custom per-agent MCP tools (groundwork for moving bitburner-agent into hyperhive).	2026-05-15 20:59:45 +02:00
müde	79a46f359a	agent_web_port: collision-aware sticky allocation operator hit 'coder' and 'test' colliding on the same hashed port — fnv-1a mod 900 has ~0.1% collision probability per pair and clearly that's not enough. agent_web_port goes stateful: - per-agent port persisted to /var/lib/hyperhive/agents/<name>/port - on first call, look up the file; if absent, hash, then probe forward through the allocated range skipping any port other agents already claim, then write the chosen value back - subsequent calls return the persisted port (sticky) other agents' ports come from their port file if present, else the fallback is the hashed value — that handles existing deployments without forcing a rebuild-all just to migrate. rebuilding the colliding agent re-runs agent_web_port, sees its peer's implicit hash port as taken, picks the next free slot, persists. range exhaustion (very unlikely — 900 slots) logs a warning and returns the hash; the bind-with-retry on the harness will surface the failure honestly rather than silently looping.	2026-05-15 20:41:18 +02:00
müde	ff8f8c7c56	per-agent /state dir for durable notes; manager sees them via /agents	2026-05-15 18:00:08 +02:00
müde	8428c693e0	dashboard: stop/restart per-container + update-all when any stale	2026-05-15 17:00:56 +02:00
müde	0f0e242906	programs.git.enable + harness PATH tracks systemPackages - harness-base.nix: switch to programs.git for declarative gitconfig. - agent + manager service path = /run/current-system/sw → agents pick up new packages from their own agent.nix without harness edits. - generated applied/<name>/flake.nix overrides programs.git.config.user (no more raw etc.gitconfig collision).	2026-05-15 16:16:14 +02:00
müde	e1289a3e4c	nix templates: factor harness-base.nix (shared scaffolding incl. gitconfig)	2026-05-15 16:10:55 +02:00
müde	f1fd787f17	rebuild button on agent UI (cross-origin POST to dashboard /rebuild)	2026-05-15 15:57:11 +02:00
müde	f99ed3fe7a	manager: same lifecycle as agents; auto-spawn on hive-c0re start	2026-05-15 13:43:32 +02:00
müde	a42fdb3a5c	phase 8 step 1: per-agent claude creds bind + destroy keeps state	2026-05-15 12:39:22 +02:00
müde	0fc287c768	fmt	2026-05-15 02:58:35 +02:00
müde	b711296460	destroy verb: CLI + admin socket + dashboard button; purges state + approvals	2026-05-15 02:57:22 +02:00
müde	fcd6563887	fmt	2026-05-15 02:02:20 +02:00
müde	07a5d3a778	lifecycle: clear HOST_ADDRESS/LOCAL_ADDRESS/HOST_BRIDGE — start script's --network-veth was forcing private netns	2026-05-15 01:51:12 +02:00
müde	59de7fa3c5	lifecycle: force PRIVATE_NETWORK=0 so per-agent web UI port reaches host	2026-05-15 01:35:30 +02:00
müde	ee99774d17	Phase 7d: per-container MemoryMax + CPUQuota via systemd drop-in	2026-05-15 00:30:48 +02:00
müde	7c1ed07cf2	lifecycle: HYPERHIVE_GIT env override (bypass PATH); module sets it	2026-05-15 00:24:51 +02:00
müde	6dbf4eedd7	lifecycle: u16::try_from instead of as-cast	2026-05-14 23:39:53 +02:00
müde	d0f954bbc1	Phase 6a: per-container web UI (axum); per-agent port hashed from name	2026-05-14 23:39:06 +02:00
müde	967ec7c9d7	fmt	2026-05-14 23:22:00 +02:00
müde	2fd80dbd68	Phase 5c: separate proposed (manager) and applied (hive-c0re) repos; per-agent gitconfig	2026-05-14 23:20:32 +02:00
müde	3c702cf43f	fmt	2026-05-14 23:10:37 +02:00
müde	433c0d212e	Phase 5b: per-agent config flakes; approve validates + advances commit	2026-05-14 23:09:35 +02:00
müde	6e7fd2e897	Phase 3c: nixpkgs-unstable for claude-code; harness calls claude --print, falls back to echo	2026-05-14 22:26:14 +02:00
müde	764d6497dd	lifecycle: rebuild reconciles bind flag idempotently and restarts	2026-05-14 22:09:22 +02:00
müde	377eb994a1	lifecycle: bind via EXTRA_NSPAWN_FLAGS in /etc/nixos-containers/<name>.conf	2026-05-14 22:06:27 +02:00
müde	326da5a7bf	naming: h-<name> for agents, hm1nd for manager (11-char limit)	2026-05-14 21:59:01 +02:00
müde	7ce0f0022f	lifecycle: bind agent dir via /run/systemd/nspawn override (nixos-container lacks --bind)	2026-05-14 21:52:17 +02:00
müde	f6cf4223a4	lifecycle: surface nixos-container stderr in error + log	2026-05-14 21:48:23 +02:00
müde	d79b5a39a1	hive-c0re: in-memory broker + per-agent sockets + coordinator state	2026-05-14 21:42:51 +02:00
müde	90798b936e	hive-c0re: nixos-container lifecycle (spawn/kill/rebuild/list)	2026-05-14 20:51:35 +02:00

45 commits
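
The port scheme the log converges on (manager fixed at 8000, sub-agents hashed into 8100-8999 by FNV-1a mod 900, collisions surfaced rather than probed around) can be sketched roughly as below. This is a minimal illustration, not the code from the repository: the log names `agent_web_port` and a `port_conflicts` field but gives no signatures, so the signatures here are assumptions, as are the 32-bit FNV-1a width, the choice to hash the bare agent name, and the constant names.

```rust
use std::collections::BTreeMap;

// Constant names are hypothetical; the values come from the commit messages.
const MANAGER_PORT: u16 = 8000;
const AGENT_PORT_BASE: u16 = 8100;
const AGENT_PORT_SLOTS: u32 = 900;

/// 32-bit FNV-1a. The log says "FNV-1a" without a width; 32-bit is an assumption.
fn fnv1a_32(bytes: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5; // FNV-1a 32-bit offset basis
    for &b in bytes {
        hash ^= u32::from(b);
        hash = hash.wrapping_mul(0x0100_0193); // FNV 32-bit prime
    }
    hash
}

/// Deterministic port for a sub-agent: hash mod 900, offset into 8100-8999.
/// Hashing the bare name (not the container name) is an assumption.
fn agent_web_port(name: &str) -> u16 {
    let slot = fnv1a_32(name.as_bytes()) % AGENT_PORT_SLOTS;
    // slot is always below 900, so the conversion cannot fail.
    AGENT_PORT_BASE + u16::try_from(slot).expect("slot is below 900")
}

/// Group agents by port and keep only the ports claimed by more than one agent.
fn port_conflicts(agents: &[(&str, u16)]) -> Vec<(u16, Vec<String>)> {
    let mut by_port: BTreeMap<u16, Vec<String>> = BTreeMap::new();
    for (name, port) in agents {
        by_port.entry(*port).or_default().push((*name).to_string());
    }
    by_port
        .into_iter()
        .filter(|(_, names)| names.len() > 1)
        .collect()
}

fn main() {
    println!("manager -> :{MANAGER_PORT}");
    let names = ["coder", "test", "docs"];
    let agents: Vec<(&str, u16)> = names.iter().map(|n| (*n, agent_web_port(n))).collect();
    for (name, port) in &agents {
        println!("{name} -> :{port}");
    }
    println!("conflicts: {:?}", port_conflicts(&agents));
}
```

Because the port is a pure function of the name, there is no state file to drift out of sync with the running container; the cost is that a collision can only be resolved by renaming one of the agents, which is why the conflict list is worth surfacing.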