Commit graph

31 commits

Author SHA1 Message Date
damocles
80dd5bb69e two-step agent spawn: request_init_config + request_spawn 2026-05-20 14:40:15 +02:00
müde
5aad2d67e1 forge: mirror applied config repos to a private agent-configs org
on startup (and after every applied-repo ref mutation) core pushes
each agent's hive-c0re-owned applied repo — main plus every
proposal/approved/building/deployed/failed/denied tag — to
agent-configs/<name> on the local forge. the org is private and
agents are not members, so core is the only principal that can read
it.

the tokenised push url is passed inline, never stored as a named
remote: the applied repo is bind-mounted read-only into the manager,
so a token in .git/config would leak the core admin credential to an
agent.

push_config is best-effort at every site (ensure_all, spawn,
approve, deny, submit) — a missing or down forge never blocks a
deploy.
2026-05-20 10:24:50 +02:00
damocles
f9f1346eae clippy: zero pedantic warnings across the tree 2026-05-18 22:09:34 +02:00
müde
aed43ce4df dashboard: tombstones + meta_inputs events — last /api/state refetches drop
new DashboardEvent::TombstonesChanged + MetaInputsChanged carry
full snapshots (lists are tiny; snapshot beats diff for race
avoidance). Coordinator-side helpers
emit_tombstones_snapshot + emit_meta_inputs_snapshot fire from
every mutation site: actions::destroy + post_purge_tombstone +
actions::approve (spawn finalise consumes tombstone) +
run_meta_update + auto_update::rebuild_agent (lock bumps).

client adds derived stores + apply* handlers + drops the
post-submit refetch on PURG3 (container row + tombstone row)
and meta-update.

after this commit /api/state is fetched exactly once per page
session (cold load); every other change rides the SSE channel.
2026-05-17 23:52:12 +02:00
müde
e7ce35c503 phase 6: container events + drop the 5s /api/state poll
new DashboardEvent::ContainerStateChanged + ContainerRemoved
close the last refetch loop on the dashboard. Coordinator's
rescan_containers_and_emit diffs a fresh container_view::build_all
against a cached last_containers map and fires per-row events.
called from actions::approve (post-spawn), actions::destroy,
the lifecycle_action wrapper, auto_update::rebuild_agent, and
the existing 10s crash_watch poll.

ContainerView extracted to its own module so coordinator and
dashboard can both build it. dashboard endpoints flip to 200;
container-lifecycle forms carry data-no-refresh. client drops
the periodic poll entirely — initial cold load + SSE for
everything afterwards. pending overlay reads from the existing
transientsState since the new event payload doesn't carry it.

PURG3 + meta-update keep the post-submit refetch since
tombstones + meta_inputs aren't event-derived yet; tracked in
TODO.md.
2026-05-17 22:01:15 +02:00
müde
56d615b51f dashboard: approval_added / approval_resolved mutation events + client derived state 2026-05-17 13:30:25 +02:00
müde
411cf86632 nix fmt + rustfmt sweep 2026-05-17 01:40:28 +02:00
müde
480d646f69 forge: auto-create a user + token per agent on spawn / startup
new forge module probes the hive-forge nixos-container (no-op when
absent), and ensures every agent + the manager has a forgejo user
named after them with an access token at `<state>/forge-token`
(visible inside the container as `/state/forge-token`).

idempotent: skips user creation when forgejo reports 'already
exists', skips token issuance when the file is present, scopes the
token to read:user,write:repository,write:issue. token-name suffixed
with a clock so re-issuing doesn't collide with a stale name. shells
out via `nixos-container run hive-forge -- runuser -u forgejo --
forgejo admin` (runuser instead of sudo since sudo isn't in the
container by default).

hooks: ensure_all sweeps existing containers at hive-c0re startup
(backgrounded), and the actions.rs spawn task calls ensure_user_for
the new agent right after lifecycle::spawn succeeds. failures log a
warning but don't abort spawn — a missing token is recoverable from
the next startup sweep.
2026-05-16 20:55:13 +02:00
müde
313121a6e9 fix: transient state leak via RAII guard
bare set_transient/clear_transient pairs leak the in-memory transient
on task cancellation, panics, or any early return between the two
calls — dashboard then shows the agent stuck in 'rebuilding…'
forever (coder hit this today). add Coordinator::transient_guard
returning a TransientGuard whose Drop clears, and convert every
caller (dashboard lifecycle_action, auto_update::rebuild_agent,
manager_server Update, actions::destroy, actions Spawn task,
migrate phase 4). destroy() now takes &Arc<Coordinator> so it can
hold a guard. existing stuck transients clear on next hive-c0re
restart since transient state is in-memory only.
2026-05-16 19:47:52 +02:00
müde
d06b598c56 kick_agent on every rebuild + apply path
agents weren't being woken with the 'you were rebuilt — check
/state/ for notes, --continue intact' system message after
several recent rebuild surfaces:

- auto_update::rebuild_agent — used by the dashboard rebuild
  button, admin-CLI rebuild via lifecycle_action, the startup
  rev-scan, AND the new meta-input update batch loop. kick
  moves *into* rebuild_agent's success arm so all four
  paths benefit. (the dashboard's lifecycle_action extra
  closure was already firing kick — now it's a no-op for the
  rebuild path since rebuild_agent does it.)
- actions::run_apply_commit — apply-commit approve flow built
  + tagged deployed/<id> but never kicked. add kick on
  success with the more specific 'config update applied' hint.
- server.rs::HostRequest::Rebuild — the admin-CLI direct path
  calls lifecycle::rebuild bypassing rebuild_agent. add kick
  on success.

dashboard's restart / start lifecycle_action extras still
kick via their own closures since they don't route through
rebuild_agent. stop / kill / destroy intentionally don't
kick — there's nothing to wake.
2026-05-16 04:20:01 +02:00
müde
50ef806266 operator pronouns: configurable free-text, threaded into prompts
new NixOS module option services.hive-c0re.operatorPronouns
(free text, default 'she/her', example 'they/them'). hive-c0re
takes it as a CLI flag (--operator-pronouns, lib.escapeShellArg'd
in the systemd unit), stores it on Coordinator, threads it into
the meta flake's mkAgent so each agent's systemd service gets
HIVE_OPERATOR_PRONOUNS set. the harness reads the env at boot
and substitutes {operator_pronouns} into the agent / manager
system prompt alongside {label}. nix string is escaped against
backslash + double-quote so non-ascii / quoted values
round-trip safely. prompt addendum: both agent.md and
manager.md mention the operator's pronouns up front so claude
uses them naturally in third-person reference. propagates on
next ↻ R3BU1LD (meta lock bump, no per-agent approval).
2026-05-16 02:05:22 +02:00
müde
06fdbac1ac actions::run_apply_commit through meta two-phase
approval-driven deploys now walk the meta flake via
prepare_deploy / finalize_deploy / abort_deploy so a failed
build leaves no commit in meta's deploy log:

1. capture applied/main sha for rollback
2. tag approved/<id> + building/<id>
3. ff applied/main to proposal/<id>, read-tree sync working tree
4. meta::prepare_deploy(name) — nix flake lock --update-input
   agent-<n> without committing
5. lifecycle::rebuild_no_meta — container-level only (new
   extracted helper; public lifecycle::rebuild still wraps it
   with single-phase meta sync + commit for dashboard / auto
   _update callers that don't care about rollback)
6a. on success: tag deployed/<id>, meta::finalize_deploy commits
    the staged lock with 'deploy <n> deployed/<id> <sha12>'
6b. on failure: tag failed/<id> annotated with the build error,
    git_update_ref applied/main back to prev sha, read-tree to
    main, meta::abort_deploy git-restores flake.lock

meta's git log now records only successful deploys; failures
+ denials still live in applied as annotated tags.
2026-05-16 00:32:16 +02:00
müde
22f35def8f actions::destroy syncs meta after lifecycle
once nixos-container destroy lands + per-agent state cleanup is
done, rerender the meta flake from the remaining containers so
the destroyed agent's input + nixosConfiguration drop off and
its flake.lock entry vanishes. log + keep going on meta-sync
failure — the destroy already succeeded at the lifecycle level,
so meta drift here is just bookkeeping. new public
lifecycle::agents_for_meta_listing exposes the agent
enumeration for callers outside the module.
2026-05-16 00:29:26 +02:00
müde
fc61cb9310 fmt: clippy doc_markdown backticks 2026-05-15 23:11:10 +02:00
müde
6cf66e23dc actions: deny plants annotated denied/<id> tag
apply-commit denials now leave a git object behind: tag
denied/<id> annotated with the operator's note (or empty body
if they didn't supply one) at proposal/<id> inside the applied
repo. rejected configs become first-class git history — git
show denied/<id> in the manager's applied.git mount yields the
tree the operator rejected plus the reason. helper event
carries the tag for parity with deployed/failed. spawn denials
fall through unannotated since they have no proposal commit.
deny becomes async (single git plumbing call); dashboard +
admin-socket callers grow .await.
2026-05-15 23:01:22 +02:00
müde
315d4289c7 actions: tag-driven approve(ApplyCommit) flow
run_apply_commit walks the approval through the tag state
machine in applied: approved/<id> + building/<id> stamped
before the build, then git read-tree --reset to proposal/<id>
populates the working dir without moving HEAD. on rebuild
success deployed/<id> is planted and refs/heads/main fast-
forwards to the proposal. on failure failed/<id> is annotated
with the build error and the working tree resets back to main
so the agent stays evaluable. helper events Rebuilt +
ApprovalResolved both carry the terminal tag so the manager
can git-show the exact tree (and read the failure note from
an annotated tag) against its read-only applied.git mount.
finish_approval grows a terminal_tag param; spawn path passes
None. lifecycle::apply_commit deleted.
2026-05-15 23:00:01 +02:00
müde
871e7bf3fa wire types: add sha + tag to Approval and HelperEvent
approval grows fetched_sha (canonical hive-c0re-vouched sha,
distinct from manager-supplied commit_ref). helperevent
{approvalresolved,spawned,rebuilt} grow optional sha + tag so
the manager can git-show the exact tree it's hearing about
(against the upcoming /agents/<n>/applied.git RO mount) and
know which terminal tag landed. all serde-defaulted; existing
construction sites pass none until the tag-driven flow lands.
2026-05-15 22:47:39 +02:00
müde
2029840671 deny: operator can attach a reason that reaches the manager
clicking DENY on the dashboard now prompts for an optional reason
('reason for denying (optional, sent to manager):'). the value
rides along as a hidden 'note' form field; backend chain:

  POST /deny/{id} { note }
    → actions::deny(coord, id, Some(note))
    → Approvals::mark_denied writes it to the row
    → HelperEvent::ApprovalResolved { ..., note: Some("...") }

manager already had note: Option<String> on the event, just never
populated for denials before. host admin socket (hive-c0re deny)
still passes None.

generalized the prompt-on-submit pattern: any form with a
data-prompt attribute pops a window.prompt() before the POST and
stashes the answer in a hidden input named by data-prompt-field
(default 'note'). reusable for future opt-in note fields.
2026-05-15 21:58:42 +02:00
müde
c337cc06f8 dashboard: spinners on in-flight lifecycle actions + cleaner row layout
backend:
- TransientKind grows Starting / Stopping / Restarting / Rebuilding /
  Destroying alongside the existing Spawning. each dashboard handler
  (start/restart/kill/rebuild/destroy) wraps the lifecycle call with
  set_transient + clear_transient so the dashboard knows what's in
  flight. transient kind is surfaced inline on ContainerView.pending
  (existing-container actions) — only Spawning (pre-creation) lands
  in the separate transients list.

frontend:
- container row is now two lines: identity + meta on top, action
  buttons below. less cluttered, leaves room for the pending state
  pill. pending rows dim their actions and surface a pulsing
  '◐ spawning… / starting… / stopping… / restarting… / rebuilding…
  / destroying…' indicator next to the name.
- 'needs login' / 'needs update' chips moved into a unified .badge
  styling for consistency.
- auto-refresh kicks in not only on transient spawn but on any
  container with a pending action.
2026-05-15 19:49:43 +02:00
müde
48ebfefd1a destroy --purge: also wipe agent state dirs
new --purge flag on the destroy verb (cli + admin socket + dashboard).
default destroy still keeps /var/lib/hyperhive/{agents,applied}/<name>/
so recreating with the same name reuses prior config + creds.
with --purge, both dirs go too (config history, claude creds, /state/
notes). no undo. dashboard adds a separate PURG3 button with an
explicit confirmation copy; the existing DESTR0Y button keeps the
soft semantics.

claude.md dashboard-action-surface section updated; todo entry
dropped.
2026-05-15 19:29:14 +02:00
müde
ff8f8c7c56 per-agent /state dir for durable notes; manager sees them via /agents 2026-05-15 18:00:08 +02:00
müde
37c6504462 manager events: Spawned/Rebuilt/Killed/Destroyed + start button 2026-05-15 17:38:41 +02:00
müde
e1289a3e4c nix templates: factor harness-base.nix (shared scaffolding incl. gitconfig) 2026-05-15 16:10:55 +02:00
müde
f1fd787f17 rebuild button on agent UI (cross-origin POST to dashboard /rebuild) 2026-05-15 15:57:11 +02:00
müde
f99ed3fe7a manager: same lifecycle as agents; auto-spawn on hive-c0re start 2026-05-15 13:43:32 +02:00
müde
c59fa8541c phase 8 step 2: approval-gated spawn + dashboard spinner 2026-05-15 12:53:13 +02:00
müde
a42fdb3a5c phase 8 step 1: per-agent claude creds bind + destroy keeps state 2026-05-15 12:39:22 +02:00
müde
b711296460 destroy verb: CLI + admin socket + dashboard button; purges state + approvals 2026-05-15 02:57:22 +02:00
müde
fcd6563887 fmt 2026-05-15 02:02:20 +02:00
müde
1ceabae892 Phase 7c: ApprovalResolved helper events into manager's inbox 2026-05-15 00:26:42 +02:00
müde
c82d41728c Phase 7a: dashboard approve/deny + unified diff (similar crate) 2026-05-15 00:06:10 +02:00