Commit graph

28 commits

Author SHA1 Message Date
müde
aed43ce4df dashboard: tombstones + meta_inputs events — last /api/state refetches drop
new DashboardEvent::TombstonesChanged + MetaInputsChanged carry
full snapshots (lists are tiny; snapshot beats diff for race
avoidance). Coordinator-side helpers
emit_tombstones_snapshot + emit_meta_inputs_snapshot fire from
every mutation site: actions::destroy + post_purge_tombstone +
actions::approve (spawn finalise consumes tombstone) +
run_meta_update + auto_update::rebuild_agent (lock bumps).

client adds derived stores + apply* handlers + drops the
post-submit refetch on PURG3 (container row + tombstone row)
and meta-update.

after this commit /api/state is fetched exactly once per page
session (cold load); every other change rides the SSE channel.
2026-05-17 23:52:12 +02:00
müde
e7ce35c503 phase 6: container events + drop the 5s /api/state poll
new DashboardEvent::ContainerStateChanged + ContainerRemoved
close the last refetch loop on the dashboard. Coordinator's
rescan_containers_and_emit diffs a fresh container_view::build_all
against a cached last_containers map and fires per-row events.
called from actions::approve (post-spawn), actions::destroy,
the lifecycle_action wrapper, auto_update::rebuild_agent, and
the existing 10s crash_watch poll.

ContainerView extracted to its own module so coordinator and
dashboard can both build it. dashboard endpoints flip to 200;
container-lifecycle forms carry data-no-refresh. client drops
the periodic poll entirely — initial cold load + SSE for
everything afterwards. pending overlay reads from the existing
transientsState since the new event payload doesn't carry it.

PURG3 + meta-update keep the post-submit refetch since
tombstones + meta_inputs aren't event-derived yet; tracked in
TODO.md.
2026-05-17 22:01:15 +02:00
müde
56d615b51f dashboard: approval_added / approval_resolved mutation events + client derived state 2026-05-17 13:30:25 +02:00
müde
411cf86632 nix fmt + rustfmt sweep 2026-05-17 01:40:28 +02:00
müde
480d646f69 forge: auto-create a user + token per agent on spawn / startup
new forge module probes the hive-forge nixos-container (no-op when
absent), and ensures every agent + the manager has a forgejo user
named after them with an access token at `<state>/forge-token`
(visible inside the container as `/state/forge-token`).

idempotent: skips user creation when forgejo reports 'already
exists', skips token issuance when the file is present, scopes the
token to read:user,write:repository,write:issue. token-name suffixed
with a clock so re-issuing doesn't collide with a stale name. shells
out via `nixos-container run hive-forge -- runuser -u forgejo --
forgejo admin` (runuser instead of sudo since sudo isn't in the
container by default).

hooks: ensure_all sweeps existing containers at hive-c0re startup
(backgrounded), and the actions.rs spawn task calls ensure_user_for
the new agent right after lifecycle::spawn succeeds. failures log a
warning but don't abort spawn — a missing token is recoverable from
the next startup sweep.
2026-05-16 20:55:13 +02:00
müde
313121a6e9 fix: transient state leak via RAII guard
bare set_transient/clear_transient pairs leak the in-memory transient
on task cancellation, panics, or any early return between the two
calls — dashboard then shows the agent stuck in 'rebuilding…'
forever (coder hit this today). add Coordinator::transient_guard
returning a TransientGuard whose Drop clears, and convert every
caller (dashboard lifecycle_action, auto_update::rebuild_agent,
manager_server Update, actions::destroy, actions Spawn task,
migrate phase 4). destroy() now takes &Arc<Coordinator> so it can
hold a guard. existing stuck transients clear on next hive-c0re
restart since transient state is in-memory only.
2026-05-16 19:47:52 +02:00
müde
d06b598c56 kick_agent on every rebuild + apply path
agents weren't being woken with the 'you were rebuilt — check
/state/ for notes, --continue intact' system message after
several recent rebuild surfaces:

- auto_update::rebuild_agent — used by the dashboard rebuild
  button, admin-CLI rebuild via lifecycle_action, the startup
  rev-scan, AND the new meta-input update batch loop. kick
  moves *into* rebuild_agent's success arm so all four
  paths benefit. (the dashboard's lifecycle_action extra
  closure was already firing kick — now it's a no-op for the
  rebuild path since rebuild_agent does it.)
- actions::run_apply_commit — apply-commit approve flow built
  + tagged deployed/<id> but never kicked. add kick on
  success with the more specific 'config update applied' hint.
- server.rs::HostRequest::Rebuild — the admin-CLI direct path
  calls lifecycle::rebuild bypassing rebuild_agent. add kick
  on success.

dashboard's restart / start lifecycle_action extras still
kick via their own closures since they don't route through
rebuild_agent. stop / kill / destroy intentionally don't
kick — there's nothing to wake.
2026-05-16 04:20:01 +02:00
müde
50ef806266 operator pronouns: configurable free-text, threaded into prompts
new NixOS module option services.hive-c0re.operatorPronouns
(free text, default 'she/her', example 'they/them'). hive-c0re
takes it as a CLI flag (--operator-pronouns, lib.escapeShellArg'd
in the systemd unit), stores it on Coordinator, threads it into
the meta flake's mkAgent so each agent's systemd service gets
HIVE_OPERATOR_PRONOUNS set. the harness reads the env at boot
and substitutes {operator_pronouns} into the agent / manager
system prompt alongside {label}. nix string is escaped against
backslash + double-quote so non-ascii / quoted values
round-trip safely. prompt addendum: both agent.md and
manager.md mention the operator's pronouns up front so claude
uses them naturally in third-person reference. propagates on
next ↻ R3BU1LD (meta lock bump, no per-agent approval).
2026-05-16 02:05:22 +02:00
müde
06fdbac1ac actions::run_apply_commit through meta two-phase
approval-driven deploys now walk the meta flake via
prepare_deploy / finalize_deploy / abort_deploy so a failed
build leaves no commit in meta's deploy log:

1. capture applied/main sha for rollback
2. tag approved/<id> + building/<id>
3. ff applied/main to proposal/<id>, read-tree sync working tree
4. meta::prepare_deploy(name) — nix flake lock --update-input
   agent-<n> without committing
5. lifecycle::rebuild_no_meta — container-level only (new
   extracted helper; public lifecycle::rebuild still wraps it
   with single-phase meta sync + commit for dashboard / auto
   _update callers that don't care about rollback)
6a. on success: tag deployed/<id>, meta::finalize_deploy commits
    the staged lock with 'deploy <n> deployed/<id> <sha12>'
6b. on failure: tag failed/<id> annotated with the build error,
    git_update_ref applied/main back to prev sha, read-tree to
    main, meta::abort_deploy git-restores flake.lock

meta's git log now records only successful deploys; failures
+ denials still live in applied as annotated tags.
2026-05-16 00:32:16 +02:00
müde
22f35def8f actions::destroy syncs meta after lifecycle
once nixos-container destroy lands + per-agent state cleanup is
done, rerender the meta flake from the remaining containers so
the destroyed agent's input + nixosConfiguration drop off and
its flake.lock entry vanishes. log + keep going on meta-sync
failure — the destroy already succeeded at the lifecycle level,
so meta drift here is just bookkeeping. new public
lifecycle::agents_for_meta_listing exposes the agent
enumeration for callers outside the module.
2026-05-16 00:29:26 +02:00
müde
fc61cb9310 fmt: clippy doc_markdown backticks 2026-05-15 23:11:10 +02:00
müde
6cf66e23dc actions: deny plants annotated denied/<id> tag
apply-commit denials now leave a git object behind: tag
denied/<id> annotated with the operator's note (or empty body
if they didn't supply one) at proposal/<id> inside the applied
repo. rejected configs become first-class git history — git
show denied/<id> in the manager's applied.git mount yields the
tree the operator rejected plus the reason. helper event
carries the tag for parity with deployed/failed. spawn denials
fall through unannotated since they have no proposal commit.
deny becomes async (single git plumbing call); dashboard +
admin-socket callers grow .await.
2026-05-15 23:01:22 +02:00
müde
315d4289c7 actions: tag-driven approve(ApplyCommit) flow
run_apply_commit walks the approval through the tag state
machine in applied: approved/<id> + building/<id> stamped
before the build, then git read-tree --reset to proposal/<id>
populates the working dir without moving HEAD. on rebuild
success deployed/<id> is planted and refs/heads/main fast-
forwards to the proposal. on failure failed/<id> is annotated
with the build error and the working tree resets back to main
so the agent stays evaluable. helper events Rebuilt +
ApprovalResolved both carry the terminal tag so the manager
can git-show the exact tree (and read the failure note from
an annotated tag) against its read-only applied.git mount.
finish_approval grows a terminal_tag param; spawn path passes
None. lifecycle::apply_commit deleted.
2026-05-15 23:00:01 +02:00
müde
871e7bf3fa wire types: add sha + tag to Approval and HelperEvent
approval grows fetched_sha (canonical hive-c0re-vouched sha,
distinct from manager-supplied commit_ref). helperevent
{approvalresolved,spawned,rebuilt} grow optional sha + tag so
the manager can git-show the exact tree it's hearing about
(against the upcoming /agents/<n>/applied.git RO mount) and
know which terminal tag landed. all serde-defaulted; existing
construction sites pass none until the tag-driven flow lands.
2026-05-15 22:47:39 +02:00
müde
2029840671 deny: operator can attach a reason that reaches the manager
clicking DENY on the dashboard now prompts for an optional reason
('reason for denying (optional, sent to manager):'). the value
rides along as a hidden 'note' form field; backend chain:

  POST /deny/{id} { note }
    → actions::deny(coord, id, Some(note))
    → Approvals::mark_denied writes it to the row
    → HelperEvent::ApprovalResolved { ..., note: Some("...") }

manager already had note: Option<String> on the event, just never
populated for denials before. host admin socket (hive-c0re deny)
still passes None.

generalized the prompt-on-submit pattern: any form with a
data-prompt attribute pops a window.prompt() before the POST and
stashes the answer in a hidden input named by data-prompt-field
(default 'note'). reusable for future opt-in note fields.
2026-05-15 21:58:42 +02:00
müde
c337cc06f8 dashboard: spinners on in-flight lifecycle actions + cleaner row layout
backend:
- TransientKind grows Starting / Stopping / Restarting / Rebuilding /
  Destroying alongside the existing Spawning. each dashboard handler
  (start/restart/kill/rebuild/destroy) wraps the lifecycle call with
  set_transient + clear_transient so the dashboard knows what's in
  flight. transient kind is surfaced inline on ContainerView.pending
  (existing-container actions) — only Spawning (pre-creation) lands
  in the separate transients list.

frontend:
- container row is now two lines: identity + meta on top, action
  buttons below. less cluttered, leaves room for the pending state
  pill. pending rows dim their actions and surface a pulsing
  '◐ spawning… / starting… / stopping… / restarting… / rebuilding…
  / destroying…' indicator next to the name.
- 'needs login' / 'needs update' chips moved into a unified .badge
  styling for consistency.
- auto-refresh kicks in not only on transient spawn but on any
  container with a pending action.
2026-05-15 19:49:43 +02:00
müde
48ebfefd1a destroy --purge: also wipe agent state dirs
new --purge flag on the destroy verb (cli + admin socket + dashboard).
default destroy still keeps /var/lib/hyperhive/{agents,applied}/<name>/
so recreating with the same name reuses prior config + creds.
with --purge, both dirs go too (config history, claude creds, /state/
notes). no undo. dashboard adds a separate PURG3 button with an
explicit confirmation copy; the existing DESTR0Y button keeps the
soft semantics.

claude.md dashboard-action-surface section updated; todo entry
dropped.
2026-05-15 19:29:14 +02:00
müde
ff8f8c7c56 per-agent /state dir for durable notes; manager sees them via /agents 2026-05-15 18:00:08 +02:00
müde
37c6504462 manager events: Spawned/Rebuilt/Killed/Destroyed + start button 2026-05-15 17:38:41 +02:00
müde
e1289a3e4c nix templates: factor harness-base.nix (shared scaffolding incl. gitconfig) 2026-05-15 16:10:55 +02:00
müde
f1fd787f17 rebuild button on agent UI (cross-origin POST to dashboard /rebuild) 2026-05-15 15:57:11 +02:00
müde
f99ed3fe7a manager: same lifecycle as agents; auto-spawn on hive-c0re start 2026-05-15 13:43:32 +02:00
müde
c59fa8541c phase 8 step 2: approval-gated spawn + dashboard spinner 2026-05-15 12:53:13 +02:00
müde
a42fdb3a5c phase 8 step 1: per-agent claude creds bind + destroy keeps state 2026-05-15 12:39:22 +02:00
müde
b711296460 destroy verb: CLI + admin socket + dashboard button; purges state + approvals 2026-05-15 02:57:22 +02:00
müde
fcd6563887 fmt 2026-05-15 02:02:20 +02:00
müde
1ceabae892 Phase 7c: ApprovalResolved helper events into manager's inbox 2026-05-15 00:26:42 +02:00
müde
c82d41728c Phase 7a: dashboard approve/deny + unified diff (similar crate) 2026-05-15 00:06:10 +02:00