fix: transient state leak via RAII guard

bare set_transient/clear_transient pairs leak the in-memory transient
on task cancellation, panics, or any early return between the two
calls — dashboard then shows the agent stuck in 'rebuilding…'
forever (coder hit this today). add Coordinator::transient_guard
returning a TransientGuard whose Drop clears, and convert every
caller (dashboard lifecycle_action, auto_update::rebuild_agent,
manager_server Update, actions::destroy, actions Spawn task,
migrate phase 4). destroy() now takes &Arc<Coordinator> so it can
hold a guard. existing stuck transients clear on next hive-c0re
restart since transient state is in-memory only.
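
A minimal sketch of the guard pattern described above, not the code from this commit. The names Coordinator, TransientKind::Rebuilding, set_transient, clear_transient, transient_guard and TransientGuard come from the message and diff. The HashMap-behind-a-Mutex storage, the `transient` getter, and the `rebuild` helper are assumptions made up for illustration.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, MutexGuard};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TransientKind {
    Rebuilding,
}

// Assumed storage: an in-memory map keyed by agent name.
#[derive(Default)]
pub struct Coordinator {
    transients: Mutex<HashMap<String, TransientKind>>,
}

impl Coordinator {
    // Recover the map even if another thread panicked while holding the
    // lock, so that Drop never panics during unwinding.
    fn map(&self) -> MutexGuard<'_, HashMap<String, TransientKind>> {
        self.transients.lock().unwrap_or_else(|e| e.into_inner())
    }

    pub fn set_transient(&self, name: &str, kind: TransientKind) {
        self.map().insert(name.to_owned(), kind);
    }

    pub fn clear_transient(&self, name: &str) {
        self.map().remove(name);
    }

    // Hypothetical getter, used only to observe the state in this sketch.
    pub fn transient(&self, name: &str) -> Option<TransientKind> {
        self.map().get(name).copied()
    }

    // Sets the transient and returns a guard that clears it when dropped.
    pub fn transient_guard(self: &Arc<Self>, name: &str, kind: TransientKind) -> TransientGuard {
        self.set_transient(name, kind);
        TransientGuard { coord: Arc::clone(self), name: name.to_owned() }
    }
}

#[must_use = "the transient is cleared as soon as the guard is dropped"]
pub struct TransientGuard {
    coord: Arc<Coordinator>,
    name: String,
}

impl Drop for TransientGuard {
    fn drop(&mut self) {
        self.coord.clear_transient(&self.name);
    }
}

// Hypothetical caller with an early return between "set" and "clear".
fn rebuild(coord: &Arc<Coordinator>, name: &str, fail: bool) -> Result<(), String> {
    let _guard = coord.transient_guard(name, TransientKind::Rebuilding);
    if fail {
        return Err("rebuild failed".into()); // guard drops here, state is cleared
    }
    Ok(())
}

fn main() {
    let coord = Arc::new(Coordinator::default());

    // Early return: the transient does not leak.
    assert!(rebuild(&coord, "agent-a", true).is_err());
    assert_eq!(coord.transient("agent-a"), None);

    // Panic: unwinding runs Drop, so the transient is cleared as well.
    let result = std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
        let _guard = coord.transient_guard("agent-b", TransientKind::Rebuilding);
        panic!("simulated failure");
    }));
    assert!(result.is_err());
    assert_eq!(coord.transient("agent-b"), None);
}
```

The guard must be bound to a named variable such as `_guard`. Writing `let _ = coord.transient_guard(..)` drops the guard immediately and clears the transient before the work starts. An explicit `drop(_guard)`, as in the hunk below, is still useful when the state should clear at a specific point rather than at the end of the scope.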
müde 2026-05-16 19:47:52 +02:00
parent 1a36c38a54
commit 313121a6e9
6 changed files with 56 additions and 18 deletions


@@ -228,9 +228,10 @@ async fn dispatch(req: &ManagerRequest, coord: &Arc<Coordinator>) -> ManagerResp
 message: "update: hyperhive_flake has no canonical path".into(),
 };
 };
-coord.set_transient(name, crate::coordinator::TransientKind::Rebuilding);
+let _guard =
+coord.transient_guard(name, crate::coordinator::TransientKind::Rebuilding);
 let result = crate::auto_update::rebuild_agent(coord, name, &current_rev).await;
-coord.clear_transient(name);
+drop(_guard);
 match result {
 Ok(()) => {
 coord.kick_agent(name, "container rebuilt");