fix: transient state leak via RAII guard
bare set_transient/clear_transient pairs leak the in-memory transient on task cancellation, a panic, or any early return between the two calls. the dashboard then shows the agent stuck in 'rebuilding…' forever (coder hit this today).

add Coordinator::transient_guard, which returns a TransientGuard whose Drop clears the transient, and convert every caller: dashboard lifecycle_action, auto_update::rebuild_agent, manager_server Update, actions::destroy, the actions Spawn task, and migrate phase 4. destroy() now takes &Arc<Coordinator> so it can hold a guard.

existing stuck transients clear on the next hive-c0re restart, since transient state is in-memory only.
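The guard pattern described in this message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the `HashMap`-behind-a-`Mutex` storage, the `transient` getter, the `rebuild` helper, and the single-variant `TransientKind` are assumptions made so the example stands alone. Only the names `Coordinator`, `TransientGuard`, `TransientKind::Rebuilding`, `set_transient`, `clear_transient`, and `transient_guard` come from the commit.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, MutexGuard};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TransientKind {
    Rebuilding,
}

#[derive(Default)]
pub struct Coordinator {
    // Assumed storage: agent name -> current transient state (in-memory only).
    transients: Mutex<HashMap<String, TransientKind>>,
}

impl Coordinator {
    // Lock the map, recovering from poisoning so Drop never panics on it.
    fn map(&self) -> MutexGuard<'_, HashMap<String, TransientKind>> {
        self.transients.lock().unwrap_or_else(|e| e.into_inner())
    }

    pub fn set_transient(&self, name: &str, kind: TransientKind) {
        self.map().insert(name.to_owned(), kind);
    }

    pub fn clear_transient(&self, name: &str) {
        self.map().remove(name);
    }

    // Hypothetical read accessor, used here only to observe the state.
    pub fn transient(&self, name: &str) -> Option<TransientKind> {
        self.map().get(name).copied()
    }

    // Set the transient and hand back a guard that clears it when dropped.
    pub fn transient_guard(self: &Arc<Self>, name: &str, kind: TransientKind) -> TransientGuard {
        self.set_transient(name, kind);
        TransientGuard { coord: Arc::clone(self), name: name.to_owned() }
    }
}

#[must_use = "the transient is cleared as soon as the guard is dropped"]
pub struct TransientGuard {
    coord: Arc<Coordinator>,
    name: String,
}

impl Drop for TransientGuard {
    fn drop(&mut self) {
        // Runs on normal scope exit, early return, and panic unwinding.
        self.coord.clear_transient(&self.name);
    }
}

// Hypothetical caller: the early return no longer leaks the transient.
pub fn rebuild(coord: &Arc<Coordinator>, name: &str, fail: bool) -> Result<(), String> {
    let _guard = coord.transient_guard(name, TransientKind::Rebuilding);
    if fail {
        return Err("rebuild failed".to_owned());
    }
    Ok(())
}
```

Because the guard owns an `Arc<Coordinator>`, it can outlive the borrow it was created from, which is presumably why destroy() now needs `&Arc<Coordinator>` rather than a plain reference.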
parent 1a36c38a54
commit 313121a6e9
6 changed files with 56 additions and 18 deletions
@@ -101,9 +101,9 @@ pub async fn run(coord: &Arc<Coordinator>) -> Result<()> {
 // update activation triggers. Without this, crash_watch
 // would fire ContainerCrash for every agent here and the
 // manager would spuriously try to recover them.
-coord.set_transient(name, crate::coordinator::TransientKind::Rebuilding);
+let _guard = coord.transient_guard(name, crate::coordinator::TransientKind::Rebuilding);
 let result = repoint_container(name).await;
-coord.clear_transient(name);
+drop(_guard);
 if let Err(e) = result {
     tracing::warn!(%name, error = ?e, "migration: container repoint failed");
     all_ok = false;