meta: serialize all ops behind a tokio mutex + clear stale lock at startup
journal showed three concurrent rebuilds racing on the meta repo's .git/index.lock. auto_update::run kicks off a parallel tokio::spawn for every stale agent, and each rebuild eventually calls into meta::sync_agents / lock_update_for_rebuild, which run git add + commit. git isn't safe across concurrent processes on the same .git/, and one of the children that failed mid-write left index.lock behind; subsequent ops blocked until somebody rm'd it manually.

fix: a static META_LOCK (tokio::sync::Mutex<()>) acquired at the top of every public meta function. concurrent rebuilds take turns on meta ops. the lock is released before the actual nix build (nixos-container update) starts, so that step runs without it and parallel agent builds still parallelize under nix-daemon's own concurrency model.

migrate::run additionally clears /var/lib/hyperhive/meta/.git/index.lock on startup if it exists: we just booted, so nothing of ours is holding it. this covers the 'previous crash left a stale lock' case the user just hit, so the daemon recovers without manual intervention.
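The lock scoping described above can be sketched as a hypothetical, std-only analogue (not the commit's code). The commit uses tokio::sync::Mutex<()> because its guard is held across `.await`; this sketch swaps in std::sync::Mutex and OS threads so it runs without an async runtime. `meta_op` and `rebuild` are illustrative stand-ins for the public meta functions and a per-agent rebuild, and the sleeps stand in for git add/commit and nixos-container update.

```rust
use std::sync::Mutex;
use std::thread;
use std::time::Duration;

// Hypothetical stand-in for META_LOCK. The commit uses tokio::sync::Mutex<()>
// because its guard is held across `.await`; this sketch uses std::sync::Mutex
// with OS threads to show the same lock scoping without an async runtime.
static META_LOCK: Mutex<()> = Mutex::new(());

/// Hypothetical meta operation. Like every public meta fn in the commit, it
/// takes META_LOCK first, so git add/commit on the shared .git/ never overlaps.
fn meta_op(agent: &str, log: &Mutex<Vec<String>>) {
    // The lock guards no data, so recover from poisoning instead of panicking
    // (tokio's Mutex has no poisoning at all).
    let _guard = META_LOCK.lock().unwrap_or_else(|e| e.into_inner());
    log.lock().unwrap().push(format!("begin {agent}"));
    thread::sleep(Duration::from_millis(5)); // stands in for git add + commit
    log.lock().unwrap().push(format!("end {agent}"));
} // `_guard` is dropped here, releasing META_LOCK

/// Hypothetical per-agent rebuild: meta op, long build, meta op.
fn rebuild(agent: &str, log: &Mutex<Vec<String>>) {
    meta_op(agent, log);
    // Stands in for `nixos-container update`: runs with META_LOCK released,
    // so builds for different agents overlap.
    thread::sleep(Duration::from_millis(20));
    meta_op(agent, log);
}
```

Because the guard is dropped at the end of each `meta_op`, one agent's build step overlaps freely with other rebuilds; only the git-touching sections take turns, so the log always shows each `begin` immediately followed by the matching `end`.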
parent 3db33b0fe5
commit 78f21ccc5d
3 changed files with 84 additions and 0 deletions
```diff
@@ -49,6 +49,17 @@ pub async fn run(coord: &Arc<Coordinator>) -> Result<()> {
         tracing::info!("migration: {KILL_SWITCH} set — skipping");
         return Ok(());
     }
+    // Stale meta index lock: a previous hive-c0re crash mid-`git add`
+    // can leave `.git/index.lock` behind, which blocks every
+    // subsequent meta op until somebody `rm`s it manually. We just
+    // booted so nothing of ours is holding it; safe to clear.
+    let meta_lock = std::path::PathBuf::from("/var/lib/hyperhive/meta/.git/index.lock");
+    if meta_lock.exists() {
+        match std::fs::remove_file(&meta_lock) {
+            Ok(()) => tracing::warn!("cleared stale meta/.git/index.lock"),
+            Err(e) => tracing::warn!(error = ?e, "clear stale meta lock failed"),
+        }
+    }
     let names = enumerate_agents().await;
     tracing::info!(count = names.len(), "migration: scanning");
 
```
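The startup cleanup in this hunk could also be factored into a small helper that is easier to test. A minimal sketch, assuming a hypothetical `clear_stale_index_lock` that takes the repo root instead of the hard-coded /var/lib/hyperhive/meta path. It attempts the removal directly and treats `NotFound` as "nothing to clear", which is a variation on the diff's `exists()` check rather than what the commit does.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: remove a leftover `.git/index.lock` under `repo`.
/// Only safe at startup, before anything in this process runs git there.
/// Returns Ok(true) if a lock was removed, Ok(false) if none existed.
fn clear_stale_index_lock(repo: &Path) -> io::Result<bool> {
    let lock = repo.join(".git").join("index.lock");
    match fs::remove_file(&lock) {
        Ok(()) => Ok(true),
        // No lock file present: nothing to clear, not an error.
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(false),
        // Anything else (e.g. permissions) is surfaced to the caller.
        Err(e) => Err(e),
    }
}
```

Skipping the separate `exists()` call removes the check-then-remove window and lets the caller tell "cleared" apart from "absent" when choosing what to log.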