startup auto-migration from pre-meta layout

new migrate module runs before auto_update on hive-c0re boot.
four idempotent phases:

1. for every applied/<n>/ whose flake.nix isn't already the
   module-only boilerplate, rewrite + commit + relocate
   deployed/0 to HEAD so setup_applied's existence check passes
2. for every proposed/<n>/config without an 'applied' remote,
   wire it up (delegates to setup_proposed, which is now
   idempotent and adds the remote itself)
3. meta::sync_agents over the current container list: inits
   the meta repo on the first call, rerenders + relocks if drifted
4. nixos-container update <c> --flake meta#<name> for every
   container, guarded by /var/lib/hyperhive/.meta-migration-done
   so phase 4's expensive eval only runs once across restarts
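
sketch of the two idempotence patterns above (std-only, sync, not the actual migrate module): phases 1 and 3 can stay re-runnable by writing only when the rendered content differs from what's on disk, and phase 4 can be made one-shot with a marker file. write_if_drifted and run_once are hypothetical names. the real code also commits, relocks and shells out to nixos-container, all left out here. writing the marker only after the guarded step succeeds, so a failed run is retried next boot, is an assumption about the real ordering.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: write `rendered` to `path` only when the file is
/// missing or differs. Returns `true` if it wrote, so a caller can
/// commit/relock only on drift and a second run is a no-op.
fn write_if_drifted(path: &Path, rendered: &str) -> io::Result<bool> {
    let drifted = match fs::read_to_string(path) {
        Ok(current) => current != rendered,
        // a missing file counts as drift
        Err(e) if e.kind() == io::ErrorKind::NotFound => true,
        // any other read error is surfaced to the caller
        Err(e) => return Err(e),
    };
    if drifted {
        fs::write(path, rendered)?;
    }
    Ok(drifted)
}

/// Hypothetical helper: run `step` at most once across restarts, using the
/// existence of `marker` as the "done" flag. The marker is written only
/// after `step` succeeds, so a failed attempt is retried on the next boot.
fn run_once<F>(marker: &Path, step: F) -> io::Result<bool>
where
    F: FnOnce() -> io::Result<()>,
{
    if marker.exists() {
        // a previous boot already completed this phase
        return Ok(false);
    }
    step()?;
    if let Some(dir) = marker.parent() {
        fs::create_dir_all(dir)?;
    }
    fs::write(marker, b"")?;
    Ok(true)
}
```

calling write_if_drifted twice with the same content writes once and then reports no drift; run_once with a marker path like /var/lib/hyperhive/.meta-migration-done runs its closure on the first boot only.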

env kill-switch HIVE_SKIP_META_MIGRATION=1 skips the whole
migration for that boot; it runs on the next boot without the
flag. each agent's failure is logged + skipped so one broken
agent doesn't block the rest. runs ahead of ensure_manager so
the manager auto-spawn comes up against meta from the first
attempt.
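
sketch of the kill-switch and per-agent isolation, assuming the flag is compared against the literal "1" and a failing agent is just logged and collected. migration_disabled, migrate_agents and run_migration are hypothetical names, eprintln! stands in for the tracing-based logging, and agents are plain names here rather than containers.

```rust
use std::env;

/// Hypothetical kill-switch check: only the exact value "1" disables the
/// migration; an unset or different value lets it run.
fn migration_disabled() -> bool {
    matches!(env::var("HIVE_SKIP_META_MIGRATION").as_deref(), Ok("1"))
}

/// Hypothetical per-agent loop: run `step` for every agent, logging and
/// collecting failures instead of returning early, so one broken agent
/// does not block the rest. Returns the names of the agents that failed.
fn migrate_agents<F>(agents: &[&str], mut step: F) -> Vec<String>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut failed = Vec::new();
    for &agent in agents {
        if let Err(e) = step(agent) {
            // stand-in for the real logging; carry on with the next agent
            eprintln!("meta migration for {agent} failed, skipping: {e}");
            failed.push(agent.to_string());
        }
    }
    failed
}

/// Hypothetical entry point tying both together: `None` means the
/// kill-switch was set and nothing was attempted.
fn run_migration<F>(agents: &[&str], step: F) -> Option<Vec<String>>
where
    F: FnMut(&str) -> Result<(), String>,
{
    if migration_disabled() {
        return None;
    }
    Some(migrate_agents(agents, step))
}
```

in this sketch per-agent failures come back as a list rather than an Err, so the caller can log them without aborting; the diff below shows main.rs treating even a whole-migration Err as a warning rather than a fatal startup error.
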
müde 2026-05-16 00:34:58 +02:00
parent 87016cd567
commit 59a89314f0
3 changed files with 194 additions and 1 deletion

```diff
@@ -18,6 +18,7 @@ mod events_vacuum;
 mod lifecycle;
 mod manager_server;
 mod meta;
+mod migrate;
 mod operator_questions;
 mod server;
@@ -97,6 +98,15 @@ async fn main() -> Result<()> {
         } => {
             let coord = Arc::new(Coordinator::open(&db, hyperhive_flake, dashboard_port)?);
             manager_server::start(coord.clone())?;
+            // Idempotent pre-flight: rewrite pre-meta-layout applied
+            // repos, ensure proposed repos carry the `applied`
+            // remote, bootstrap the meta repo, repoint containers at
+            // `meta#<name>` (one-shot, guarded by a marker file).
+            // Runs before manager auto-spawn so the new manager is
+            // built against meta from the first attempt.
+            if let Err(e) = migrate::run(&coord).await {
+                tracing::warn!(error = ?e, "startup migration failed");
+            }
             // Auto-create the manager container if it isn't there yet. Block
             // on this — without hm1nd the system has no manager harness.
             // Failures are logged but allowed: a broken auto-spawn shouldn't
```