container crash watcher → HelperEvent::ContainerCrash
new hive_c0re::crash_watch task polls every 10s and builds the set of currently-running containers. on a running→stopped transition it checks the transient snapshot: if no Stopping / Restarting / Destroying / Rebuilding flag is set, the container exited unexpectedly and we fire HelperEvent::ContainerCrash into the manager's inbox so it can react (typically: start it again).

the first poll is a seeding pass, so no events fire on harness startup.

a dbus subscription would be lower-latency, but polling is honest and debuggable, and a 10s delay on crash detection is fine for our scale.

manager prompt + approvals doc updated to advertise the new event variant. todo drops the entry (and the journald-viewer entry that already shipped).
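a minimal sketch of the transition check described above, not the code from this commit. `CrashWatch`, `Transient`, and `poll` are hypothetical names, and the real task presumably reads the transient snapshot from the coordinator rather than taking a map as an argument. it only shows the set logic: seed on the first poll, then report containers that left the running set with no transient flag.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the operator-initiated transient states
/// named in the commit message.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Transient {
    Stopping,
    Restarting,
    Destroying,
    Rebuilding,
}

/// Hypothetical watcher state. `prev` stays `None` until the first
/// (seeding) poll has run.
#[derive(Default)]
struct CrashWatch {
    prev: Option<HashSet<String>>,
}

impl CrashWatch {
    /// Feed one poll result. Returns the names of containers that were
    /// running on the previous poll, are not running now, and have no
    /// transient flag set. The result is sorted for stable output.
    fn poll(
        &mut self,
        running: HashSet<String>,
        transient: &HashMap<String, Transient>,
    ) -> Vec<String> {
        let mut crashed: Vec<String> = match &self.prev {
            // Seeding pass: remember the set, report nothing.
            None => Vec::new(),
            Some(prev) => prev
                .difference(&running)
                .filter(|name| !transient.contains_key(*name))
                .cloned()
                .collect(),
        };
        crashed.sort();
        self.prev = Some(running);
        crashed
    }
}
```

in the real task each returned name would become one HelperEvent::ContainerCrash; the sketch returns plain names so the check can be exercised without the coordinator.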
parent 6db38cf70c
commit 58c3cd853b
6 changed files with 92 additions and 7 deletions
@@ -12,6 +12,7 @@ mod auto_update;
 mod broker;
 mod client;
 mod coordinator;
+mod crash_watch;
 mod dashboard;
 mod events_vacuum;
 mod lifecycle;
@@ -130,6 +131,10 @@ async fn main() -> Result<()> {
     // Per-agent events.sqlite vacuum: host-side so the harness
     // doesn't need any retention wiring of its own.
     events_vacuum::spawn(coord.clone());
+    // Container crash watcher: emits HelperEvent::ContainerCrash
+    // when a previously-running container goes away without an
+    // operator-initiated transient state.
+    crash_watch::spawn(coord.clone());
     let dash_coord = coord.clone();
     tokio::spawn(async move {
         if let Err(e) = dashboard::serve(dashboard_port, dash_coord).await {