container crash watcher → HelperEvent::ContainerCrash

new hive_c0re::crash_watch task polls every 10s and builds the set of currently-running containers. on a running→stopped transition it checks the transient snapshot: if no Stopping / Restarting / Destroying / Rebuilding flag is set, the container exited unexpectedly, and we fire HelperEvent::ContainerCrash into the manager's inbox so it can react (typically by starting it again).

the first poll is a seeding pass, so no events fire on harness startup. a dbus subscription would be lower-latency, but polling is simple and easy to debug, and a 10s delay on crash detection is fine at our scale.

manager prompt + approvals doc are updated to advertise the new event variant. todo drops this entry (and the journald-viewer entry, which already shipped).
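
a minimal sketch of the transition check described above, for readers who want the logic in one place. this is not the code from the commit: `CrashWatch`, `Transient`, and `poll` are hypothetical names, the running set and transient snapshot are passed in as plain collections, and the 10s timer and inbox delivery are left out.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the transient flags named in the commit message.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Transient {
    Stopping,
    Restarting,
    Destroying,
    Rebuilding,
}

/// Hypothetical watcher state: the running set seen on the previous poll.
/// `None` means no poll has happened yet, so the next one only seeds.
#[derive(Default)]
struct CrashWatch {
    prev_running: Option<HashSet<String>>,
}

impl CrashWatch {
    /// Feed one poll result. Returns the agents that went running -> stopped
    /// with no transient flag set, i.e. the ones that look like crashes.
    fn poll(
        &mut self,
        running: HashSet<String>,
        transients: &HashMap<String, Transient>,
    ) -> Vec<String> {
        let mut crashed = Vec::new();
        // On the first poll there is no previous set, so nothing is reported.
        if let Some(prev) = &self.prev_running {
            // Agents that were running last time but are not running now.
            for agent in prev.difference(&running) {
                // Any of the four flags means the stop was operator-initiated.
                if !transients.contains_key(agent) {
                    crashed.push(agent.clone());
                }
            }
        }
        // Sort so the result does not depend on hash iteration order.
        crashed.sort();
        self.prev_running = Some(running);
        crashed
    }
}
```

in a real task, each name returned by `poll` would become one `HelperEvent::ContainerCrash { agent, note }` pushed to the manager's inbox. keeping the check as a pure function of (previous set, current set, transient snapshot) makes the seeding pass and the flag filter easy to unit-test without a container runtime.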
2026-05-15 21:02:05 +02:00
commit 58c3cd853b
parent 6db38cf70c
6 changed files with 92 additions and 7 deletions
--- a/hive-sh4re/src/lib.rs
+++ b/hive-sh4re/src/lib.rs
@@ -259,6 +259,16 @@ pub enum HelperEvent {
    /// A sub-agent's container was torn down (container removed; state
    /// dirs preserved per `destroy` semantics).
    Destroyed { agent: String },
+    /// Container exited without an operator-initiated stop. Fired by
+    /// the crash watcher when an agent's container transitions from
+    /// running → stopped and no `Stopping` / `Restarting` /
+    /// `Destroying` transient was set, so the operator (or the
+    /// manager) knows it crashed rather than was killed on purpose.
+    ContainerCrash {
+        agent: String,
+        #[serde(default, skip_serializing_if = "Option::is_none")]
+        note: Option<String>,
+    },
    /// The operator answered a question that was queued via
    /// `AskOperator`. `id` matches the `QuestionQueued.id` returned to the
    /// asker; `question` echoes the original prompt so the manager can