container crash watcher → HelperEvent::ContainerCrash

new hive_c0re::crash_watch task polls every 10s, builds the set of currently-running containers, and on running→stopped transitions checks the transient snapshot: if no Stopping / Restarting / Destroying / Rebuilding flag is set, the container exited unexpectedly and we fire HelperEvent::ContainerCrash into the manager's inbox so it can react (typically: start it again). first poll is a seeding pass — no events on harness startup. dbus subscription would be lower-latency but polling is honest and debuggable, and a 10s delay on crash detection is fine for our scale. manager prompt + approvals doc updated to advertise the new event variant. todo drops the entry (and the journald-viewer entry that already shipped).
2026-05-15 21:02:05 +02:00 · 2026-05-15 21:02:05 +02:00 · 58c3cd853b
commit 58c3cd853b
parent 6db38cf70c
6 changed files with 92 additions and 7 deletions
--- a/docs/approvals.md
+++ b/docs/approvals.md
@ -115,6 +115,10 @@ regular claude turn so the manager can react. Variants
 - `Killed { agent }` — admin `HostRequest::Kill` + dashboard
  `/kill` + manager `Kill` MCP tool.
 - `Destroyed { agent }` — `actions::destroy`.
+- `ContainerCrash { agent, note }` — `crash_watch`: a previously-
+  running container went away with no operator-initiated transient
+  state (Stopping / Restarting / Destroying / Rebuilding). Manager
+  can `start` it again or escalate.
 - `OperatorAnswered { id, question, answer }` — dashboard
  `/answer-question/{id}` after the operator submits the answer
  form.