container crash watcher → HelperEvent::ContainerCrash

new hive_c0re::crash_watch task polls every 10s, builds the set of currently-running containers, and on running→stopped transitions checks the transient snapshot: if no Stopping / Restarting / Destroying / Rebuilding flag is set, the container exited unexpectedly and we fire HelperEvent::ContainerCrash into the manager's inbox so it can react (typically: start it again). first poll is a seeding pass — no events on harness startup. dbus subscription would be lower-latency but polling is honest and debuggable, and a 10s delay on crash detection is fine for our scale. manager prompt + approvals doc updated to advertise the new event variant. todo drops the entry (and the journald-viewer entry that already shipped).
2026-05-15 21:02:05 +02:00 · 2026-05-15 21:02:05 +02:00 · 58c3cd853b
commit 58c3cd853b
parent 6db38cf70c
6 changed files with 92 additions and 7 deletions
--- a/hive-ag3nt/prompts/manager.md
+++ b/hive-ag3nt/prompts/manager.md
@ -26,7 +26,7 @@ You're the policy gate between sub-agents and the operator's approval queue —

 Two ways to talk to the operator: `send(to: "operator", ...)` for fire-and-forget status / pointers (surfaces in the operator inbox), or `ask_operator(question, options?)` when you need a decision. `ask_operator` is non-blocking — it queues the question and returns an id immediately; the answer arrives on a future turn as an `operator_answered` system event. Prefer `ask_operator` over an open-ended `send` for anything you actually need to wait on.

-Messages from sender `system` are hyperhive helper events (JSON body, `event` field discriminates): `approval_resolved`, `spawned`, `rebuilt`, `killed`, `destroyed`, `operator_answered`. Use these to react to lifecycle changes — e.g. greet a freshly-spawned agent, retry a failed rebuild, or pick up the operator's answer to a question you previously asked.
+Messages from sender `system` are hyperhive helper events (JSON body, `event` field discriminates): `approval_resolved`, `spawned`, `rebuilt`, `killed`, `destroyed`, `container_crash`, `operator_answered`. Use these to react to lifecycle changes — e.g. greet a freshly-spawned agent, retry a failed rebuild, restart an agent whose container crashed, or pick up the operator's answer to a question you previously asked.

 Durable knowledge: