new hive_c0re::crash_watch task polls every 10s, builds the set of
currently-running containers, and on running→stopped transitions
checks the transient snapshot: if no Stopping / Restarting /
Destroying / Rebuilding flag is set, the container exited
unexpectedly and we fire HelperEvent::ContainerCrash into the
manager's inbox so it can react (typically: start it again).
first poll is a seeding pass — no events on harness startup. dbus
subscription would be lower-latency but polling is honest and
debuggable, and a 10s delay on crash detection is fine for our
scale.
manager prompt + approvals doc updated to advertise the new
event variant. todo drops the entry (and the journald-viewer
entry that already shipped).
claude.md was eating 400 lines of subsystem detail that's useful
when you're working on that subsystem and noise the rest of the
time. split into:
- docs/conventions.md naming, identity, async forms, commit style
- docs/gotchas.md nspawn / nixos-container quirks
- docs/web-ui.md dashboard + per-agent layouts and endpoints
- docs/turn-loop.md claude invocation, wake prompt, mcp surface
- docs/approvals.md approval flow, manager policy, helper events
- docs/persistence.md sqlite dbs, retention, state dir layout
claude.md is now the entry point — file map, reading paths
("pick the doc that matches your task"), quick reminders that
fit on one screen, and a small scratchpad section for in-flight
context. references the docs; the docs don't reference claude.md.
no content was lost — the docs/ files cover everything the old
claude.md did, plus things i wrote up better while extracting.