ask_operator: ttl_seconds auto-cancel + remaining-time chip

manager can pass ttl_seconds to ask_operator. on submit, host
stores deadline_at = now + ttl in operator_questions (new column,
migrated via existing pragma_table_info pattern), spawns a tokio
task that sleeps until the deadline and then resolves the question with
answer '[expired]', firing the same OperatorAnswered helper event. if the
question was already resolved (operator answer or manual cancel), the
timer's resolve no-ops silently.
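
a minimal sketch of the expiry path. it assumes an in-memory map in place of the operator_questions table, and a std thread standing in for the tokio task (tokio::spawn plus tokio::time::sleep_until). Store, Question, resolve_if_pending and submit are hypothetical names, not the host's actual code. it shows that the timer resolves through the same guarded path as an operator answer. whichever resolution lands first wins, and the loser returns false without firing a second OperatorAnswered.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the OperatorAnswered helper event.
#[derive(Debug, Clone, PartialEq)]
enum HelperEvent {
    OperatorAnswered { id: u64, answer: String },
}

/// Hypothetical in-memory row; the real host keeps this in operator_questions.
struct Question {
    deadline_at: Option<Instant>,
    answer: Option<String>,
}

#[derive(Default)]
struct Store {
    questions: HashMap<u64, Question>,
    events: Vec<HelperEvent>,
}

impl Store {
    /// Resolve only if the question is still pending. Returns false and emits
    /// nothing when it was already resolved, so the losing side of a race no-ops.
    fn resolve_if_pending(&mut self, id: u64, answer: &str) -> bool {
        match self.questions.get_mut(&id) {
            Some(q) if q.answer.is_none() => {
                q.answer = Some(answer.to_string());
                self.events.push(HelperEvent::OperatorAnswered {
                    id,
                    answer: answer.to_string(),
                });
                true
            }
            _ => false,
        }
    }
}

/// Store the question with deadline_at = now + ttl_seconds and, if a TTL was
/// given, spawn a timer that expires it. A std thread stands in for the
/// tokio task in this sketch.
fn submit(
    store: &Arc<Mutex<Store>>,
    id: u64,
    ttl_seconds: Option<u64>,
) -> Option<thread::JoinHandle<bool>> {
    let deadline_at = ttl_seconds.map(|ttl| Instant::now() + Duration::from_secs(ttl));
    store
        .lock()
        .unwrap()
        .questions
        .insert(id, Question { deadline_at, answer: None });
    let deadline = deadline_at?; // no ttl: no timer
    let store = Arc::clone(store);
    Some(thread::spawn(move || {
        // Sleep without holding the lock, then take the guarded resolve path.
        thread::sleep(deadline.saturating_duration_since(Instant::now()));
        store.lock().unwrap().resolve_if_pending(id, "[expired]")
    }))
}
```

in the real host, the pending check would presumably be a conditional UPDATE on operator_questions that only touches unanswered rows. that way the check and the write happen atomically in the database rather than under an in-process mutex.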

dashboard renders a remaining-time chip on the question row when
deadline_at is set. format: under 1m → seconds only (s), under 1h →
minutes and seconds (m s), 1h or more → hours and minutes (h m). the
5s heartbeat refresh keeps the chip current, so the operator sees it
tick down.
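
a sketch of the chip text under one reading of the format rule above: seconds only below a minute, minutes and seconds below an hour, hours and minutes from an hour up. format_remaining and remaining_secs are hypothetical helpers. the sketch assumes the chip text is computed host-side in Rust and that deadline_at is available as unix seconds.

```rust
/// Render remaining seconds as the chip label, e.g. "42s", "3m 12s", "1h 5m".
fn format_remaining(secs: u64) -> String {
    if secs < 60 {
        format!("{secs}s")
    } else if secs < 3600 {
        format!("{}m {}s", secs / 60, secs % 60)
    } else {
        // Seconds are dropped once the remaining time reaches an hour.
        format!("{}h {}m", secs / 3600, (secs % 3600) / 60)
    }
}

/// Seconds left until the deadline, clamped at zero once it has passed.
fn remaining_secs(deadline_at_unix: u64, now_unix: u64) -> u64 {
    deadline_at_unix.saturating_sub(now_unix)
}
```

the dashboard only re-renders on the 5s heartbeat, so the seconds figure advances in roughly 5s steps rather than every second.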

manager prompt + mcp tool description updated. per-container journald
viewer queued in TODO.md (separate task).
müde 2026-05-15 20:38:02 +02:00
parent 2146e47770
commit 754db7830e
8 changed files with 133 additions and 36 deletions

TODO.md (18 lines changed)

@@ -68,13 +68,6 @@ Pick anything from here when relevant. Cross-cutting design notes live in
 ## Manager → operator question channel
-- **TTL on `ask_operator`.** Manual cancel via dashboard already
-ships (✗ CANC3L button resolves the question with answer
-`[cancelled]` and fires `OperatorAnswered` so the manager sees a
-terminal state). Still missing: per-question `ttl_seconds` that
-auto-cancels after a deadline. Spawn a tokio task per submitted
-question that calls the same cancel path after the ttl expires
-(cheap; rare). Surface remaining time on the dashboard.
 ## Spawn flow
@@ -114,6 +107,17 @@ Pick anything from here when relevant. Cross-cutting design notes live in
 ## Lifecycle / reliability
+- **journald viewer per container in the dashboard.** Surface the
+equivalent of `journalctl -M h-coder -b` in the dashboard so the
+operator can see container logs without ssh-ing in. Optional
+filter by hive-specific systemd unit (`hive-ag3nt.service`,
+`hive-m1nd.service`). Implementation: backend shells out to
+`journalctl -M <container> -b --output=short-iso --no-pager`
+(optionally `-u <unit>`), streams or paginates the result over a
+new dashboard endpoint. Could be a `<details>` per container row
+or a dedicated page. Honest journalctl, not the in-container
+events stream — those are different surfaces (events = claude turn
+loop; journalctl = systemd-wide logs incl. boot, network, etc.).
 - **Container crash events.** Watch `container@*.service` via D-Bus, push
 `HelperEvent::ContainerCrash` to the manager's inbox so the manager can
 react (restart, escalate, etc.).
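
the journalctl invocation from the journald TODO entry above could be assembled like this. journal_args and journal_command are hypothetical helpers. the flags are exactly the ones the entry lists: -M, -b, --output=short-iso, --no-pager, and an optional -u. each flag goes into its own argv entry through std::process::Command instead of a shell string, so a shell never interprets the container or unit name.

```rust
use std::process::Command;

/// Argument list for `journalctl -M <container> -b --output=short-iso --no-pager [-u <unit>]`.
fn journal_args(container: &str, unit: Option<&str>) -> Vec<String> {
    let mut args = vec![
        "-M".to_string(),
        container.to_string(),
        "-b".to_string(),
        "--output=short-iso".to_string(),
        "--no-pager".to_string(),
    ];
    // Optional filter by a hive-specific systemd unit.
    if let Some(u) = unit {
        args.push("-u".to_string());
        args.push(u.to_string());
    }
    args
}

/// Build (but do not run) the command; the caller decides whether to stream
/// stdout or collect and paginate it.
fn journal_command(container: &str, unit: Option<&str>) -> Command {
    let mut cmd = Command::new("journalctl");
    cmd.args(journal_args(container, unit));
    cmd
}
```

for the streaming variant, the caller could set `stdout(std::process::Stdio::piped())` and read lines as they arrive. for pagination, `output()` collects everything at once.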