broker: lease-style delivery — ack_turn + requeue_inflight close the no-drop loop

This commit is contained in:
damocles 2026-05-18 22:01:48 +02:00
parent 69a3ca7469
commit 690cb5ab5b
8 changed files with 684 additions and 35 deletions

View file

@ -183,6 +183,43 @@ read them à la carte.
In-flight or recent context that hasn't earned a section yet.
Prune freely.
- **Just landed:** lease-style message delivery / no-drop
on turn fail. The `messages` table gained an `acked_at`
column (idempotent ALTER + backfill = `delivered_at` so
pre-migration delivered rows count as already-acked).
`Broker::recv` now returns `Delivery { id, redelivered,
message }` — the harness gets the row id back so
`AckTurn` can sweep every popped id at turn-end-OK. Two
new wire arms on both agent + manager surfaces:
`AckTurn` (drains the broker's per-recipient in-memory
`unacked_ids` list and stamps the rows `acked_at = NOW`)
and `RequeueInflight` (one-shot at harness boot: resets
`delivered_at = NULL` on every still-inflight row +
remembers each id so the next `Recv` carries
`redelivered: true`). Both bin loops call
`requeue_inflight` once before entering serve, and
`ack_turn` after every `TurnOutcome::Ok` (Failed +
PromptTooLong intentionally skip the ack so the popped
rows stay in-flight for the next boot's requeue).
`format_recv` + `format_wake_prompt` on both bins
surface a `[redelivered after harness restart — may
already be handled]` banner so claude knows the
side-effects of any previous handling may already have
happened. Lock order: `inflight` mutex first then
`conn` mutex in all three methods (`recv` / `ack_turn`
/ `requeue_inflight`) so a concurrent pop can't race
the requeue's DB update vs in-memory populate and
miss the redelivered tag. `vacuum_delivered` filter
flipped from `delivered_at < cutoff` to `acked_at IS
NOT NULL AND acked_at < cutoff` so unacked-but-
delivered rows survive vacuum (they're recoverable via
`requeue_inflight`). 7 new tests in `broker::tests`
cover happy path, crash recovery, idempotency, per-
recipient isolation, batch ack, vacuum preservation,
and FIFO ordering on requeue. Closes the "post-rebuild
system-message missed wake" bug class entirely (any
turn that wakes from a `delivered_at NOT NULL,
acked_at NULL` row resurfaces on next boot).
- **Just landed:** ctx + cost badges split. The per-agent
page now shows TWO chips — `ctx · N` (last inference's
prompt size = actual context window utilisation, parsed