agent badges: split into ctx (last-inference) + cost (cumulative)

the existing ctx badge was misnamed: it summed `result.usage`, which is the cumulative tokens billed across every inference in the turn. for tool-heavy turns that easily exceeds the model's context window (a 600k cached prefix × 15 sub-calls = 9M cache_read), making it useless as a "should i compact?" signal.

now two separate badges:

  ctx · N   last inference's prompt size = actual context window in use right now. parsed from each `assistant` event's `.message.usage`; the harness tracks the most recent one across the stream and snapshots it when the `result` event lands.

  cost · M  cumulative tokens billed across the whole turn (the previous behaviour, now correctly labelled).

both update via a single `TokenUsageChanged { ctx, cost }` SSE event at turn-end.

turn_stats grows four columns (`last_input_tokens`, `last_output_tokens`, `last_cache_read_input_tokens`, `last_cache_creation_input_tokens`) so the cold-load seed can paint both badges on page load.

migrations run try-and-ignore ALTERs so existing agent dbs catch up; pre-migration rows have last-inference zeros and yield no `ctx` seed (badge stays empty until next turn) rather than a misleading 0.
2026-05-18 18:48:35 +02:00
commit 5c6c607e25
parent 14549dd8a9
9 changed files with 267 additions and 101 deletions
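
The tracking described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `UsageTracker`, `on_assistant`, `on_result`, and the exact shape of `TokenUsage` are hypothetical names chosen for the example, and resetting the tracker after each `result` is an assumption.

```rust
/// Token counts for one usage block. Field names mirror the four
/// `last_*` turn_stats columns; the struct itself is an assumption.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct TokenUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub cache_read_input_tokens: u64,
    pub cache_creation_input_tokens: u64,
}

impl TokenUsage {
    /// Prompt size of a call: input + cache_read + cache_write.
    pub fn prompt_tokens(&self) -> u64 {
        self.input_tokens + self.cache_read_input_tokens + self.cache_creation_input_tokens
    }
}

/// Hypothetical tracker: remembers the newest per-inference usage.
#[derive(Debug, Default)]
pub struct UsageTracker {
    last_inference: Option<TokenUsage>,
}

impl UsageTracker {
    /// Call for every `assistant` event that carries `.message.usage`.
    /// Later events overwrite earlier ones, so only the last one survives.
    pub fn on_assistant(&mut self, usage: TokenUsage) {
        self.last_inference = Some(usage);
    }

    /// Call when the `result` event lands. Returns `(ctx, cost)` and
    /// clears the tracker for the next turn (an assumed design choice).
    /// `ctx` is `None` if no `assistant` event reported usage.
    pub fn on_result(&mut self, cost: TokenUsage) -> (Option<TokenUsage>, TokenUsage) {
        (self.last_inference.take(), cost)
    }
}
```

With the numbers from the commit message, 15 sub-calls that each read a 600k cached prefix leave `ctx` at roughly 600k prompt tokens while `cost` reports 9M cache_read tokens, which is why only `ctx` is meaningful as a compaction signal.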
--- a/docs/web-ui.md
+++ b/docs/web-ui.md
@@ -310,13 +310,22 @@ Layout, top to bottom:
    `turn_state_since`.
  - Model chip: `model · <name>` (e.g. `model · haiku`). Driven
    by `LiveEvent::ModelChanged`; emitted from `Bus::set_model`.
-  - Ctx badge: `ctx · 142k` — total prompt tokens in the
-    current context window (input + cache_read + cache_write),
-    mirroring claude code's bottom-right indicator. Hover for
-    the breakdown including output. Driven by
-    `LiveEvent::TokenUsageChanged`; emitted from
-    `Bus::record_usage` whenever the terminal `result` event
-    delivers a fresh usage block.
+  - Ctx badge: `ctx · 142k` — last inference's prompt size
+    (input + cache_read + cache_write of the most recent
+    model call in the just-ended turn). This is the **actual
+    context window utilisation** — the number to watch when
+    deciding whether to compact.
+  - Cost badge: `cost · 1.3M` — cumulative tokens billed
+    across **every inference** in the last turn (sum of all
+    per-call prompts). Tool-heavy turns rebill the cached
+    prefix per call, so this routinely exceeds the model's
+    window — it's a cost signal, not a size signal.
+  - Both badges driven by `LiveEvent::TokenUsageChanged {
+    ctx, cost }`, emitted once at turn-end from
+    `Bus::record_turn_usage`. The harness tracks per-inference
+    usage by walking `assistant` events in the stream-json
+    and updating `last_inference` on each one; the `result`
+    event supplies `cost` and triggers the emit.
  - Last-turn chip: `last turn 12.3s` appears after the first
    turn ends, computed from the state-since deltas.
  - `■ cancel turn` button: visible only while state=thinking,
@@ -437,8 +446,11 @@ Bus events (new vocabulary on `/events/stream`):
  `needs_login_idle` / `needs_login_in_progress`. Drives the
  alive-badge.
 - `model_changed { model }` — drives the model chip.
-- `token_usage_changed { usage: TokenUsage }` — drives the
-  ctx-badge. Emitted from `Bus::record_usage` whenever the
-  stream-json `result` event delivers a fresh usage block.
+- `token_usage_changed { ctx: TokenUsage, cost: TokenUsage }`
+  — drives the ctx + cost badges. Emitted from
+  `Bus::record_turn_usage` at turn-end; `ctx` is the last
+  inference's usage (current context size), `cost` is the
+  cumulative across every inference (the `result` event's
+  totals).
 - `turn_state_changed { state, since_unix }` — drives the
  state badge (`idle`/`thinking`/`compacting`).
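
As an illustration of how a consumer of `token_usage_changed` might turn the `ctx` and `cost` payloads into badge text, here is a small sketch. The `TokenUsage` field names, the `compact` rounding rule, and the `badge` helper are all assumptions made for this example; the real UI formatter is not shown in this diff.

```rust
/// Assumed payload shape for each of `ctx` and `cost`.
#[derive(Debug, Clone, Copy, Default)]
pub struct TokenUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub cache_read_input_tokens: u64,
    pub cache_creation_input_tokens: u64,
}

/// Prompt total shown on a badge: input + cache_read + cache_write.
/// Output tokens are left out here and only appear in the breakdown.
pub fn prompt_tokens(u: &TokenUsage) -> u64 {
    u.input_tokens + u.cache_read_input_tokens + u.cache_creation_input_tokens
}

/// Hypothetical compact formatter: whole thousands below one million
/// (truncated), one decimal place of millions above it.
pub fn compact(n: u64) -> String {
    if n >= 1_000_000 {
        format!("{:.1}M", n as f64 / 1_000_000.0)
    } else if n >= 1_000 {
        format!("{}k", n / 1_000)
    } else {
        n.to_string()
    }
}

/// Builds badge text such as `ctx · 142k` or `cost · 1.3M`.
pub fn badge(label: &str, usage: &TokenUsage) -> String {
    format!("{} · {}", label, compact(prompt_tokens(usage)))
}
```

Under these assumptions both badges share one formatter and differ only in which usage block they are fed, which matches the single-event design: one `token_usage_changed` message carries everything needed to repaint both.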