agent badges: split into ctx (last-inference) + cost (cumulative)

the existing ctx badge was misnamed: it summed `result.usage`, which is
the cumulative tokens billed across every inference in the turn. for
tool-heavy turns that easily exceeds the model's context window (a 600k
cached prefix × 15 sub-calls = 9M cache_read), making it useless as a
"should i compact?" signal.

now two separate badges:

  ctx · N    last inference's prompt size = actual context window in
             use right now. parsed from each `assistant` event's
             `.message.usage`; the harness tracks the most recent one
             across the stream and snapshots it when the `result`
             event lands.

  cost · M   cumulative tokens billed across the whole turn (the
             previous behaviour, now correctly labelled).

both update via a single `TokenUsageChanged { ctx, cost }` SSE event at
turn-end. turn_stats grows four columns (`last_input_tokens`,
`last_output_tokens`, `last_cache_read_input_tokens`,
`last_cache_creation_input_tokens`) so the cold-load seed can paint both
badges on page load. migrations run try-and-ignore ALTERs so existing
agent dbs catch up; pre-migration rows have last-inference zeros and
yield no `ctx` seed (badge stays empty until next turn) rather than a
misleading 0.
This commit is contained in:
müde 2026-05-18 18:48:35 +02:00
parent 14549dd8a9
commit 5c6c607e25
9 changed files with 267 additions and 101 deletions

View file

@ -310,13 +310,22 @@ Layout, top to bottom:
`turn_state_since`. `turn_state_since`.
- Model chip: `model · <name>` (e.g. `model · haiku`). Driven - Model chip: `model · <name>` (e.g. `model · haiku`). Driven
by `LiveEvent::ModelChanged`; emitted from `Bus::set_model`. by `LiveEvent::ModelChanged`; emitted from `Bus::set_model`.
- Ctx badge: `ctx · 142k` — total prompt tokens in the - Ctx badge: `ctx · 142k` — last inference's prompt size
current context window (input + cache_read + cache_write), (input + cache_read + cache_write of the most recent
mirroring claude code's bottom-right indicator. Hover for model call in the just-ended turn). This is the **actual
the breakdown including output. Driven by context window utilisation** — the number to watch when
`LiveEvent::TokenUsageChanged`; emitted from deciding whether to compact.
`Bus::record_usage` whenever the terminal `result` event - Cost badge: `cost · 1.3M` — cumulative tokens billed
delivers a fresh usage block. across **every inference** in the last turn (sum of all
per-call prompts). Tool-heavy turns rebill the cached
prefix per call, so this routinely exceeds the model's
window — it's a cost signal, not a size signal.
- Both badges driven by `LiveEvent::TokenUsageChanged {
ctx, cost }`, emitted once at turn-end from
`Bus::record_turn_usage`. The harness tracks per-inference
usage by walking `assistant` events in the stream-json
and updating `last_inference` on each one; the `result`
event supplies `cost` and triggers the emit.
- Last-turn chip: `last turn 12.3s` appears after the first - Last-turn chip: `last turn 12.3s` appears after the first
turn ends, computed from the state-since deltas. turn ends, computed from the state-since deltas.
- `■ cancel turn` button: visible only while state=thinking, - `■ cancel turn` button: visible only while state=thinking,
@ -437,8 +446,11 @@ Bus events (new vocabulary on `/events/stream`):
`needs_login_idle` / `needs_login_in_progress`. Drives the `needs_login_idle` / `needs_login_in_progress`. Drives the
alive-badge. alive-badge.
- `model_changed { model }` — drives the model chip. - `model_changed { model }` — drives the model chip.
- `token_usage_changed { usage: TokenUsage }` — drives the - `token_usage_changed { ctx: TokenUsage, cost: TokenUsage }`
ctx-badge. Emitted from `Bus::record_usage` whenever the — drives the ctx + cost badges. Emitted from
stream-json `result` event delivers a fresh usage block. `Bus::record_turn_usage` at turn-end; `ctx` is the last
inference's usage (current context size), `cost` is the
cumulative across every inference (the `result` event's
totals).
- `turn_state_changed { state, since_unix }` — drives the - `turn_state_changed { state, since_unix }` — drives the
state badge (`idle`/`thinking`/`compacting`). state badge (`idle`/`thinking`/`compacting`).

View file

@ -525,30 +525,43 @@
el_.textContent = 'model · ' + model; el_.textContent = 'model · ' + model;
el_.title = `claude --model ${model}\nset via the operator's /model command; persists across turns until changed`; el_.title = `claude --model ${model}\nset via the operator's /model command; persists across turns until changed`;
} }
// Context badge — mirrors Claude Code's bottom-right "N tokens" // Token badges — two separate chips:
// indicator. Primary number is total prompt tokens used in the // ctx · N last inference's prompt size = current context window
// current context window (input + both cache axes); hover for the // utilisation (what to watch for compaction decisions)
// breakdown including output. Kept as chrome on the state row so // cost · M cumulative billed tokens across the whole last turn
// the terminal stays the star. // (sum across every inference; tool-heavy turns rebill
function renderTokenUsage(u) { // the cached prompt per call and blow past the model's
const el_ = $('ctx-badge'); // context window — this is a cost signal, not a size
// signal)
// Both fed by the same `token_usage_changed` SSE event (`{ ctx, cost }`).
const fmtTokens = (n) => {
if (n >= 1_000_000) return (n / 1_000_000).toFixed(1) + 'M';
if (n >= 1_000) return Math.round(n / 1000) + 'k';
return String(n);
};
function renderOneUsage(elId, label, u, blurb) {
const el_ = $(elId);
if (!el_) return; if (!el_) return;
if (!u) { el_.hidden = true; return; } if (!u) { el_.hidden = true; return; }
const ctx = u.input_tokens + u.cache_read_input_tokens + u.cache_creation_input_tokens; const total = u.input_tokens + u.cache_read_input_tokens + u.cache_creation_input_tokens;
const fmt = (n) => {
if (n >= 1_000_000) return (n / 1_000_000).toFixed(1) + 'M';
if (n >= 1_000) return Math.round(n / 1000) + 'k';
return String(n);
};
el_.hidden = false; el_.hidden = false;
el_.title = [ el_.title = [
'context window in use', blurb,
'input: ' + u.input_tokens, 'input: ' + u.input_tokens,
'cache_read: ' + u.cache_read_input_tokens, 'cache_read: ' + u.cache_read_input_tokens,
'cache_write: ' + u.cache_creation_input_tokens, 'cache_write: ' + u.cache_creation_input_tokens,
'output (last turn): ' + u.output_tokens, 'output: ' + u.output_tokens,
].join('\n'); ].join('\n');
el_.textContent = 'ctx · ' + fmt(ctx); el_.textContent = label + ' · ' + fmtTokens(total);
}
function renderTokenUsage(ev) {
// `ev` is `{ ctx, cost }` either off /api/state cold-load (each may
// be null) or off a `token_usage_changed` SSE event (both present
// post-turn).
renderOneUsage('ctx-badge', 'ctx', ev && ev.ctx,
'last-inference prompt size — the actual context window in use right now');
renderOneUsage('cost-badge', 'cost', ev && ev.cost,
'cumulative tokens billed across the last turn (sum across every inference)');
} }
function renderLastTurn(ms) { function renderLastTurn(ms) {
const el_ = $('last-turn'); const el_ = $('last-turn');
@ -626,7 +639,7 @@
} }
renderAliveBadge(s.status); renderAliveBadge(s.status);
renderModelChip(s.model); renderModelChip(s.model);
renderTokenUsage(s.token_usage); renderTokenUsage({ ctx: s.ctx_usage, cost: s.cost_usage });
// Open-threads aren't part of /api/state (kept on the broker // Open-threads aren't part of /api/state (kept on the broker
// db, fetched via the per-agent socket). Cold-load fetches // db, fetched via the per-agent socket). Cold-load fetches
// it here; turn_end refreshes it via the renderer below. // it here; turn_end refreshes it via the renderer below.
@ -1026,7 +1039,7 @@
}, },
model_changed(ev, api) { if (!api.fromHistory) renderModelChip(ev.model); }, model_changed(ev, api) { if (!api.fromHistory) renderModelChip(ev.model); },
token_usage_changed(ev, api) { token_usage_changed(ev, api) {
if (!api.fromHistory) renderTokenUsage(ev.usage); if (!api.fromHistory) renderTokenUsage({ ctx: ev.ctx, cost: ev.cost });
}, },
turn_state_changed(ev, api) { turn_state_changed(ev, api) {
if (!api.fromHistory) setStateAbs(ev.state, ev.since_unix); if (!api.fromHistory) setStateAbs(ev.state, ev.since_unix);

View file

@ -18,6 +18,7 @@
<span id="state-badge" class="state-badge state-loading">… booting</span> <span id="state-badge" class="state-badge state-loading">… booting</span>
<span id="model-chip" class="model-chip" hidden></span> <span id="model-chip" class="model-chip" hidden></span>
<span id="ctx-badge" class="ctx-badge" hidden title="tokens used in the current context window"></span> <span id="ctx-badge" class="ctx-badge" hidden title="tokens used in the current context window"></span>
<span id="cost-badge" class="ctx-badge" hidden title="cumulative tokens billed across the last turn (sum across every inference; tool-heavy turns rebill the cached prompt per call)"></span>
<span id="last-turn" class="last-turn" hidden></span> <span id="last-turn" class="last-turn" hidden></span>
<button type="button" id="cancel-btn" class="btn-cancel-turn" hidden>■ cancel turn</button> <button type="button" id="cancel-btn" class="btn-cancel-turn" hidden>■ cancel turn</button>
<button type="button" id="new-session-btn" class="btn-new-session" <button type="button" id="new-session-btn" class="btn-new-session"

View file

@ -74,10 +74,11 @@ async fn main() -> Result<()> {
let login_state = Arc::new(Mutex::new(initial)); let login_state = Arc::new(Mutex::new(initial));
let bus = Bus::new(); let bus = Bus::new();
let stats = TurnStats::open_default(); let stats = TurnStats::open_default();
if let Some(s) = &stats if let Some(s) = &stats {
&& let Some(u) = s.last_usage() let (ctx, cost) = s.last_usage();
{ if ctx.is_some() || cost.is_some() {
bus.seed_usage(u); bus.seed_usage(ctx, cost);
}
} }
let files = turn::TurnFiles::prepare(&cli.socket, &label, mcp::Flavor::Agent).await?; let files = turn::TurnFiles::prepare(&cli.socket, &label, mcp::Flavor::Agent).await?;
let turn_lock: TurnLock = Arc::new(tokio::sync::Mutex::new(())); let turn_lock: TurnLock = Arc::new(tokio::sync::Mutex::new(()));
@ -354,7 +355,8 @@ fn build_row(
open_threads_count: Option<u64>, open_threads_count: Option<u64>,
open_reminders_count: Option<u64>, open_reminders_count: Option<u64>,
) -> TurnStatRow { ) -> TurnStatRow {
let usage = bus.last_usage().unwrap_or_default(); let cost = bus.last_cost_usage().unwrap_or_default();
let ctx = bus.last_ctx_usage().unwrap_or(cost);
let tool_calls = bus.take_tool_calls(); let tool_calls = bus.take_tool_calls();
let tool_call_count: u64 = tool_calls.values().copied().sum(); let tool_call_count: u64 = tool_calls.values().copied().sum();
let tool_call_breakdown_json = if tool_calls.is_empty() { let tool_call_breakdown_json = if tool_calls.is_empty() {
@ -373,10 +375,14 @@ fn build_row(
duration_ms, duration_ms,
model, model,
wake_from, wake_from,
input_tokens: usage.input_tokens, input_tokens: cost.input_tokens,
output_tokens: usage.output_tokens, output_tokens: cost.output_tokens,
cache_read_input_tokens: usage.cache_read_input_tokens, cache_read_input_tokens: cost.cache_read_input_tokens,
cache_creation_input_tokens: usage.cache_creation_input_tokens, cache_creation_input_tokens: cost.cache_creation_input_tokens,
last_input_tokens: ctx.input_tokens,
last_output_tokens: ctx.output_tokens,
last_cache_read_input_tokens: ctx.cache_read_input_tokens,
last_cache_creation_input_tokens: ctx.cache_creation_input_tokens,
tool_call_count, tool_call_count,
tool_call_breakdown_json, tool_call_breakdown_json,
open_threads_count, open_threads_count,

View file

@ -64,10 +64,11 @@ async fn main() -> Result<()> {
let login_state = Arc::new(Mutex::new(initial)); let login_state = Arc::new(Mutex::new(initial));
let bus = Bus::new(); let bus = Bus::new();
let stats = TurnStats::open_default(); let stats = TurnStats::open_default();
if let Some(s) = &stats if let Some(s) = &stats {
&& let Some(u) = s.last_usage() let (ctx, cost) = s.last_usage();
{ if ctx.is_some() || cost.is_some() {
bus.seed_usage(u); bus.seed_usage(ctx, cost);
}
} }
let files = turn::TurnFiles::prepare(&cli.socket, &label, mcp::Flavor::Manager).await?; let files = turn::TurnFiles::prepare(&cli.socket, &label, mcp::Flavor::Manager).await?;
let turn_lock: TurnLock = Arc::new(tokio::sync::Mutex::new(())); let turn_lock: TurnLock = Arc::new(tokio::sync::Mutex::new(()));
@ -291,7 +292,8 @@ fn build_row(
open_threads_count: Option<u64>, open_threads_count: Option<u64>,
open_reminders_count: Option<u64>, open_reminders_count: Option<u64>,
) -> TurnStatRow { ) -> TurnStatRow {
let usage = bus.last_usage().unwrap_or_default(); let cost = bus.last_cost_usage().unwrap_or_default();
let ctx = bus.last_ctx_usage().unwrap_or(cost);
let tool_calls = bus.take_tool_calls(); let tool_calls = bus.take_tool_calls();
let tool_call_count: u64 = tool_calls.values().copied().sum(); let tool_call_count: u64 = tool_calls.values().copied().sum();
let tool_call_breakdown_json = if tool_calls.is_empty() { let tool_call_breakdown_json = if tool_calls.is_empty() {
@ -310,10 +312,14 @@ fn build_row(
duration_ms, duration_ms,
model, model,
wake_from, wake_from,
input_tokens: usage.input_tokens, input_tokens: cost.input_tokens,
output_tokens: usage.output_tokens, output_tokens: cost.output_tokens,
cache_read_input_tokens: usage.cache_read_input_tokens, cache_read_input_tokens: cost.cache_read_input_tokens,
cache_creation_input_tokens: usage.cache_creation_input_tokens, cache_creation_input_tokens: cost.cache_creation_input_tokens,
last_input_tokens: ctx.input_tokens,
last_output_tokens: ctx.output_tokens,
last_cache_read_input_tokens: ctx.cache_read_input_tokens,
last_cache_creation_input_tokens: ctx.cache_creation_input_tokens,
tool_call_count, tool_call_count,
tool_call_breakdown_json, tool_call_breakdown_json,
open_threads_count, open_threads_count,

View file

@ -130,10 +130,15 @@ pub enum LiveEvent {
/// updates the chip + the per-turn stats sink will key off this /// updates the chip + the per-turn stats sink will key off this
/// to mark the boundary in its log. /// to mark the boundary in its log.
ModelChanged { model: String }, ModelChanged { model: String },
/// Final-turn `usage` block landed (input + output + cache /// Token usage for the turn just ended. Carries two snapshots:
/// counters). Powers the context-window badge + accumulates into /// - `ctx` is the LAST inference's usage block (the actual context
/// the per-turn stats sink. /// window in use right now — what the operator needs to decide
TokenUsageChanged { usage: TokenUsage }, /// whether to compact / reset).
/// - `cost` is the cumulative usage across every inference in the
/// turn (sum of per-call billed tokens — the cost signal). For
/// tool-heavy turns the cumulative blows past the model's window
/// because each tool call's prompt is rebilled.
TokenUsageChanged { ctx: TokenUsage, cost: TokenUsage },
/// Harness's `TurnState` transitioned (idle / thinking / /// Harness's `TurnState` transitioned (idle / thinking /
/// compacting). `since_unix` matches `Bus::state_snapshot().1` /// compacting). `since_unix` matches `Bus::state_snapshot().1`
/// so the client's elapsed-time ticker keeps progressing across /// so the client's elapsed-time ticker keeps progressing across
@ -221,15 +226,29 @@ impl TokenUsage {
self.input_tokens + self.cache_read_input_tokens + self.cache_creation_input_tokens self.input_tokens + self.cache_read_input_tokens + self.cache_creation_input_tokens
} }
/// Parse usage from a stream-json event. Returns `Some` only for the /// Parse usage from the terminal `result` stream-json event. This is the
/// terminal `result` event (which is the only one that carries `usage`); /// **cumulative** sum across every inference in the turn — useful as a
/// every other event maps to `None`. Missing numeric fields default to 0 /// cost signal, but NOT the current context size (a tool-heavy turn
/// so partial server payloads don't drop the whole snapshot. /// sums per-call cached prompts and easily exceeds the model window).
pub fn from_stream_event(v: &serde_json::Value) -> Option<Self> { pub fn from_stream_event(v: &serde_json::Value) -> Option<Self> {
if v.get("type").and_then(|t| t.as_str()) != Some("result") { if v.get("type").and_then(|t| t.as_str()) != Some("result") {
return None; return None;
} }
let u = v.get("usage")?; Self::from_usage_obj(v.get("usage")?)
}
/// Parse usage from a per-inference `assistant` event's
/// `.message.usage` block. Each turn fires one of these for every
/// model call; tracking the LAST one over the turn gives the actual
/// conversation context size — the number to watch for compaction.
pub fn from_assistant_event(v: &serde_json::Value) -> Option<Self> {
if v.get("type").and_then(|t| t.as_str()) != Some("assistant") {
return None;
}
Self::from_usage_obj(v.get("message")?.get("usage")?)
}
fn from_usage_obj(u: &serde_json::Value) -> Option<Self> {
let field = |k: &str| u.get(k).and_then(serde_json::Value::as_u64).unwrap_or(0); let field = |k: &str| u.get(k).and_then(serde_json::Value::as_u64).unwrap_or(0);
Some(Self { Some(Self {
input_tokens: field("input_tokens"), input_tokens: field("input_tokens"),
@ -281,12 +300,16 @@ pub struct Bus {
/// Model name passed to `claude --model`. Default `haiku`; the /// Model name passed to `claude --model`. Default `haiku`; the
/// operator can override at runtime via `POST /api/model`. /// operator can override at runtime via `POST /api/model`.
model: Arc<Mutex<String>>, model: Arc<Mutex<String>>,
/// Last token usage reported by claude (from the `result` stream-json /// Last-inference token usage from the most recent turn's final
/// event). `None` until the first turn with usage data completes. /// `assistant` event. Represents the actual context window size at
/// Updated on every turn; survives across turns within one harness /// turn-end — the number the operator watches to decide whether to
/// process lifetime (resets on container restart, which is fine — /// compact. `None` until the first turn completes.
/// it's a live indicator, not a cumulative counter). last_ctx_usage: Arc<Mutex<Option<TokenUsage>>>,
last_usage: Arc<Mutex<Option<TokenUsage>>>, /// Cumulative token usage from the most recent turn's `result`
/// event (sum across every inference in the turn). This is the cost
/// signal — tool-heavy turns rebill the cached prompt per call and
/// blow past the model window. `None` until the first turn completes.
last_cost_usage: Arc<Mutex<Option<TokenUsage>>>,
/// One-shot: next `run_claude` call drops `--continue`, starting /// One-shot: next `run_claude` call drops `--continue`, starting
/// a fresh claude session. Set by `POST /api/new-session` from /// a fresh claude session. Set by `POST /api/new-session` from
/// the per-agent web UI; consumed (cleared back to false) by the /// the per-agent web UI; consumed (cleared back to false) by the
@ -323,7 +346,8 @@ impl Bus {
store, store,
state: Arc::new(Mutex::new((TurnState::Idle, now_unix()))), state: Arc::new(Mutex::new((TurnState::Idle, now_unix()))),
model: Arc::new(Mutex::new(initial_model)), model: Arc::new(Mutex::new(initial_model)),
last_usage: Arc::new(Mutex::new(None)), last_ctx_usage: Arc::new(Mutex::new(None)),
last_cost_usage: Arc::new(Mutex::new(None)),
skip_continue_once: Arc::new(AtomicBool::new(false)), skip_continue_once: Arc::new(AtomicBool::new(false)),
tool_calls: Arc::new(Mutex::new(std::collections::HashMap::new())), tool_calls: Arc::new(Mutex::new(std::collections::HashMap::new())),
} }
@ -378,19 +402,27 @@ impl Bus {
self.emit(LiveEvent::ModelChanged { model: value }); self.emit(LiveEvent::ModelChanged { model: value });
} }
/// Seed `last_usage` at startup without emitting a SSE event. /// Seed `last_ctx_usage` + `last_cost_usage` at startup without
/// Used by the bin entrypoints to backfill from the most recent /// emitting a SSE event. Used by the bin entrypoints to backfill
/// `turn_stats` row so the per-agent web UI's `ctx-badge` paints /// from the most recent `turn_stats` row so the per-agent web UI's
/// real numbers on cold load instead of staying empty until the /// ctx + cost badges paint real numbers on cold load.
/// next turn finishes. pub fn seed_usage(&self, ctx: Option<TokenUsage>, cost: Option<TokenUsage>) {
pub fn seed_usage(&self, usage: TokenUsage) { if ctx.is_some() {
*self.last_usage.lock().unwrap() = Some(usage); *self.last_ctx_usage.lock().unwrap() = ctx;
}
if cost.is_some() {
*self.last_cost_usage.lock().unwrap() = cost;
}
} }
/// Record the latest token usage from a completed turn. /// Record the just-ended turn's usage. `ctx` is the last inference's
pub fn record_usage(&self, usage: TokenUsage) { /// usage (current context size); `cost` is the cumulative across
*self.last_usage.lock().unwrap() = Some(usage); /// every inference in the turn (cost signal). One SSE event fires
self.emit(LiveEvent::TokenUsageChanged { usage }); /// per turn carrying both.
pub fn record_turn_usage(&self, ctx: TokenUsage, cost: TokenUsage) {
*self.last_ctx_usage.lock().unwrap() = Some(ctx);
*self.last_cost_usage.lock().unwrap() = Some(cost);
self.emit(LiveEvent::TokenUsageChanged { ctx, cost });
} }
/// Walk a stream-json value for `tool_use` blocks and bump the /// Walk a stream-json value for `tool_use` blocks and bump the
@ -430,10 +462,18 @@ impl Bus {
std::mem::take(&mut *self.tool_calls.lock().unwrap()) std::mem::take(&mut *self.tool_calls.lock().unwrap())
} }
/// Last known token usage, or `None` if no turn has completed yet. /// Last context-size snapshot (last inference of the most recent
/// turn), or `None` if no turn has completed yet.
#[must_use] #[must_use]
pub fn last_usage(&self) -> Option<TokenUsage> { pub fn last_ctx_usage(&self) -> Option<TokenUsage> {
*self.last_usage.lock().unwrap() *self.last_ctx_usage.lock().unwrap()
}
/// Last cumulative cost snapshot (sum across the most recent turn's
/// inferences), or `None` if no turn has completed yet.
#[must_use]
pub fn last_cost_usage(&self) -> Option<TokenUsage> {
*self.last_cost_usage.lock().unwrap()
} }
/// Update the harness's authoritative turn-loop state. Records /// Update the harness's authoritative turn-loop state. Records

View file

@ -279,14 +279,28 @@ async fn run_claude(prompt: &str, files: &TurnFiles, bus: &Bus) -> Result<bool>
let bus_err = bus.clone(); let bus_err = bus.clone();
let pump_stdout = tokio::spawn(async move { let pump_stdout = tokio::spawn(async move {
let mut reader = BufReader::new(stdout).lines(); let mut reader = BufReader::new(stdout).lines();
// Track usage as the turn unfolds. `last_inference` overwrites on
// every assistant event so at result-time it holds the most recent
// model call's usage — the actual context size. The `result` event
// carries the cumulative-across-the-turn usage (cost signal). Both
// get handed to `record_turn_usage` together so a single SSE
// event updates both badges.
let mut last_inference: Option<crate::events::TokenUsage> = None;
while let Ok(Some(line)) = reader.next_line().await { while let Ok(Some(line)) = reader.next_line().await {
if line.contains(PROMPT_TOO_LONG_MARKER) { if line.contains(PROMPT_TOO_LONG_MARKER) {
flag_out.store(true, Ordering::Relaxed); flag_out.store(true, Ordering::Relaxed);
} }
match serde_json::from_str::<serde_json::Value>(&line) { match serde_json::from_str::<serde_json::Value>(&line) {
Ok(v) => { Ok(v) => {
if let Some(usage) = crate::events::TokenUsage::from_stream_event(&v) { if let Some(u) = crate::events::TokenUsage::from_assistant_event(&v) {
bus_out.record_usage(usage); last_inference = Some(u);
}
if let Some(cost) = crate::events::TokenUsage::from_stream_event(&v) {
// Fallback to `cost` if the turn somehow produced
// a result without any assistant event — keeps the
// ctx badge from going stale on a degenerate turn.
let ctx = last_inference.unwrap_or(cost);
bus_out.record_turn_usage(ctx, cost);
} }
bus_out.observe_stream(&v); bus_out.observe_stream(&v);
bus_out.emit(LiveEvent::Stream(v)); bus_out.emit(LiveEvent::Stream(v));

View file

@ -22,8 +22,9 @@ use anyhow::{Context, Result};
use rusqlite::{Connection, params}; use rusqlite::{Connection, params};
/// SQL bootstrap. CREATE TABLE IF NOT EXISTS so first-boot agents /// SQL bootstrap. CREATE TABLE IF NOT EXISTS so first-boot agents
/// and existing ones converge on the same shape; ALTER-style /// and existing ones converge on the same shape. The base table is
/// migrations land here as additional statements once we have any. /// fresh-install only; additive migrations land via `MIGRATIONS`
/// below as try-and-ignore ALTERs so existing dbs catch up.
const SCHEMA: &str = " const SCHEMA: &str = "
CREATE TABLE IF NOT EXISTS turn_stats ( CREATE TABLE IF NOT EXISTS turn_stats (
id INTEGER PRIMARY KEY AUTOINCREMENT, id INTEGER PRIMARY KEY AUTOINCREMENT,
@ -36,6 +37,10 @@ CREATE TABLE IF NOT EXISTS turn_stats (
output_tokens INTEGER NOT NULL DEFAULT 0, output_tokens INTEGER NOT NULL DEFAULT 0,
cache_read_input_tokens INTEGER NOT NULL DEFAULT 0, cache_read_input_tokens INTEGER NOT NULL DEFAULT 0,
cache_creation_input_tokens INTEGER NOT NULL DEFAULT 0, cache_creation_input_tokens INTEGER NOT NULL DEFAULT 0,
last_input_tokens INTEGER NOT NULL DEFAULT 0,
last_output_tokens INTEGER NOT NULL DEFAULT 0,
last_cache_read_input_tokens INTEGER NOT NULL DEFAULT 0,
last_cache_creation_input_tokens INTEGER NOT NULL DEFAULT 0,
tool_call_count INTEGER NOT NULL DEFAULT 0, tool_call_count INTEGER NOT NULL DEFAULT 0,
tool_call_breakdown_json TEXT, tool_call_breakdown_json TEXT,
open_threads_count INTEGER, open_threads_count INTEGER,
@ -47,6 +52,17 @@ CREATE INDEX IF NOT EXISTS idx_turn_stats_started
ON turn_stats (started_at DESC); ON turn_stats (started_at DESC);
"; ";
/// Additive column migrations. Each runs unconditionally and ignores
/// `duplicate column name` errors — sqlite < 3.35 lacks
/// `ADD COLUMN IF NOT EXISTS`, so try-and-ignore is the portable path.
/// New columns MUST carry a default so existing rows decode.
const MIGRATIONS: &[&str] = &[
"ALTER TABLE turn_stats ADD COLUMN last_input_tokens INTEGER NOT NULL DEFAULT 0",
"ALTER TABLE turn_stats ADD COLUMN last_output_tokens INTEGER NOT NULL DEFAULT 0",
"ALTER TABLE turn_stats ADD COLUMN last_cache_read_input_tokens INTEGER NOT NULL DEFAULT 0",
"ALTER TABLE turn_stats ADD COLUMN last_cache_creation_input_tokens INTEGER NOT NULL DEFAULT 0",
];
/// One row to be inserted. `Option`-wrapped fields default to NULL /// One row to be inserted. `Option`-wrapped fields default to NULL
/// when the harness couldn't gather them (e.g. socket roundtrip for /// when the harness couldn't gather them (e.g. socket roundtrip for
/// open_threads failed) so a partial row beats no row. /// open_threads failed) so a partial row beats no row.
@ -57,10 +73,16 @@ pub struct TurnStatRow {
pub duration_ms: i64, pub duration_ms: i64,
pub model: String, pub model: String,
pub wake_from: String, pub wake_from: String,
/// Cumulative across every inference in the turn (cost signal).
pub input_tokens: u64, pub input_tokens: u64,
pub output_tokens: u64, pub output_tokens: u64,
pub cache_read_input_tokens: u64, pub cache_read_input_tokens: u64,
pub cache_creation_input_tokens: u64, pub cache_creation_input_tokens: u64,
/// Last inference's usage — the actual context size at turn end.
pub last_input_tokens: u64,
pub last_output_tokens: u64,
pub last_cache_read_input_tokens: u64,
pub last_cache_creation_input_tokens: u64,
pub tool_call_count: u64, pub tool_call_count: u64,
/// Per-tool breakdown as JSON: `{"Read":12,"Bash":3,...}`. None /// Per-tool breakdown as JSON: `{"Read":12,"Bash":3,...}`. None
/// when no tools were called (saves a sqlite write of `"{}"`). /// when no tools were called (saves a sqlite write of `"{}"`).
@ -107,6 +129,18 @@ impl TurnStats {
.with_context(|| format!("open turn_stats db {}", path.display()))?; .with_context(|| format!("open turn_stats db {}", path.display()))?;
conn.execute_batch(SCHEMA) conn.execute_batch(SCHEMA)
.context("apply turn_stats schema")?; .context("apply turn_stats schema")?;
for stmt in MIGRATIONS {
// Ignore "duplicate column name" — the migration already ran.
// Any other error is logged but doesn't fail open() because the
// base schema works and we'd rather keep the harness alive than
// crash on an upgrade hiccup.
if let Err(e) = conn.execute(stmt, []) {
let msg = e.to_string();
if !msg.contains("duplicate column name") {
tracing::warn!(error = %msg, stmt, "turn_stats migration failed");
}
}
}
Ok(Self { Ok(Self {
inner: std::sync::Arc::new(Mutex::new(conn)), inner: std::sync::Arc::new(Mutex::new(conn)),
}) })
@ -121,6 +155,8 @@ impl TurnStats {
started_at, ended_at, duration_ms, model, wake_from, started_at, ended_at, duration_ms, model, wake_from,
input_tokens, output_tokens, input_tokens, output_tokens,
cache_read_input_tokens, cache_creation_input_tokens, cache_read_input_tokens, cache_creation_input_tokens,
last_input_tokens, last_output_tokens,
last_cache_read_input_tokens, last_cache_creation_input_tokens,
tool_call_count, tool_call_breakdown_json, tool_call_count, tool_call_breakdown_json,
open_threads_count, open_reminders_count, open_threads_count, open_reminders_count,
result_kind, note result_kind, note
@ -130,7 +166,9 @@ impl TurnStats {
?8, ?9, ?8, ?9,
?10, ?11, ?10, ?11,
?12, ?13, ?12, ?13,
?14, ?15 ?14, ?15,
?16, ?17,
?18, ?19
)", )",
params![ params![
row.started_at, row.started_at,
@ -142,6 +180,10 @@ impl TurnStats {
i64::try_from(row.output_tokens).unwrap_or(i64::MAX), i64::try_from(row.output_tokens).unwrap_or(i64::MAX),
i64::try_from(row.cache_read_input_tokens).unwrap_or(i64::MAX), i64::try_from(row.cache_read_input_tokens).unwrap_or(i64::MAX),
i64::try_from(row.cache_creation_input_tokens).unwrap_or(i64::MAX), i64::try_from(row.cache_creation_input_tokens).unwrap_or(i64::MAX),
i64::try_from(row.last_input_tokens).unwrap_or(i64::MAX),
i64::try_from(row.last_output_tokens).unwrap_or(i64::MAX),
i64::try_from(row.last_cache_read_input_tokens).unwrap_or(i64::MAX),
i64::try_from(row.last_cache_creation_input_tokens).unwrap_or(i64::MAX),
i64::try_from(row.tool_call_count).unwrap_or(i64::MAX), i64::try_from(row.tool_call_count).unwrap_or(i64::MAX),
row.tool_call_breakdown_json, row.tool_call_breakdown_json,
row.open_threads_count row.open_threads_count
@ -157,32 +199,58 @@ impl TurnStats {
} }
} }
/// Token counts from the most recently inserted row, if any. Lets /// Token counts from the most recently inserted row, if any.
/// the harness seed `Bus::last_usage` on startup so the per-agent /// Returns `(ctx, cost)` — both backfill `Bus` on startup so the
/// web UI's `ctx-badge` paints with real numbers on cold load /// per-agent web UI's ctx + cost badges paint with real numbers on
/// instead of waiting for the next `TokenUsageChanged` SSE event. /// cold load instead of waiting for the next `TokenUsageChanged`
/// Best-effort: any sqlite error returns `None` and the caller /// SSE event. Best-effort: any sqlite error returns `(None, None)`.
/// falls back to the empty state. ///
/// Pre-migration rows (before the `last_*_tokens` columns existed)
/// have last-inference zeros — those rows yield `ctx = None` so the
/// badge stays empty until the next real turn rather than showing a
/// misleading 0.
#[must_use] #[must_use]
pub fn last_usage(&self) -> Option<crate::events::TokenUsage> { pub fn last_usage(
&self,
) -> (
Option<crate::events::TokenUsage>,
Option<crate::events::TokenUsage>,
) {
let conn = self.inner.lock().unwrap(); let conn = self.inner.lock().unwrap();
conn.query_row( conn.query_row(
"SELECT input_tokens, output_tokens, "SELECT input_tokens, output_tokens,
cache_read_input_tokens, cache_creation_input_tokens cache_read_input_tokens, cache_creation_input_tokens,
last_input_tokens, last_output_tokens,
last_cache_read_input_tokens, last_cache_creation_input_tokens
FROM turn_stats FROM turn_stats
ORDER BY started_at DESC ORDER BY started_at DESC
LIMIT 1", LIMIT 1",
[], [],
|row| { |row| {
Ok(crate::events::TokenUsage { let g = |i: usize| -> rusqlite::Result<u64> {
input_tokens: u64::try_from(row.get::<_, i64>(0)?).unwrap_or(0), Ok(u64::try_from(row.get::<_, i64>(i)?).unwrap_or(0))
output_tokens: u64::try_from(row.get::<_, i64>(1)?).unwrap_or(0), };
cache_read_input_tokens: u64::try_from(row.get::<_, i64>(2)?).unwrap_or(0), let cost = crate::events::TokenUsage {
cache_creation_input_tokens: u64::try_from(row.get::<_, i64>(3)?).unwrap_or(0), input_tokens: g(0)?,
}) output_tokens: g(1)?,
cache_read_input_tokens: g(2)?,
cache_creation_input_tokens: g(3)?,
};
let last = crate::events::TokenUsage {
input_tokens: g(4)?,
output_tokens: g(5)?,
cache_read_input_tokens: g(6)?,
cache_creation_input_tokens: g(7)?,
};
let ctx = if last == crate::events::TokenUsage::default() {
None
} else {
Some(last)
};
Ok((ctx, Some(cost)))
}, },
) )
.ok() .unwrap_or((None, None))
} }
} }

View file

@ -225,9 +225,13 @@ struct StateSnapshot {
/// the operator can see what they just switched to (and what's /// the operator can see what they just switched to (and what's
/// in flight). Mutable at runtime via `POST /api/model`. /// in flight). Mutable at runtime via `POST /api/model`.
model: String, model: String,
/// Token usage from the last completed turn. `null` until the /// Last-inference token usage from the most recent completed
/// first turn with usage data finishes. /// turn — represents the current context-window size at turn-end.
token_usage: Option<crate::events::TokenUsage>, /// `null` until the first turn finishes.
ctx_usage: Option<crate::events::TokenUsage>,
/// Cumulative token usage across the most recent turn's inferences
/// (cost signal). `null` until the first turn finishes.
cost_usage: Option<crate::events::TokenUsage>,
} }
#[derive(Serialize)] #[derive(Serialize)]
@ -310,7 +314,8 @@ async fn api_state(State(state): State<AppState>) -> axum::Json<StateSnapshot> {
let inbox = recent_inbox(&state.socket, state.flavor()).await; let inbox = recent_inbox(&state.socket, state.flavor()).await;
let (turn_state, turn_state_since) = state.bus.state_snapshot(); let (turn_state, turn_state_since) = state.bus.state_snapshot();
let model = state.bus.model(); let model = state.bus.model();
let token_usage = state.bus.last_usage(); let ctx_usage = state.bus.last_ctx_usage();
let cost_usage = state.bus.last_cost_usage();
axum::Json(StateSnapshot { axum::Json(StateSnapshot {
seq, seq,
label: state.label.clone(), label: state.label.clone(),
@ -321,7 +326,8 @@ async fn api_state(State(state): State<AppState>) -> axum::Json<StateSnapshot> {
turn_state, turn_state,
turn_state_since, turn_state_since,
model, model,
token_usage, ctx_usage,
cost_usage,
}) })
} }