agent badges: split into ctx (last-inference) + cost (cumulative)
the existing ctx badge was misnamed: it summed `result.usage`, which is
the cumulative tokens billed across every inference in the turn. for
tool-heavy turns that easily exceeds the model's context window (a 600k
cached prefix × 15 sub-calls = 9M cache_read), making it useless as a
"should i compact?" signal.
now two separate badges:
  ctx · N    last inference's prompt size = actual context window in
             use right now. parsed from each `assistant` event's
             `.message.usage`; the harness tracks the most recent one
             across the stream and snapshots it when the `result`
             event lands.
  cost · M   cumulative tokens billed across the whole turn (the
             previous behaviour, now correctly labelled).
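a minimal sketch of the ctx/cost distinction, not the harness code: `Usage` and `split_turn` are hypothetical names, and summing the per-call usages here only stands in for the totals that the `result` event reports. it reproduces the arithmetic above (15 calls, each re-reading a 600k cached prefix).

```rust
/// Prompt-side token counts for one model call (hypothetical, trimmed-down type).
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Usage {
    input_tokens: u64,
    cache_read_input_tokens: u64,
    cache_creation_input_tokens: u64,
}

impl Usage {
    /// Prompt size: fresh input plus both cache axes.
    fn prompt_total(&self) -> u64 {
        self.input_tokens + self.cache_read_input_tokens + self.cache_creation_input_tokens
    }
}

/// Returns `(ctx, cost)`: the last call's usage and the field-wise sum
/// over all calls. `None` when the turn made no model call.
fn split_turn(calls: &[Usage]) -> Option<(Usage, Usage)> {
    let ctx = *calls.last()?;
    let cost = calls.iter().fold(Usage::default(), |acc, u| Usage {
        input_tokens: acc.input_tokens + u.input_tokens,
        cache_read_input_tokens: acc.cache_read_input_tokens + u.cache_read_input_tokens,
        cache_creation_input_tokens: acc.cache_creation_input_tokens
            + u.cache_creation_input_tokens,
    });
    Some((ctx, cost))
}

fn main() {
    // One call that re-reads a 600k cached prefix, repeated 15 times.
    let call = Usage {
        input_tokens: 0,
        cache_read_input_tokens: 600_000,
        cache_creation_input_tokens: 0,
    };
    let calls = vec![call; 15];
    let (ctx, cost) = split_turn(&calls).expect("at least one call");
    // prints "ctx = 600000, cost = 9000000"
    println!("ctx = {}, cost = {}", ctx.prompt_total(), cost.prompt_total());
}
```

the last call alone stays at 600k, while the sum reaches 9M, which is why only the former is usable as a compaction signal.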
both update via a single `TokenUsageChanged { ctx, cost }` SSE event at
turn-end. turn_stats grows four columns (`last_input_tokens`,
`last_output_tokens`, `last_cache_read_input_tokens`,
`last_cache_creation_input_tokens`) so the cold-load seed can paint both
badges on page load. migrations run try-and-ignore ALTERs so existing
agent dbs catch up; pre-migration rows have last-inference zeros and
yield no `ctx` seed (badge stays empty until next turn) rather than a
misleading 0.
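the zero-row rule can be sketched as below. `ctx_seed` is a hypothetical helper (the real decoding lives in `TurnStats::last_usage`, whose body is not shown here), and treating "all four `last_*` columns are 0" as the pre-migration marker is an assumption of this sketch.

```rust
/// Mirrors the four usage counters carried per snapshot.
#[derive(Clone, Copy, Debug, PartialEq)]
struct TokenUsage {
    input_tokens: u64,
    output_tokens: u64,
    cache_read_input_tokens: u64,
    cache_creation_input_tokens: u64,
}

/// Hypothetical helper: turn the `last_*` columns of the newest
/// `turn_stats` row into a ctx seed. A row written before the migration
/// carries the column defaults (all zeros), so it yields `None` and the
/// ctx badge stays hidden instead of showing a misleading 0.
fn ctx_seed(
    last_input: u64,
    last_output: u64,
    last_cache_read: u64,
    last_cache_creation: u64,
) -> Option<TokenUsage> {
    if last_input == 0 && last_output == 0 && last_cache_read == 0 && last_cache_creation == 0 {
        return None;
    }
    Some(TokenUsage {
        input_tokens: last_input,
        output_tokens: last_output,
        cache_read_input_tokens: last_cache_read,
        cache_creation_input_tokens: last_cache_creation,
    })
}

fn main() {
    // Pre-migration row: nothing to seed.
    println!("{:?}", ctx_seed(0, 0, 0, 0));
    // Post-migration row: the seed carries the last inference's counters.
    println!("{:?}", ctx_seed(12, 340, 141_000, 900));
}
```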
parent 14549dd8a9
commit 5c6c607e25
9 changed files with 267 additions and 101 deletions
|
|
@ -310,13 +310,22 @@ Layout, top to bottom:
|
||||||
`turn_state_since`.
|
`turn_state_since`.
|
||||||
- Model chip: `model · <name>` (e.g. `model · haiku`). Driven
|
- Model chip: `model · <name>` (e.g. `model · haiku`). Driven
|
||||||
by `LiveEvent::ModelChanged`; emitted from `Bus::set_model`.
|
by `LiveEvent::ModelChanged`; emitted from `Bus::set_model`.
|
||||||
- Ctx badge: `ctx · 142k` — total prompt tokens in the
|
- Ctx badge: `ctx · 142k` — last inference's prompt size
|
||||||
current context window (input + cache_read + cache_write),
|
(input + cache_read + cache_write of the most recent
|
||||||
mirroring claude code's bottom-right indicator. Hover for
|
model call in the just-ended turn). This is the **actual
|
||||||
the breakdown including output. Driven by
|
context window utilisation** — the number to watch when
|
||||||
`LiveEvent::TokenUsageChanged`; emitted from
|
deciding whether to compact.
|
||||||
`Bus::record_usage` whenever the terminal `result` event
|
- Cost badge: `cost · 1.3M` — cumulative tokens billed
|
||||||
delivers a fresh usage block.
|
across **every inference** in the last turn (sum of all
|
||||||
|
per-call prompts). Tool-heavy turns rebill the cached
|
||||||
|
prefix per call, so this routinely exceeds the model's
|
||||||
|
window — it's a cost signal, not a size signal.
|
||||||
|
- Both badges driven by `LiveEvent::TokenUsageChanged {
|
||||||
|
ctx, cost }`, emitted once at turn-end from
|
||||||
|
`Bus::record_turn_usage`. The harness tracks per-inference
|
||||||
|
usage by walking `assistant` events in the stream-json
|
||||||
|
and updating `last_inference` on each one; the `result`
|
||||||
|
event supplies `cost` and triggers the emit.
|
||||||
- Last-turn chip: `last turn 12.3s` appears after the first
|
- Last-turn chip: `last turn 12.3s` appears after the first
|
||||||
turn ends, computed from the state-since deltas.
|
turn ends, computed from the state-since deltas.
|
||||||
- `■ cancel turn` button: visible only while state=thinking,
|
- `■ cancel turn` button: visible only while state=thinking,
|
||||||
|
|
@ -437,8 +446,11 @@ Bus events (new vocabulary on `/events/stream`):
|
||||||
`needs_login_idle` / `needs_login_in_progress`. Drives the
|
`needs_login_idle` / `needs_login_in_progress`. Drives the
|
||||||
alive-badge.
|
alive-badge.
|
||||||
- `model_changed { model }` — drives the model chip.
|
- `model_changed { model }` — drives the model chip.
|
||||||
- `token_usage_changed { usage: TokenUsage }` — drives the
|
- `token_usage_changed { ctx: TokenUsage, cost: TokenUsage }`
|
||||||
ctx-badge. Emitted from `Bus::record_usage` whenever the
|
— drives the ctx + cost badges. Emitted from
|
||||||
stream-json `result` event delivers a fresh usage block.
|
`Bus::record_turn_usage` at turn-end; `ctx` is the last
|
||||||
|
inference's usage (current context size), `cost` is the
|
||||||
|
cumulative across every inference (the `result` event's
|
||||||
|
totals).
|
||||||
- `turn_state_changed { state, since_unix }` — drives the
|
- `turn_state_changed { state, since_unix }` — drives the
|
||||||
state badge (`idle`/`thinking`/`compacting`).
|
state badge (`idle`/`thinking`/`compacting`).
|
||||||
|
|
|
||||||
|
|
@ -525,30 +525,43 @@
|
||||||
el_.textContent = 'model · ' + model;
|
el_.textContent = 'model · ' + model;
|
||||||
el_.title = `claude --model ${model}\nset via the operator's /model command; persists across turns until changed`;
|
el_.title = `claude --model ${model}\nset via the operator's /model command; persists across turns until changed`;
|
||||||
}
|
}
|
||||||
// Context badge — mirrors Claude Code's bottom-right "N tokens"
|
// Token badges — two separate chips:
|
||||||
// indicator. Primary number is total prompt tokens used in the
|
// ctx · N last inference's prompt size = current context window
|
||||||
// current context window (input + both cache axes); hover for the
|
// utilisation (what to watch for compaction decisions)
|
||||||
// breakdown including output. Kept as chrome on the state row so
|
// cost · M cumulative billed tokens across the whole last turn
|
||||||
// the terminal stays the star.
|
// (sum across every inference; tool-heavy turns rebill
|
||||||
function renderTokenUsage(u) {
|
// the cached prompt per call and blow past the model's
|
||||||
const el_ = $('ctx-badge');
|
// context window — this is a cost signal, not a size
|
||||||
|
// signal)
|
||||||
|
// Both fed by the same `token_usage_changed` SSE event (`{ ctx, cost }`).
|
||||||
|
const fmtTokens = (n) => {
|
||||||
|
if (n >= 1_000_000) return (n / 1_000_000).toFixed(1) + 'M';
|
||||||
|
if (n >= 1_000) return Math.round(n / 1000) + 'k';
|
||||||
|
return String(n);
|
||||||
|
};
|
||||||
|
function renderOneUsage(elId, label, u, blurb) {
|
||||||
|
const el_ = $(elId);
|
||||||
if (!el_) return;
|
if (!el_) return;
|
||||||
if (!u) { el_.hidden = true; return; }
|
if (!u) { el_.hidden = true; return; }
|
||||||
const ctx = u.input_tokens + u.cache_read_input_tokens + u.cache_creation_input_tokens;
|
const total = u.input_tokens + u.cache_read_input_tokens + u.cache_creation_input_tokens;
|
||||||
const fmt = (n) => {
|
|
||||||
if (n >= 1_000_000) return (n / 1_000_000).toFixed(1) + 'M';
|
|
||||||
if (n >= 1_000) return Math.round(n / 1000) + 'k';
|
|
||||||
return String(n);
|
|
||||||
};
|
|
||||||
el_.hidden = false;
|
el_.hidden = false;
|
||||||
el_.title = [
|
el_.title = [
|
||||||
'context window in use',
|
blurb,
|
||||||
'input: ' + u.input_tokens,
|
'input: ' + u.input_tokens,
|
||||||
'cache_read: ' + u.cache_read_input_tokens,
|
'cache_read: ' + u.cache_read_input_tokens,
|
||||||
'cache_write: ' + u.cache_creation_input_tokens,
|
'cache_write: ' + u.cache_creation_input_tokens,
|
||||||
'output (last turn): ' + u.output_tokens,
|
'output: ' + u.output_tokens,
|
||||||
].join('\n');
|
].join('\n');
|
||||||
el_.textContent = 'ctx · ' + fmt(ctx);
|
el_.textContent = label + ' · ' + fmtTokens(total);
|
||||||
|
}
|
||||||
|
function renderTokenUsage(ev) {
|
||||||
|
// `ev` is `{ ctx, cost }` either off /api/state cold-load (each may
|
||||||
|
// be null) or off a `token_usage_changed` SSE event (both present
|
||||||
|
// post-turn).
|
||||||
|
renderOneUsage('ctx-badge', 'ctx', ev && ev.ctx,
|
||||||
|
'last-inference prompt size — the actual context window in use right now');
|
||||||
|
renderOneUsage('cost-badge', 'cost', ev && ev.cost,
|
||||||
|
'cumulative tokens billed across the last turn (sum across every inference)');
|
||||||
}
|
}
|
||||||
function renderLastTurn(ms) {
|
function renderLastTurn(ms) {
|
||||||
const el_ = $('last-turn');
|
const el_ = $('last-turn');
|
||||||
|
|
@ -626,7 +639,7 @@
|
||||||
}
|
}
|
||||||
renderAliveBadge(s.status);
|
renderAliveBadge(s.status);
|
||||||
renderModelChip(s.model);
|
renderModelChip(s.model);
|
||||||
renderTokenUsage(s.token_usage);
|
renderTokenUsage({ ctx: s.ctx_usage, cost: s.cost_usage });
|
||||||
// Open-threads aren't part of /api/state (kept on the broker
|
// Open-threads aren't part of /api/state (kept on the broker
|
||||||
// db, fetched via the per-agent socket). Cold-load fetches
|
// db, fetched via the per-agent socket). Cold-load fetches
|
||||||
// it here; turn_end refreshes it via the renderer below.
|
// it here; turn_end refreshes it via the renderer below.
|
||||||
|
|
@ -1026,7 +1039,7 @@
|
||||||
},
|
},
|
||||||
model_changed(ev, api) { if (!api.fromHistory) renderModelChip(ev.model); },
|
model_changed(ev, api) { if (!api.fromHistory) renderModelChip(ev.model); },
|
||||||
token_usage_changed(ev, api) {
|
token_usage_changed(ev, api) {
|
||||||
if (!api.fromHistory) renderTokenUsage(ev.usage);
|
if (!api.fromHistory) renderTokenUsage({ ctx: ev.ctx, cost: ev.cost });
|
||||||
},
|
},
|
||||||
turn_state_changed(ev, api) {
|
turn_state_changed(ev, api) {
|
||||||
if (!api.fromHistory) setStateAbs(ev.state, ev.since_unix);
|
if (!api.fromHistory) setStateAbs(ev.state, ev.since_unix);
|
||||||
|
|
|
||||||
|
|
@ -18,6 +18,7 @@
|
||||||
<span id="state-badge" class="state-badge state-loading">… booting</span>
|
<span id="state-badge" class="state-badge state-loading">… booting</span>
|
||||||
<span id="model-chip" class="model-chip" hidden></span>
|
<span id="model-chip" class="model-chip" hidden></span>
|
||||||
<span id="ctx-badge" class="ctx-badge" hidden title="tokens used in the current context window"></span>
|
<span id="ctx-badge" class="ctx-badge" hidden title="tokens used in the current context window"></span>
|
||||||
|
<span id="cost-badge" class="ctx-badge" hidden title="cumulative tokens billed across the last turn (sum across every inference; tool-heavy turns rebill the cached prompt per call)"></span>
|
||||||
<span id="last-turn" class="last-turn" hidden></span>
|
<span id="last-turn" class="last-turn" hidden></span>
|
||||||
<button type="button" id="cancel-btn" class="btn-cancel-turn" hidden>■ cancel turn</button>
|
<button type="button" id="cancel-btn" class="btn-cancel-turn" hidden>■ cancel turn</button>
|
||||||
<button type="button" id="new-session-btn" class="btn-new-session"
|
<button type="button" id="new-session-btn" class="btn-new-session"
|
||||||
|
|
|
||||||
|
|
@ -74,10 +74,11 @@ async fn main() -> Result<()> {
|
||||||
let login_state = Arc::new(Mutex::new(initial));
|
let login_state = Arc::new(Mutex::new(initial));
|
||||||
let bus = Bus::new();
|
let bus = Bus::new();
|
||||||
let stats = TurnStats::open_default();
|
let stats = TurnStats::open_default();
|
||||||
if let Some(s) = &stats
|
if let Some(s) = &stats {
|
||||||
&& let Some(u) = s.last_usage()
|
let (ctx, cost) = s.last_usage();
|
||||||
{
|
if ctx.is_some() || cost.is_some() {
|
||||||
bus.seed_usage(u);
|
bus.seed_usage(ctx, cost);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
let files = turn::TurnFiles::prepare(&cli.socket, &label, mcp::Flavor::Agent).await?;
|
let files = turn::TurnFiles::prepare(&cli.socket, &label, mcp::Flavor::Agent).await?;
|
||||||
let turn_lock: TurnLock = Arc::new(tokio::sync::Mutex::new(()));
|
let turn_lock: TurnLock = Arc::new(tokio::sync::Mutex::new(()));
|
||||||
|
|
@ -354,7 +355,8 @@ fn build_row(
|
||||||
open_threads_count: Option<u64>,
|
open_threads_count: Option<u64>,
|
||||||
open_reminders_count: Option<u64>,
|
open_reminders_count: Option<u64>,
|
||||||
) -> TurnStatRow {
|
) -> TurnStatRow {
|
||||||
let usage = bus.last_usage().unwrap_or_default();
|
let cost = bus.last_cost_usage().unwrap_or_default();
|
||||||
|
let ctx = bus.last_ctx_usage().unwrap_or(cost);
|
||||||
let tool_calls = bus.take_tool_calls();
|
let tool_calls = bus.take_tool_calls();
|
||||||
let tool_call_count: u64 = tool_calls.values().copied().sum();
|
let tool_call_count: u64 = tool_calls.values().copied().sum();
|
||||||
let tool_call_breakdown_json = if tool_calls.is_empty() {
|
let tool_call_breakdown_json = if tool_calls.is_empty() {
|
||||||
|
|
@ -373,10 +375,14 @@ fn build_row(
|
||||||
duration_ms,
|
duration_ms,
|
||||||
model,
|
model,
|
||||||
wake_from,
|
wake_from,
|
||||||
input_tokens: usage.input_tokens,
|
input_tokens: cost.input_tokens,
|
||||||
output_tokens: usage.output_tokens,
|
output_tokens: cost.output_tokens,
|
||||||
cache_read_input_tokens: usage.cache_read_input_tokens,
|
cache_read_input_tokens: cost.cache_read_input_tokens,
|
||||||
cache_creation_input_tokens: usage.cache_creation_input_tokens,
|
cache_creation_input_tokens: cost.cache_creation_input_tokens,
|
||||||
|
last_input_tokens: ctx.input_tokens,
|
||||||
|
last_output_tokens: ctx.output_tokens,
|
||||||
|
last_cache_read_input_tokens: ctx.cache_read_input_tokens,
|
||||||
|
last_cache_creation_input_tokens: ctx.cache_creation_input_tokens,
|
||||||
tool_call_count,
|
tool_call_count,
|
||||||
tool_call_breakdown_json,
|
tool_call_breakdown_json,
|
||||||
open_threads_count,
|
open_threads_count,
|
||||||
|
|
|
||||||
|
|
@ -64,10 +64,11 @@ async fn main() -> Result<()> {
|
||||||
let login_state = Arc::new(Mutex::new(initial));
|
let login_state = Arc::new(Mutex::new(initial));
|
||||||
let bus = Bus::new();
|
let bus = Bus::new();
|
||||||
let stats = TurnStats::open_default();
|
let stats = TurnStats::open_default();
|
||||||
if let Some(s) = &stats
|
if let Some(s) = &stats {
|
||||||
&& let Some(u) = s.last_usage()
|
let (ctx, cost) = s.last_usage();
|
||||||
{
|
if ctx.is_some() || cost.is_some() {
|
||||||
bus.seed_usage(u);
|
bus.seed_usage(ctx, cost);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
let files = turn::TurnFiles::prepare(&cli.socket, &label, mcp::Flavor::Manager).await?;
|
let files = turn::TurnFiles::prepare(&cli.socket, &label, mcp::Flavor::Manager).await?;
|
||||||
let turn_lock: TurnLock = Arc::new(tokio::sync::Mutex::new(()));
|
let turn_lock: TurnLock = Arc::new(tokio::sync::Mutex::new(()));
|
||||||
|
|
@ -291,7 +292,8 @@ fn build_row(
|
||||||
open_threads_count: Option<u64>,
|
open_threads_count: Option<u64>,
|
||||||
open_reminders_count: Option<u64>,
|
open_reminders_count: Option<u64>,
|
||||||
) -> TurnStatRow {
|
) -> TurnStatRow {
|
||||||
let usage = bus.last_usage().unwrap_or_default();
|
let cost = bus.last_cost_usage().unwrap_or_default();
|
||||||
|
let ctx = bus.last_ctx_usage().unwrap_or(cost);
|
||||||
let tool_calls = bus.take_tool_calls();
|
let tool_calls = bus.take_tool_calls();
|
||||||
let tool_call_count: u64 = tool_calls.values().copied().sum();
|
let tool_call_count: u64 = tool_calls.values().copied().sum();
|
||||||
let tool_call_breakdown_json = if tool_calls.is_empty() {
|
let tool_call_breakdown_json = if tool_calls.is_empty() {
|
||||||
|
|
@ -310,10 +312,14 @@ fn build_row(
|
||||||
duration_ms,
|
duration_ms,
|
||||||
model,
|
model,
|
||||||
wake_from,
|
wake_from,
|
||||||
input_tokens: usage.input_tokens,
|
input_tokens: cost.input_tokens,
|
||||||
output_tokens: usage.output_tokens,
|
output_tokens: cost.output_tokens,
|
||||||
cache_read_input_tokens: usage.cache_read_input_tokens,
|
cache_read_input_tokens: cost.cache_read_input_tokens,
|
||||||
cache_creation_input_tokens: usage.cache_creation_input_tokens,
|
cache_creation_input_tokens: cost.cache_creation_input_tokens,
|
||||||
|
last_input_tokens: ctx.input_tokens,
|
||||||
|
last_output_tokens: ctx.output_tokens,
|
||||||
|
last_cache_read_input_tokens: ctx.cache_read_input_tokens,
|
||||||
|
last_cache_creation_input_tokens: ctx.cache_creation_input_tokens,
|
||||||
tool_call_count,
|
tool_call_count,
|
||||||
tool_call_breakdown_json,
|
tool_call_breakdown_json,
|
||||||
open_threads_count,
|
open_threads_count,
|
||||||
|
|
|
||||||
|
|
@ -130,10 +130,15 @@ pub enum LiveEvent {
|
||||||
/// updates the chip + the per-turn stats sink will key off this
|
/// updates the chip + the per-turn stats sink will key off this
|
||||||
/// to mark the boundary in its log.
|
/// to mark the boundary in its log.
|
||||||
ModelChanged { model: String },
|
ModelChanged { model: String },
|
||||||
/// Final-turn `usage` block landed (input + output + cache
|
/// Token usage for the turn just ended. Carries two snapshots:
|
||||||
/// counters). Powers the context-window badge + accumulates into
|
/// - `ctx` is the LAST inference's usage block (the actual context
|
||||||
/// the per-turn stats sink.
|
/// window in use right now — what the operator needs to decide
|
||||||
TokenUsageChanged { usage: TokenUsage },
|
/// whether to compact / reset).
|
||||||
|
/// - `cost` is the cumulative usage across every inference in the
|
||||||
|
/// turn (sum of per-call billed tokens — the cost signal). For
|
||||||
|
/// tool-heavy turns the cumulative blows past the model's window
|
||||||
|
/// because each tool call's prompt is rebilled.
|
||||||
|
TokenUsageChanged { ctx: TokenUsage, cost: TokenUsage },
|
||||||
/// Harness's `TurnState` transitioned (idle / thinking /
|
/// Harness's `TurnState` transitioned (idle / thinking /
|
||||||
/// compacting). `since_unix` matches `Bus::state_snapshot().1`
|
/// compacting). `since_unix` matches `Bus::state_snapshot().1`
|
||||||
/// so the client's elapsed-time ticker keeps progressing across
|
/// so the client's elapsed-time ticker keeps progressing across
|
||||||
|
|
@ -221,15 +226,29 @@ impl TokenUsage {
|
||||||
self.input_tokens + self.cache_read_input_tokens + self.cache_creation_input_tokens
|
self.input_tokens + self.cache_read_input_tokens + self.cache_creation_input_tokens
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Parse usage from a stream-json event. Returns `Some` only for the
|
/// Parse usage from the terminal `result` stream-json event. This is the
|
||||||
/// terminal `result` event (which is the only one that carries `usage`);
|
/// **cumulative** sum across every inference in the turn — useful as a
|
||||||
/// every other event maps to `None`. Missing numeric fields default to 0
|
/// cost signal, but NOT the current context size (a tool-heavy turn
|
||||||
/// so partial server payloads don't drop the whole snapshot.
|
/// sums per-call cached prompts and easily exceeds the model window).
|
||||||
pub fn from_stream_event(v: &serde_json::Value) -> Option<Self> {
|
pub fn from_stream_event(v: &serde_json::Value) -> Option<Self> {
|
||||||
if v.get("type").and_then(|t| t.as_str()) != Some("result") {
|
if v.get("type").and_then(|t| t.as_str()) != Some("result") {
|
||||||
return None;
|
return None;
|
||||||
}
|
}
|
||||||
let u = v.get("usage")?;
|
Self::from_usage_obj(v.get("usage")?)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Parse usage from a per-inference `assistant` event's
|
||||||
|
/// `.message.usage` block. Each turn fires one of these for every
|
||||||
|
/// model call; tracking the LAST one over the turn gives the actual
|
||||||
|
/// conversation context size — the number to watch for compaction.
|
||||||
|
pub fn from_assistant_event(v: &serde_json::Value) -> Option<Self> {
|
||||||
|
if v.get("type").and_then(|t| t.as_str()) != Some("assistant") {
|
||||||
|
return None;
|
||||||
|
}
|
||||||
|
Self::from_usage_obj(v.get("message")?.get("usage")?)
|
||||||
|
}
|
||||||
|
|
||||||
|
fn from_usage_obj(u: &serde_json::Value) -> Option<Self> {
|
||||||
let field = |k: &str| u.get(k).and_then(serde_json::Value::as_u64).unwrap_or(0);
|
let field = |k: &str| u.get(k).and_then(serde_json::Value::as_u64).unwrap_or(0);
|
||||||
Some(Self {
|
Some(Self {
|
||||||
input_tokens: field("input_tokens"),
|
input_tokens: field("input_tokens"),
|
||||||
|
|
@ -281,12 +300,16 @@ pub struct Bus {
|
||||||
/// Model name passed to `claude --model`. Default `haiku`; the
|
/// Model name passed to `claude --model`. Default `haiku`; the
|
||||||
/// operator can override at runtime via `POST /api/model`.
|
/// operator can override at runtime via `POST /api/model`.
|
||||||
model: Arc<Mutex<String>>,
|
model: Arc<Mutex<String>>,
|
||||||
/// Last token usage reported by claude (from the `result` stream-json
|
/// Last-inference token usage from the most recent turn's final
|
||||||
/// event). `None` until the first turn with usage data completes.
|
/// `assistant` event. Represents the actual context window size at
|
||||||
/// Updated on every turn; survives across turns within one harness
|
/// turn-end — the number the operator watches to decide whether to
|
||||||
/// process lifetime (resets on container restart, which is fine —
|
/// compact. `None` until the first turn completes.
|
||||||
/// it's a live indicator, not a cumulative counter).
|
last_ctx_usage: Arc<Mutex<Option<TokenUsage>>>,
|
||||||
last_usage: Arc<Mutex<Option<TokenUsage>>>,
|
/// Cumulative token usage from the most recent turn's `result`
|
||||||
|
/// event (sum across every inference in the turn). This is the cost
|
||||||
|
/// signal — tool-heavy turns rebill the cached prompt per call and
|
||||||
|
/// blow past the model window. `None` until the first turn completes.
|
||||||
|
last_cost_usage: Arc<Mutex<Option<TokenUsage>>>,
|
||||||
/// One-shot: next `run_claude` call drops `--continue`, starting
|
/// One-shot: next `run_claude` call drops `--continue`, starting
|
||||||
/// a fresh claude session. Set by `POST /api/new-session` from
|
/// a fresh claude session. Set by `POST /api/new-session` from
|
||||||
/// the per-agent web UI; consumed (cleared back to false) by the
|
/// the per-agent web UI; consumed (cleared back to false) by the
|
||||||
|
|
@ -323,7 +346,8 @@ impl Bus {
|
||||||
store,
|
store,
|
||||||
state: Arc::new(Mutex::new((TurnState::Idle, now_unix()))),
|
state: Arc::new(Mutex::new((TurnState::Idle, now_unix()))),
|
||||||
model: Arc::new(Mutex::new(initial_model)),
|
model: Arc::new(Mutex::new(initial_model)),
|
||||||
last_usage: Arc::new(Mutex::new(None)),
|
last_ctx_usage: Arc::new(Mutex::new(None)),
|
||||||
|
last_cost_usage: Arc::new(Mutex::new(None)),
|
||||||
skip_continue_once: Arc::new(AtomicBool::new(false)),
|
skip_continue_once: Arc::new(AtomicBool::new(false)),
|
||||||
tool_calls: Arc::new(Mutex::new(std::collections::HashMap::new())),
|
tool_calls: Arc::new(Mutex::new(std::collections::HashMap::new())),
|
||||||
}
|
}
|
||||||
|
|
@ -378,19 +402,27 @@ impl Bus {
|
||||||
self.emit(LiveEvent::ModelChanged { model: value });
|
self.emit(LiveEvent::ModelChanged { model: value });
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Seed `last_usage` at startup without emitting a SSE event.
|
/// Seed `last_ctx_usage` + `last_cost_usage` at startup without
|
||||||
/// Used by the bin entrypoints to backfill from the most recent
|
/// emitting a SSE event. Used by the bin entrypoints to backfill
|
||||||
/// `turn_stats` row so the per-agent web UI's `ctx-badge` paints
|
/// from the most recent `turn_stats` row so the per-agent web UI's
|
||||||
/// real numbers on cold load instead of staying empty until the
|
/// ctx + cost badges paint real numbers on cold load.
|
||||||
/// next turn finishes.
|
pub fn seed_usage(&self, ctx: Option<TokenUsage>, cost: Option<TokenUsage>) {
|
||||||
pub fn seed_usage(&self, usage: TokenUsage) {
|
if ctx.is_some() {
|
||||||
*self.last_usage.lock().unwrap() = Some(usage);
|
*self.last_ctx_usage.lock().unwrap() = ctx;
|
||||||
|
}
|
||||||
|
if cost.is_some() {
|
||||||
|
*self.last_cost_usage.lock().unwrap() = cost;
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Record the latest token usage from a completed turn.
|
/// Record the just-ended turn's usage. `ctx` is the last inference's
|
||||||
pub fn record_usage(&self, usage: TokenUsage) {
|
/// usage (current context size); `cost` is the cumulative across
|
||||||
*self.last_usage.lock().unwrap() = Some(usage);
|
/// every inference in the turn (cost signal). One SSE event fires
|
||||||
self.emit(LiveEvent::TokenUsageChanged { usage });
|
/// per turn carrying both.
|
||||||
|
pub fn record_turn_usage(&self, ctx: TokenUsage, cost: TokenUsage) {
|
||||||
|
*self.last_ctx_usage.lock().unwrap() = Some(ctx);
|
||||||
|
*self.last_cost_usage.lock().unwrap() = Some(cost);
|
||||||
|
self.emit(LiveEvent::TokenUsageChanged { ctx, cost });
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Walk a stream-json value for `tool_use` blocks and bump the
|
/// Walk a stream-json value for `tool_use` blocks and bump the
|
||||||
|
|
@ -430,10 +462,18 @@ impl Bus {
|
||||||
std::mem::take(&mut *self.tool_calls.lock().unwrap())
|
std::mem::take(&mut *self.tool_calls.lock().unwrap())
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Last known token usage, or `None` if no turn has completed yet.
|
/// Last context-size snapshot (last inference of the most recent
|
||||||
|
/// turn), or `None` if no turn has completed yet.
|
||||||
#[must_use]
|
#[must_use]
|
||||||
pub fn last_usage(&self) -> Option<TokenUsage> {
|
pub fn last_ctx_usage(&self) -> Option<TokenUsage> {
|
||||||
*self.last_usage.lock().unwrap()
|
*self.last_ctx_usage.lock().unwrap()
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Last cumulative cost snapshot (sum across the most recent turn's
|
||||||
|
/// inferences), or `None` if no turn has completed yet.
|
||||||
|
#[must_use]
|
||||||
|
pub fn last_cost_usage(&self) -> Option<TokenUsage> {
|
||||||
|
*self.last_cost_usage.lock().unwrap()
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Update the harness's authoritative turn-loop state. Records
|
/// Update the harness's authoritative turn-loop state. Records
|
||||||
|
|
|
||||||
|
|
@ -279,14 +279,28 @@ async fn run_claude(prompt: &str, files: &TurnFiles, bus: &Bus) -> Result<bool>
|
||||||
let bus_err = bus.clone();
|
let bus_err = bus.clone();
|
||||||
let pump_stdout = tokio::spawn(async move {
|
let pump_stdout = tokio::spawn(async move {
|
||||||
let mut reader = BufReader::new(stdout).lines();
|
let mut reader = BufReader::new(stdout).lines();
|
||||||
|
// Track usage as the turn unfolds. `last_inference` overwrites on
|
||||||
|
// every assistant event so at result-time it holds the most recent
|
||||||
|
// model call's usage — the actual context size. The `result` event
|
||||||
|
// carries the cumulative-across-the-turn usage (cost signal). Both
|
||||||
|
// get handed to `record_turn_usage` together so a single SSE
|
||||||
|
// event updates both badges.
|
||||||
|
let mut last_inference: Option<crate::events::TokenUsage> = None;
|
||||||
while let Ok(Some(line)) = reader.next_line().await {
|
while let Ok(Some(line)) = reader.next_line().await {
|
||||||
if line.contains(PROMPT_TOO_LONG_MARKER) {
|
if line.contains(PROMPT_TOO_LONG_MARKER) {
|
||||||
flag_out.store(true, Ordering::Relaxed);
|
flag_out.store(true, Ordering::Relaxed);
|
||||||
}
|
}
|
||||||
match serde_json::from_str::<serde_json::Value>(&line) {
|
match serde_json::from_str::<serde_json::Value>(&line) {
|
||||||
Ok(v) => {
|
Ok(v) => {
|
||||||
if let Some(usage) = crate::events::TokenUsage::from_stream_event(&v) {
|
if let Some(u) = crate::events::TokenUsage::from_assistant_event(&v) {
|
||||||
bus_out.record_usage(usage);
|
last_inference = Some(u);
|
||||||
|
}
|
||||||
|
if let Some(cost) = crate::events::TokenUsage::from_stream_event(&v) {
|
||||||
|
// Fallback to `cost` if the turn somehow produced
|
||||||
|
// a result without any assistant event — keeps the
|
||||||
|
// ctx badge from going stale on a degenerate turn.
|
||||||
|
let ctx = last_inference.unwrap_or(cost);
|
||||||
|
bus_out.record_turn_usage(ctx, cost);
|
||||||
}
|
}
|
||||||
bus_out.observe_stream(&v);
|
bus_out.observe_stream(&v);
|
||||||
bus_out.emit(LiveEvent::Stream(v));
|
bus_out.emit(LiveEvent::Stream(v));
|
||||||
|
|
|
||||||
|
|
@ -22,8 +22,9 @@ use anyhow::{Context, Result};
|
||||||
use rusqlite::{Connection, params};
|
use rusqlite::{Connection, params};
|
||||||
|
|
||||||
/// SQL bootstrap. CREATE TABLE IF NOT EXISTS so first-boot agents
|
/// SQL bootstrap. CREATE TABLE IF NOT EXISTS so first-boot agents
|
||||||
/// and existing ones converge on the same shape; ALTER-style
|
/// and existing ones converge on the same shape. The base table is
|
||||||
/// migrations land here as additional statements once we have any.
|
/// fresh-install only; additive migrations land via `MIGRATIONS`
|
||||||
|
/// below as try-and-ignore ALTERs so existing dbs catch up.
|
||||||
const SCHEMA: &str = "
|
const SCHEMA: &str = "
|
||||||
CREATE TABLE IF NOT EXISTS turn_stats (
|
CREATE TABLE IF NOT EXISTS turn_stats (
|
||||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||||
|
|
@ -36,6 +37,10 @@ CREATE TABLE IF NOT EXISTS turn_stats (
|
||||||
output_tokens INTEGER NOT NULL DEFAULT 0,
|
output_tokens INTEGER NOT NULL DEFAULT 0,
|
||||||
cache_read_input_tokens INTEGER NOT NULL DEFAULT 0,
|
cache_read_input_tokens INTEGER NOT NULL DEFAULT 0,
|
||||||
cache_creation_input_tokens INTEGER NOT NULL DEFAULT 0,
|
cache_creation_input_tokens INTEGER NOT NULL DEFAULT 0,
|
||||||
|
last_input_tokens INTEGER NOT NULL DEFAULT 0,
|
||||||
|
last_output_tokens INTEGER NOT NULL DEFAULT 0,
|
||||||
|
last_cache_read_input_tokens INTEGER NOT NULL DEFAULT 0,
|
||||||
|
last_cache_creation_input_tokens INTEGER NOT NULL DEFAULT 0,
|
||||||
tool_call_count INTEGER NOT NULL DEFAULT 0,
|
tool_call_count INTEGER NOT NULL DEFAULT 0,
|
||||||
tool_call_breakdown_json TEXT,
|
tool_call_breakdown_json TEXT,
|
||||||
open_threads_count INTEGER,
|
open_threads_count INTEGER,
|
||||||
|
|
@ -47,6 +52,17 @@ CREATE INDEX IF NOT EXISTS idx_turn_stats_started
|
||||||
ON turn_stats (started_at DESC);
|
ON turn_stats (started_at DESC);
|
||||||
";
|
";
|
||||||
|
|
||||||
|
/// Additive column migrations. Each runs unconditionally and ignores
|
||||||
|
/// `duplicate column name` errors. sqlite has no
|
||||||
|
/// `ADD COLUMN IF NOT EXISTS`, so try-and-ignore is the portable path.
|
||||||
|
/// New columns MUST carry a default so existing rows decode.
|
||||||
|
const MIGRATIONS: &[&str] = &[
|
||||||
|
"ALTER TABLE turn_stats ADD COLUMN last_input_tokens INTEGER NOT NULL DEFAULT 0",
|
||||||
|
"ALTER TABLE turn_stats ADD COLUMN last_output_tokens INTEGER NOT NULL DEFAULT 0",
|
||||||
|
"ALTER TABLE turn_stats ADD COLUMN last_cache_read_input_tokens INTEGER NOT NULL DEFAULT 0",
|
||||||
|
"ALTER TABLE turn_stats ADD COLUMN last_cache_creation_input_tokens INTEGER NOT NULL DEFAULT 0",
|
||||||
|
];
|
||||||
|
|
||||||
/// One row to be inserted. `Option`-wrapped fields default to NULL
|
/// One row to be inserted. `Option`-wrapped fields default to NULL
|
||||||
/// when the harness couldn't gather them (e.g. socket roundtrip for
|
/// when the harness couldn't gather them (e.g. socket roundtrip for
|
||||||
/// open_threads failed) so a partial row beats no row.
|
/// open_threads failed) so a partial row beats no row.
|
||||||
|
|
@ -57,10 +73,16 @@ pub struct TurnStatRow {
|
||||||
pub duration_ms: i64,
|
pub duration_ms: i64,
|
||||||
pub model: String,
|
pub model: String,
|
||||||
pub wake_from: String,
|
pub wake_from: String,
|
||||||
|
/// Cumulative across every inference in the turn (cost signal).
|
||||||
pub input_tokens: u64,
|
pub input_tokens: u64,
|
||||||
pub output_tokens: u64,
|
pub output_tokens: u64,
|
||||||
pub cache_read_input_tokens: u64,
|
pub cache_read_input_tokens: u64,
|
||||||
pub cache_creation_input_tokens: u64,
|
pub cache_creation_input_tokens: u64,
|
||||||
|
/// Last inference's usage — the actual context size at turn end.
|
||||||
|
pub last_input_tokens: u64,
|
||||||
|
pub last_output_tokens: u64,
|
||||||
|
pub last_cache_read_input_tokens: u64,
|
||||||
|
pub last_cache_creation_input_tokens: u64,
|
||||||
pub tool_call_count: u64,
|
pub tool_call_count: u64,
|
||||||
/// Per-tool breakdown as JSON: `{"Read":12,"Bash":3,...}`. None
|
/// Per-tool breakdown as JSON: `{"Read":12,"Bash":3,...}`. None
|
||||||
/// when no tools were called (saves a sqlite write of `"{}"`).
|
/// when no tools were called (saves a sqlite write of `"{}"`).
|
||||||
|
|
@ -107,6 +129,18 @@ impl TurnStats {
|
||||||
.with_context(|| format!("open turn_stats db {}", path.display()))?;
|
.with_context(|| format!("open turn_stats db {}", path.display()))?;
|
||||||
conn.execute_batch(SCHEMA)
|
conn.execute_batch(SCHEMA)
|
||||||
.context("apply turn_stats schema")?;
|
.context("apply turn_stats schema")?;
|
||||||
|
for stmt in MIGRATIONS {
|
||||||
|
// Ignore "duplicate column name" — the migration already ran.
|
||||||
|
// Any other error is logged but doesn't fail open() because the
|
||||||
|
// base schema works and we'd rather keep the harness alive than
|
||||||
|
// crash on an upgrade hiccup.
|
||||||
|
if let Err(e) = conn.execute(stmt, []) {
|
||||||
|
let msg = e.to_string();
|
||||||
|
if !msg.contains("duplicate column name") {
|
||||||
|
tracing::warn!(error = %msg, stmt, "turn_stats migration failed");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
Ok(Self {
|
Ok(Self {
|
||||||
inner: std::sync::Arc::new(Mutex::new(conn)),
|
inner: std::sync::Arc::new(Mutex::new(conn)),
|
||||||
})
|
})
|
||||||
|
|
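The try-and-ignore loop in the hunk above can be sketched without a database. This is a minimal illustration, not the harness's code: `MigrationOutcome`, `run_migrations`, and the `exec` closure are hypothetical names, with `exec` standing in for `conn.execute(stmt, [])` and returning the error text as a `String`. Only the "duplicate column name" substring check is taken from the diff.

```rust
/// Hypothetical classification of one migration statement's result.
#[derive(Debug, PartialEq)]
enum MigrationOutcome {
    /// The ALTER ran and added the column.
    Applied,
    /// The column already existed, so the migration ran on an earlier open.
    AlreadyApplied,
    /// Any other error: worth logging, but not worth failing open() over.
    Failed(String),
}

/// Run every statement through `exec` and never abort early.
/// `exec` is an assumed stand-in for the real database call.
fn run_migrations<F>(stmts: &[&str], mut exec: F) -> Vec<MigrationOutcome>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut outcomes = Vec::with_capacity(stmts.len());
    for &stmt in stmts {
        let outcome = match exec(stmt) {
            Ok(()) => MigrationOutcome::Applied,
            // Same substring check as the loop in the diff.
            Err(msg) if msg.contains("duplicate column name") => {
                MigrationOutcome::AlreadyApplied
            }
            Err(msg) => MigrationOutcome::Failed(msg),
        };
        outcomes.push(outcome);
    }
    outcomes
}
```

In the diff, the `Failed` case corresponds to the `tracing::warn!` call. The two other cases are silent, which is what makes re-running the list on every `open()` safe.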
@ -121,6 +155,8 @@ impl TurnStats {
|
||||||
started_at, ended_at, duration_ms, model, wake_from,
|
started_at, ended_at, duration_ms, model, wake_from,
|
||||||
input_tokens, output_tokens,
|
input_tokens, output_tokens,
|
||||||
cache_read_input_tokens, cache_creation_input_tokens,
|
cache_read_input_tokens, cache_creation_input_tokens,
|
||||||
|
last_input_tokens, last_output_tokens,
|
||||||
|
last_cache_read_input_tokens, last_cache_creation_input_tokens,
|
||||||
tool_call_count, tool_call_breakdown_json,
|
tool_call_count, tool_call_breakdown_json,
|
||||||
open_threads_count, open_reminders_count,
|
open_threads_count, open_reminders_count,
|
||||||
result_kind, note
|
result_kind, note
|
||||||
|
|
@ -130,7 +166,9 @@ impl TurnStats {
|
||||||
?8, ?9,
|
?8, ?9,
|
||||||
?10, ?11,
|
?10, ?11,
|
||||||
?12, ?13,
|
?12, ?13,
|
||||||
?14, ?15
|
?14, ?15,
|
||||||
|
?16, ?17,
|
||||||
|
?18, ?19
|
||||||
)",
|
)",
|
||||||
params![
|
params![
|
||||||
row.started_at,
|
row.started_at,
|
||||||
|
|
@ -142,6 +180,10 @@ impl TurnStats {
|
||||||
i64::try_from(row.output_tokens).unwrap_or(i64::MAX),
|
i64::try_from(row.output_tokens).unwrap_or(i64::MAX),
|
||||||
i64::try_from(row.cache_read_input_tokens).unwrap_or(i64::MAX),
|
i64::try_from(row.cache_read_input_tokens).unwrap_or(i64::MAX),
|
||||||
i64::try_from(row.cache_creation_input_tokens).unwrap_or(i64::MAX),
|
i64::try_from(row.cache_creation_input_tokens).unwrap_or(i64::MAX),
|
||||||
|
i64::try_from(row.last_input_tokens).unwrap_or(i64::MAX),
|
||||||
|
i64::try_from(row.last_output_tokens).unwrap_or(i64::MAX),
|
||||||
|
i64::try_from(row.last_cache_read_input_tokens).unwrap_or(i64::MAX),
|
||||||
|
i64::try_from(row.last_cache_creation_input_tokens).unwrap_or(i64::MAX),
|
||||||
i64::try_from(row.tool_call_count).unwrap_or(i64::MAX),
|
i64::try_from(row.tool_call_count).unwrap_or(i64::MAX),
|
||||||
row.tool_call_breakdown_json,
|
row.tool_call_breakdown_json,
|
||||||
row.open_threads_count
|
row.open_threads_count
|
||||||
|
|
@ -157,32 +199,58 @@ impl TurnStats {
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Token counts from the most recently inserted row, if any. Lets
|
/// Token counts from the most recently inserted row, if any.
|
||||||
/// the harness seed `Bus::last_usage` on startup so the per-agent
|
/// Returns `(ctx, cost)` — both backfill `Bus` on startup so the
|
||||||
/// web UI's `ctx-badge` paints with real numbers on cold load
|
/// per-agent web UI's ctx + cost badges paint with real numbers on
|
||||||
/// instead of waiting for the next `TokenUsageChanged` SSE event.
|
/// cold load instead of waiting for the next `TokenUsageChanged`
|
||||||
/// Best-effort: any sqlite error returns `None` and the caller
|
/// SSE event. Best-effort: any sqlite error returns `(None, None)`.
|
||||||
/// falls back to the empty state.
|
///
|
||||||
|
/// Pre-migration rows (before the `last_*_tokens` columns existed)
|
||||||
|
/// have last-inference zeros — those rows yield `ctx = None` so the
|
||||||
|
/// badge stays empty until the next real turn rather than showing a
|
||||||
|
/// misleading 0.
|
||||||
#[must_use]
|
#[must_use]
|
||||||
pub fn last_usage(&self) -> Option<crate::events::TokenUsage> {
|
pub fn last_usage(
|
||||||
|
&self,
|
||||||
|
) -> (
|
||||||
|
Option<crate::events::TokenUsage>,
|
||||||
|
Option<crate::events::TokenUsage>,
|
||||||
|
) {
|
||||||
let conn = self.inner.lock().unwrap();
|
let conn = self.inner.lock().unwrap();
|
||||||
conn.query_row(
|
conn.query_row(
|
||||||
"SELECT input_tokens, output_tokens,
|
"SELECT input_tokens, output_tokens,
|
||||||
cache_read_input_tokens, cache_creation_input_tokens
|
cache_read_input_tokens, cache_creation_input_tokens,
|
||||||
|
last_input_tokens, last_output_tokens,
|
||||||
|
last_cache_read_input_tokens, last_cache_creation_input_tokens
|
||||||
FROM turn_stats
|
FROM turn_stats
|
||||||
ORDER BY started_at DESC
|
ORDER BY started_at DESC
|
||||||
LIMIT 1",
|
LIMIT 1",
|
||||||
[],
|
[],
|
||||||
|row| {
|
|row| {
|
||||||
Ok(crate::events::TokenUsage {
|
let g = |i: usize| -> rusqlite::Result<u64> {
|
||||||
input_tokens: u64::try_from(row.get::<_, i64>(0)?).unwrap_or(0),
|
Ok(u64::try_from(row.get::<_, i64>(i)?).unwrap_or(0))
|
||||||
output_tokens: u64::try_from(row.get::<_, i64>(1)?).unwrap_or(0),
|
};
|
||||||
cache_read_input_tokens: u64::try_from(row.get::<_, i64>(2)?).unwrap_or(0),
|
let cost = crate::events::TokenUsage {
|
||||||
cache_creation_input_tokens: u64::try_from(row.get::<_, i64>(3)?).unwrap_or(0),
|
input_tokens: g(0)?,
|
||||||
})
|
output_tokens: g(1)?,
|
||||||
|
cache_read_input_tokens: g(2)?,
|
||||||
|
cache_creation_input_tokens: g(3)?,
|
||||||
|
};
|
||||||
|
let last = crate::events::TokenUsage {
|
||||||
|
input_tokens: g(4)?,
|
||||||
|
output_tokens: g(5)?,
|
||||||
|
cache_read_input_tokens: g(6)?,
|
||||||
|
cache_creation_input_tokens: g(7)?,
|
||||||
|
};
|
||||||
|
let ctx = if last == crate::events::TokenUsage::default() {
|
||||||
|
None
|
||||||
|
} else {
|
||||||
|
Some(last)
|
||||||
|
};
|
||||||
|
Ok((ctx, Some(cost)))
|
||||||
},
|
},
|
||||||
)
|
)
|
||||||
.ok()
|
.unwrap_or((None, None))
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
||||||
|
|
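The seeding rule in `last_usage` (saturating conversions on write, clamping on read, and an all-zero last-inference row mapping to `ctx = None`) can be sketched in isolation. This is an illustration under assumptions: the local `TokenUsage` mirrors the four fields used in the diff, but its derives are guessed, and `to_db`, `from_db`, `seed_from_row`, and `prompt_size` are hypothetical helpers, not functions from the codebase. The column order in `seed_from_row` follows the `SELECT` shown above.

```rust
/// Local stand-in for `crate::events::TokenUsage` (derives are an assumption).
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
struct TokenUsage {
    input_tokens: u64,
    output_tokens: u64,
    cache_read_input_tokens: u64,
    cache_creation_input_tokens: u64,
}

/// Write side: sqlite integers are signed, so saturate at i64::MAX.
fn to_db(n: u64) -> i64 {
    i64::try_from(n).unwrap_or(i64::MAX)
}

/// Read side: a negative stored value is treated as 0.
fn from_db(n: i64) -> u64 {
    u64::try_from(n).unwrap_or(0)
}

/// Columns 0..4 are cumulative (cost), columns 4..8 are last-inference (ctx).
/// An all-zero last-inference block is read as a pre-migration row.
fn seed_from_row(cols: [i64; 8]) -> (Option<TokenUsage>, Option<TokenUsage>) {
    let usage = |base: usize| TokenUsage {
        input_tokens: from_db(cols[base]),
        output_tokens: from_db(cols[base + 1]),
        cache_read_input_tokens: from_db(cols[base + 2]),
        cache_creation_input_tokens: from_db(cols[base + 3]),
    };
    let cost = usage(0);
    let last = usage(4);
    let ctx = if last == TokenUsage::default() { None } else { Some(last) };
    (ctx, Some(cost))
}

/// Prompt size behind the ctx badge: input + cache_read + cache_write.
fn prompt_size(u: &TokenUsage) -> u64 {
    u.input_tokens + u.cache_read_input_tokens + u.cache_creation_input_tokens
}
```

Returning `None` for the zero case keeps the badge empty on cold load. A row whose last inference really did report zero tokens in all four fields would be treated the same way, which seems an acceptable trade-off for a best-effort seed.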
@ -225,9 +225,13 @@ struct StateSnapshot {
|
||||||
/// the operator can see what they just switched to (and what's
|
/// the operator can see what they just switched to (and what's
|
||||||
/// in flight). Mutable at runtime via `POST /api/model`.
|
/// in flight). Mutable at runtime via `POST /api/model`.
|
||||||
model: String,
|
model: String,
|
||||||
/// Token usage from the last completed turn. `null` until the
|
/// Last-inference token usage from the most recent completed
|
||||||
/// first turn with usage data finishes.
|
/// turn — represents the current context-window size at turn-end.
|
||||||
token_usage: Option<crate::events::TokenUsage>,
|
/// `null` until the first turn finishes.
|
||||||
|
ctx_usage: Option<crate::events::TokenUsage>,
|
||||||
|
/// Cumulative token usage across the most recent turn's inferences
|
||||||
|
/// (cost signal). `null` until the first turn finishes.
|
||||||
|
cost_usage: Option<crate::events::TokenUsage>,
|
||||||
}
|
}
|
||||||
|
|
||||||
#[derive(Serialize)]
|
#[derive(Serialize)]
|
||||||
|
|
@ -310,7 +314,8 @@ async fn api_state(State(state): State<AppState>) -> axum::Json<StateSnapshot> {
|
||||||
let inbox = recent_inbox(&state.socket, state.flavor()).await;
|
let inbox = recent_inbox(&state.socket, state.flavor()).await;
|
||||||
let (turn_state, turn_state_since) = state.bus.state_snapshot();
|
let (turn_state, turn_state_since) = state.bus.state_snapshot();
|
||||||
let model = state.bus.model();
|
let model = state.bus.model();
|
||||||
let token_usage = state.bus.last_usage();
|
let ctx_usage = state.bus.last_ctx_usage();
|
||||||
|
let cost_usage = state.bus.last_cost_usage();
|
||||||
axum::Json(StateSnapshot {
|
axum::Json(StateSnapshot {
|
||||||
seq,
|
seq,
|
||||||
label: state.label.clone(),
|
label: state.label.clone(),
|
||||||
|
|
@ -321,7 +326,8 @@ async fn api_state(State(state): State<AppState>) -> axum::Json<StateSnapshot> {
|
||||||
turn_state,
|
turn_state,
|
||||||
turn_state_since,
|
turn_state_since,
|
||||||
model,
|
model,
|
||||||
token_usage,
|
ctx_usage,
|
||||||
|
cost_usage,
|
||||||
})
|
})
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
||||||