agent badges: split into ctx (last-inference) + cost (cumulative)
the existing ctx badge was misnamed: it summed `result.usage`, which is
the cumulative tokens billed across every inference in the turn. for
tool-heavy turns that easily exceeds the model's context window (a 600k
cached prefix × 15 sub-calls = 9M cache_read), making it useless as a
"should i compact?" signal.
now two separate badges:
ctx · N last inference's prompt size = actual context window in
use right now. parsed from each `assistant` event's
`.message.usage`; the harness tracks the most recent one
across the stream and snapshots it when the `result`
event lands.
cost · M cumulative tokens billed across the whole turn (the
previous behaviour, now correctly labelled).
both update via a single `TokenUsageChanged { ctx, cost }` SSE event at
turn-end. turn_stats grows four columns (`last_input_tokens`,
`last_output_tokens`, `last_cache_read_input_tokens`,
`last_cache_creation_input_tokens`) so the cold-load seed can paint both
badges on page load. migrations run try-and-ignore ALTERs so existing
agent dbs catch up; pre-migration rows have last-inference zeros and
yield no `ctx` seed (badge stays empty until next turn) rather than a
misleading 0.
This commit is contained in:
parent
14549dd8a9
commit
5c6c607e25
9 changed files with 267 additions and 101 deletions
|
|
@ -225,9 +225,13 @@ struct StateSnapshot {
|
|||
/// the operator can see what they just switched to (and what's
|
||||
/// in flight). Mutable at runtime via `POST /api/model`.
|
||||
model: String,
|
||||
/// Token usage from the last completed turn. `null` until the
|
||||
/// first turn with usage data finishes.
|
||||
token_usage: Option<crate::events::TokenUsage>,
|
||||
/// Last-inference token usage from the most recent completed
|
||||
/// turn — represents the current context-window size at turn-end.
|
||||
/// `null` until the first turn finishes.
|
||||
ctx_usage: Option<crate::events::TokenUsage>,
|
||||
/// Cumulative token usage across the most recent turn's inferences
|
||||
/// (cost signal). `null` until the first turn finishes.
|
||||
cost_usage: Option<crate::events::TokenUsage>,
|
||||
}
|
||||
|
||||
#[derive(Serialize)]
|
||||
|
|
@ -310,7 +314,8 @@ async fn api_state(State(state): State<AppState>) -> axum::Json<StateSnapshot> {
|
|||
let inbox = recent_inbox(&state.socket, state.flavor()).await;
|
||||
let (turn_state, turn_state_since) = state.bus.state_snapshot();
|
||||
let model = state.bus.model();
|
||||
let token_usage = state.bus.last_usage();
|
||||
let ctx_usage = state.bus.last_ctx_usage();
|
||||
let cost_usage = state.bus.last_cost_usage();
|
||||
axum::Json(StateSnapshot {
|
||||
seq,
|
||||
label: state.label.clone(),
|
||||
|
|
@ -321,7 +326,8 @@ async fn api_state(State(state): State<AppState>) -> axum::Json<StateSnapshot> {
|
|||
turn_state,
|
||||
turn_state_since,
|
||||
model,
|
||||
token_usage,
|
||||
ctx_usage,
|
||||
cost_usage,
|
||||
})
|
||||
}
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue