agent badges: split into ctx (last-inference) + cost (cumulative)

the existing ctx badge was misnamed: it summed `result.usage`, which is the cumulative tokens billed across every inference in the turn. for tool-heavy turns that easily exceeds the model's context window (a 600k cached prefix × 15 sub-calls = 9M cache_read), making it useless as a "should i compact?" signal.

now two separate badges:

  ctx · N   last inference's prompt size = actual context window in use right now. parsed from each `assistant` event's `.message.usage`; the harness tracks the most recent one across the stream and snapshots it when the `result` event lands.
  cost · M  cumulative tokens billed across the whole turn (the previous behaviour, now correctly labelled).

both update via a single `TokenUsageChanged { ctx, cost }` SSE event at turn-end.

turn_stats grows four columns (`last_input_tokens`, `last_output_tokens`, `last_cache_read_input_tokens`, `last_cache_creation_input_tokens`) so the cold-load seed can paint both badges on page load. migrations run try-and-ignore ALTERs so existing agent dbs catch up; pre-migration rows have last-inference zeros and yield no `ctx` seed (badge stays empty until next turn) rather than a misleading 0.
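
a minimal sketch of the tracking described above, not the code from this commit. `TokenUsage` here is a hypothetical stand-in for `crate::events::TokenUsage` (its field names are assumed from the new `turn_stats` columns), and `UsageTracker`, `on_assistant`, `on_result`, `prompt_tokens` and `ctx_seed` are illustrative names. treating "prompt size" as input + cache_read + cache_creation is also an assumption.

```rust
/// Hypothetical stand-in for `crate::events::TokenUsage`; field names are assumed.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct TokenUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub cache_read_input_tokens: u64,
    pub cache_creation_input_tokens: u64,
}

impl TokenUsage {
    /// Assumed definition of "prompt size": every token sent to the model
    /// on a single inference, cached or not.
    pub fn prompt_tokens(&self) -> u64 {
        self.input_tokens + self.cache_read_input_tokens + self.cache_creation_input_tokens
    }
}

/// Remembers the most recent per-inference usage seen during one turn.
#[derive(Debug, Default)]
pub struct UsageTracker {
    last_inference: Option<TokenUsage>,
}

impl UsageTracker {
    /// Call for every `assistant` event; later events overwrite earlier ones.
    pub fn on_assistant(&mut self, usage: TokenUsage) {
        self.last_inference = Some(usage);
    }

    /// Call when the `result` event lands. Returns `(ctx, cost)`: the
    /// last-inference snapshot and the cumulative usage for the turn.
    /// Clearing the tracker here is a choice made by this sketch.
    pub fn on_result(&mut self, cumulative: TokenUsage) -> (Option<TokenUsage>, TokenUsage) {
        (self.last_inference.take(), cumulative)
    }
}

/// Cold-load seed: an all-zero row (pre-migration) yields no `ctx` badge
/// instead of a misleading 0.
pub fn ctx_seed(last: TokenUsage) -> Option<TokenUsage> {
    if last == TokenUsage::default() {
        None
    } else {
        Some(last)
    }
}
```

feeding this fifteen `assistant` events that each read a 600k cached prefix leaves `ctx` at 600k while the cumulative `cost` reaches 9M cache_read, which is the gap the old single badge hid.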
2026-05-18 18:48:35 +02:00 · 2026-05-18 18:48:35 +02:00 · 5c6c607e25
commit 5c6c607e25
parent 14549dd8a9
9 changed files with 267 additions and 101 deletions
--- a/hive-ag3nt/src/web_ui.rs
+++ b/hive-ag3nt/src/web_ui.rs
@@ -225,9 +225,13 @@ struct StateSnapshot {
    /// the operator can see what they just switched to (and what's
    /// in flight). Mutable at runtime via `POST /api/model`.
    model: String,
-    /// Token usage from the last completed turn. `null` until the
-    /// first turn with usage data finishes.
-    token_usage: Option<crate::events::TokenUsage>,
+    /// Last-inference token usage from the most recent completed
+    /// turn — represents the current context-window size at turn-end.
+    /// `null` until the first turn finishes.
+    ctx_usage: Option<crate::events::TokenUsage>,
+    /// Cumulative token usage across the most recent turn's inferences
+    /// (cost signal). `null` until the first turn finishes.
+    cost_usage: Option<crate::events::TokenUsage>,
 }

 #[derive(Serialize)]
@@ -310,7 +314,8 @@ async fn api_state(State(state): State<AppState>) -> axum::Json<StateSnapshot> {
    let inbox = recent_inbox(&state.socket, state.flavor()).await;
    let (turn_state, turn_state_since) = state.bus.state_snapshot();
    let model = state.bus.model();
-    let token_usage = state.bus.last_usage();
+    let ctx_usage = state.bus.last_ctx_usage();
+    let cost_usage = state.bus.last_cost_usage();
    axum::Json(StateSnapshot {
        seq,
        label: state.label.clone(),
@@ -321,7 +326,8 @@ async fn api_state(State(state): State<AppState>) -> axum::Json<StateSnapshot> {
        turn_state,
        turn_state_since,
        model,
-        token_usage,
+        ctx_usage,
+        cost_usage,
    })
 }