todo + terminal-rendering: ctx-badge cold-load, auto session-reset, more coherence-pass gripes

2026-05-18 18:00:46 +02:00 · 2026-05-18 18:00:46 +02:00 · 8a3e8bfb7f
commit 8a3e8bfb7f
parent 9995bbc891
2 changed files with 98 additions and 0 deletions
--- a/TODO.md
+++ b/TODO.md
@ -61,6 +61,12 @@ how often the friction bites in normal use.
 - **Surface per-turn stats on the agent web UI**: "N turns today" chip + rolling tool-call histogram tooltip on the model chip. (`open_threads` and `open_reminders` chips already landed via other paths — open-threads section on the page + reminder count chip on the container row.) Reads the per-agent `turn_stats.sqlite`.
 - **Stats UI on the main dashboard**: per-agent rollups (avg turn duration, tokens-since-boot, top 5 tools) on the container row. Same data source, host-side aggregation query.

+## Harness Behaviour
+
+- **Persist + cold-load current context size on the per-agent page**: the `ctx-badge` (Claude Code's bottom-right "N tokens" indicator) currently only populates after the first `TokenUsageChanged` SSE event arrives, which is the *next* turn — until then the badge is empty. Operator can't see "this agent is at 78% context" before deciding to manually compact / reset / message it. Last known token usage should be persisted (likely a small `/state/hyperhive-token-usage` blob, or a row in turn_stats already has it — pull last row's totals on cold load) and returned by `/api/state` so the badge paints with real numbers on first render.
+
+- **Auto session-reset when context is large and cache is cold**: today every turn uses `--continue`, so a long-lived agent carries its entire transcript forward indefinitely. When the next turn's context is above some threshold (rough starting point: ~50% of the session limit — hive startup alone burned ~15%, so the headroom disappears fast) *and* the prompt cache is no longer warm (last turn ended past the cache TTL), it's cheaper to start fresh than to re-send the whole history uncached. Open question: drop `--continue` vs. trigger `--compact` first — needs measurement of what each actually costs (uncached re-read of N tokens vs. a compact turn's own token spend + the post-compact uncached re-read). Decision should be data-driven, not guessed. Needs: a context-size estimate per turn (turn_stats already tracks token usage), a cache-warmth heuristic (time since last turn vs. cache TTL), and a one-shot fresh-session path in `turn.rs` mirroring the existing `↻ new session` button.
+
 ## Bugs

 - **Token-budget exhaustion crashes the harness**: when claude's account hits its rate/token cap, the in-flight `claude --print` invocation returns an error the harness doesn't recognise as recoverable, the serve loop exits, and the container stays up with a dead daemon. Operator only notices when an unrelated wake fails to drive a turn. Want: detect the budget-exceeded class of failure (likely a specific stderr line or stream-json `rate_limit_event` shape), fire a `LiveEvent::StatusChanged("rate_limited")` or new status, surface as a red badge + banner on the dashboard + per-agent UI, and have the serve loop park (sleep N minutes, retry) instead of returning Err. Operator can also see "this agent is rate-limited until ~HH:MM" if claude tells us when. Inspect `crate::turn::run_claude`'s `bail!` paths + claude's stderr conventions for the budget error string.