add proactive context-size compaction with a notes-checkpoint turn

This commit is contained in:
damocles 2026-05-20 12:38:50 +02:00 committed by Mara
parent f2015954d9
commit 9cbb05bb86
3 changed files with 152 additions and 9 deletions

View file

@ -200,6 +200,23 @@ read them à la carte.
In-flight or recent context that hasn't earned a section yet. In-flight or recent context that hasn't earned a section yet.
Prune freely. Prune freely.
- **Just landed:** proactive context-size compaction. `turn::
drive_turn` now has two compaction paths. The old *reactive*
one (claude prints `Prompt is too long``/compact` + retry)
is unchanged — by then the session is already past the window
and no turn can run on it. The new *proactive* one fires after
a clean turn whose last-inference context
(`Bus::last_ctx_usage().context_tokens()`) crossed a watermark:
`drive_turn` injects one synthetic notes-checkpoint turn
(`CHECKPOINT_PROMPT` — "flush durable state into /state now")
then runs `/compact`, so the agent can persist in-flight state
before the detail collapses into a summary. Watermark is
`HIVE_COMPACT_WATERMARK_TOKENS` (default 150k, ~75% of a 200k
window; `0` disables). New `maybe_checkpoint_and_compact`
helper, all best-effort (a failed checkpoint/compact is a Note,
never fails the turn). Both ag3nt + m1nd loops get it for free
via the shared `drive_turn`. `docs/turn-loop.md` gained a
Compaction section.
- **Just landed:** dashboard side panel + forge-linked - **Just landed:** dashboard side panel + forge-linked
config/approvals + per-bucket model stats. (a) Long content config/approvals + per-bucket model stats. (a) Long content
(file previews, approval diffs, journald logs) opens in a (file previews, approval diffs, journald logs) opens in a
@ -608,8 +625,8 @@ Prune freely.
agent_web_port + collision banner + spawn/rebuild preflight, agent_web_port + collision banner + spawn/rebuild preflight,
browser notifications, focus-preserving refresh, generalised browser notifications, focus-preserving refresh, generalised
<details data-restore-key> survival, prompt-on-submit pattern. <details data-restore-key> survival, prompt-on-submit pattern.
- **Open threads:** two-step spawn, notes compaction, - **Open threads:** two-step spawn, unprivileged
unprivileged containers, Bash allow-list, xterm.js. The containers, Bash allow-list, xterm.js. The
deployment / gateway / privsep cluster is tracked as deployment / gateway / privsep cluster is tracked as
`area:ops` forge issues. (Landed since this note was first written: `area:ops` forge issues. (Landed since this note was first written:
extra per-agent MCP servers, per-agent send allow-list, extra per-agent MCP servers, per-agent send allow-list,

View file

@ -18,8 +18,9 @@ Each agent harness (`hive-ag3nt serve` or `hive-m1nd serve`) runs:
over stdin. over stdin.
5. Stream stdout (JSON lines) into the bus as 5. Stream stdout (JSON lines) into the bus as
`LiveEvent::Stream(value)`. Pump stderr as `Note`. `LiveEvent::Stream(value)`. Pump stderr as `Note`.
6. Wait for claude to exit. On `Prompt is too long`, run `/compact` 6. Wait for claude to exit. Compaction is two-pronged — *reactive*
on the session once and retry the turn. on `Prompt is too long` and *proactive* on a context watermark
(see [Compaction](#compaction) below).
7. Emit `LiveEvent::TurnEnd { ok, note }`. Sleep `poll_ms` to avoid 7. Emit `LiveEvent::TurnEnd { ok, note }`. Sleep `poll_ms` to avoid
tight loops on transient failures. tight loops on transient failures.
@ -43,14 +44,40 @@ override path: `HYPERHIVE_MODEL_FILE` env var for tests.
`--continue` keeps a persistent session per agent (claude stores `--continue` keeps a persistent session per agent (claude stores
sessions in `~/.claude/projects/`, which is bind-mounted sessions in `~/.claude/projects/`, which is bind-mounted
persistently). Auto-compact and auto-memory are disabled via persistently). Auto-compact and auto-memory are disabled via
`--settings` because hyperhive owns compaction (`/compact` on `--settings` because hyperhive owns compaction — see
overflow, retry once; operator can also force one via `/api/compact`). [Compaction](#compaction) below.
A one-shot `--continue` suppression is available via A one-shot `--continue` suppression is available via
`POST /api/new-session` (or `/new-session` slash command in the `POST /api/new-session` (or `/new-session` slash command in the
per-agent terminal) — `Bus::take_skip_continue()` flips an per-agent terminal) — `Bus::take_skip_continue()` flips an
`AtomicBool` once per turn, the next claude invocation drops `AtomicBool` once per turn, the next claude invocation drops
`--continue`, every subsequent turn resumes normal behaviour. `--continue`, every subsequent turn resumes normal behaviour.
### Compaction
claude's own in-session auto-compact is off (`--settings`); hyperhive
owns it explicitly in `turn::drive_turn`. There are two triggers:
- **Reactive** — claude-code prints `Prompt is too long` (the
`PROMPT_TOO_LONG_MARKER`). The session is *already* past the context
window, so no turn can run on it — `drive_turn` runs `/compact`
straight away and retries the same wake-up prompt once. No
notes-checkpoint turn is possible here: the detail is gone.
- **Proactive** — a turn finishes cleanly but the last inference's
context size (`Bus::last_ctx_usage().context_tokens()`) is at or
above a watermark. While the session is still healthy, `drive_turn`
injects one synthetic *notes-checkpoint* turn (`CHECKPOINT_PROMPT`
— "context is filling up, flush durable state into `/state` now")
and *then* runs `/compact`. This gives the agent a chance to
persist in-flight task state, decisions, and file paths before the
conversation detail collapses into a summary.
The watermark is `HIVE_COMPACT_WATERMARK_TOKENS` (default `150_000`,
~75% of a 200k window); set it to `0` to disable proactive compaction
entirely (the reactive path always applies). The proactive path is
best-effort — a failed checkpoint turn or `/compact` is surfaced as a
`Note` but never fails the turn that already succeeded. The operator
can also force a compaction any time via `/api/compact`.
The child runs with `cwd = /state` (when the bind exists; falls The child runs with `cwd = /state` (when the bind exists; falls
back to the parent's cwd in dev), so any relative path in a tool back to the parent's cwd in dev), so any relative path in a tool
call (`Read foo.md`, `Bash ls`, `Write notes.md`) lands in the call (`Read foo.md`, `Bash ls`, `Write notes.md`) lands in the

View file

@ -34,6 +34,33 @@ const CLAUDE_SETTINGS: &str = include_str!("../prompts/claude-settings.json");
/// claude exit with a useful error in the live view. /// claude exit with a useful error in the live view.
const PROMPT_TOO_LONG_MARKER: &str = "Prompt is too long"; const PROMPT_TOO_LONG_MARKER: &str = "Prompt is too long";
/// Token watermark for *proactive* compaction. Once a turn finishes with
/// the last inference's context size at or above this many tokens,
/// `drive_turn` runs one dedicated notes-checkpoint turn (so the agent
/// can flush durable state into `/state`) and then `/compact` — while the
/// session is still healthy enough to run a turn at all. This is distinct
/// from the reactive `PROMPT_TOO_LONG_MARKER` path, which only fires once
/// the session is *already* past the window: at that point no turn can
/// run on it, so the reactive path just compacts + retries with no
/// checkpoint. Default is ~75% of a 200k-token window; override via
/// `HIVE_COMPACT_WATERMARK_TOKENS`, or set that to `0` to disable
/// proactive compaction entirely (the reactive path always applies).
const DEFAULT_COMPACT_WATERMARK_TOKENS: u64 = 150_000;
/// Synthetic wake prompt for the proactive notes-checkpoint turn. Not an
/// inbox message — the harness injects it directly so the agent gets one
/// turn to persist durable state before `/compact` collapses the
/// turn-by-turn history into a summary.
const CHECKPOINT_PROMPT: &str = "[system] Context checkpoint — no inbox message to handle.\n\n\
Your conversation context has grown large and the harness is about to run `/compact`, \
which collapses the detailed turn-by-turn history into a short summary. Anything you \
do not persist now is effectively lost after the next turn.\n\n\
Use THIS turn to flush anything worth keeping into your durable `/state` files: update \
your notes / CLAUDE.md / TODO.md with in-flight task state, decisions made, important \
file paths, and whatever you would need to resume cleanly with only a summary of this \
conversation to go on. Do not start new work or reply to anyone just write your notes \
and end the turn.";
/// The set of files claude reads on every invocation: the MCP server /// The set of files claude reads on every invocation: the MCP server
/// config (`--mcp-config`), static settings (`--settings`), and the /// config (`--mcp-config`), static settings (`--settings`), and the
/// pre-rendered role/tools system prompt (`--system-prompt-file`). /// pre-rendered role/tools system prompt (`--system-prompt-file`).
@ -129,10 +156,32 @@ pub enum TurnOutcome {
Failed(anyhow::Error), Failed(anyhow::Error),
} }
/// Drive one turn end-to-end, transparently compacting + retrying once on /// Resolve the proactive-compaction watermark: `HIVE_COMPACT_WATERMARK_TOKENS`
/// `Prompt is too long`. Both the sub-agent and manager loops call this. /// if set to a valid integer, else `DEFAULT_COMPACT_WATERMARK_TOKENS`. A
/// value of `0` disables proactive compaction.
fn compact_watermark_tokens() -> u64 {
std::env::var("HIVE_COMPACT_WATERMARK_TOKENS")
.ok()
.and_then(|s| s.trim().parse::<u64>().ok())
.unwrap_or(DEFAULT_COMPACT_WATERMARK_TOKENS)
}
/// Drive one turn end-to-end. Two compaction paths layer on top of the
/// raw `run_turn`:
///
/// - **Reactive** — `run_turn` returns `PromptTooLong`: the session is
/// already past the context window and *no* turn can run on it, so we
/// compact immediately and retry the same wake-up prompt once. No
/// notes-checkpoint turn is possible here — the detail is gone.
/// - **Proactive** — the turn finished cleanly but its context size has
/// crept past the watermark: while the session is still healthy we
/// give the agent one dedicated turn to checkpoint its `/state` notes,
/// then compact. This keeps a later turn from hitting the reactive
/// path (where there is no chance to save anything first).
///
/// Both the sub-agent and manager loops call this.
pub async fn drive_turn(prompt: &str, files: &TurnFiles, bus: &Bus) -> TurnOutcome { pub async fn drive_turn(prompt: &str, files: &TurnFiles, bus: &Bus) -> TurnOutcome {
match run_turn(prompt, files, bus).await { let outcome = match run_turn(prompt, files, bus).await {
TurnOutcome::PromptTooLong => { TurnOutcome::PromptTooLong => {
if let Err(e) = compact_session(files, bus).await { if let Err(e) = compact_session(files, bus).await {
tracing::warn!(error = %format!("{e:#}"), "compact failed"); tracing::warn!(error = %format!("{e:#}"), "compact failed");
@ -141,6 +190,56 @@ pub async fn drive_turn(prompt: &str, files: &TurnFiles, bus: &Bus) -> TurnOutco
run_turn(prompt, files, bus).await run_turn(prompt, files, bus).await
} }
other => other, other => other,
};
// Proactive: a turn just completed on a still-healthy session. If its
// context crossed the watermark, checkpoint + compact before a later
// turn overflows into the reactive path. Best-effort — never changes
// the outcome of the turn that already succeeded.
if matches!(outcome, TurnOutcome::Ok) {
maybe_checkpoint_and_compact(files, bus).await;
}
outcome
}
/// Proactive post-turn compaction. If the last inference's context size
/// has crossed the watermark, run one notes-checkpoint turn so the agent
/// can persist durable state, then `/compact`. Best-effort: a failed
/// checkpoint or compaction is logged + surfaced as a Note but never
/// fails the turn that already succeeded.
async fn maybe_checkpoint_and_compact(files: &TurnFiles, bus: &Bus) {
let watermark = compact_watermark_tokens();
if watermark == 0 {
return; // proactive compaction disabled
}
let Some(used) = bus.last_ctx_usage().map(|u| u.context_tokens()) else {
return; // no usage reading yet — nothing to compare against
};
if used < watermark {
return;
}
bus.emit(LiveEvent::Note {
text: format!(
"context at {used} tokens (watermark {watermark}) — running a \
notes-checkpoint turn before /compact"
),
});
// Give the agent one turn to flush durable state into /state. If the
// session is somehow already too far gone to run even this, fall
// through to compaction anyway — the checkpoint is best-effort.
match run_turn(CHECKPOINT_PROMPT, files, bus).await {
TurnOutcome::Ok => {}
TurnOutcome::PromptTooLong => bus.emit(LiveEvent::Note {
text: "checkpoint turn overflowed the window — compacting without it".into(),
}),
TurnOutcome::Failed(e) => bus.emit(LiveEvent::Note {
text: format!("checkpoint turn failed ({e:#}) — compacting anyway"),
}),
}
if let Err(e) = compact_session(files, bus).await {
tracing::warn!(error = %format!("{e:#}"), "post-checkpoint compact failed");
bus.emit(LiveEvent::Note {
text: format!("/compact after checkpoint failed: {e:#}"),
});
} }
} }