docs: document model/context-window config, dynamic watermarks, rate-limit scoping

This commit is contained in:
iris 2026-05-20 16:55:13 +02:00
parent bac7dd6cde
commit 939df10a61
3 changed files with 60 additions and 19 deletions

View file

@ -21,9 +21,13 @@ Each agent harness (`hive-ag3nt serve` or `hive-m1nd serve`) runs:
6. Wait for claude to exit. Compaction is two-pronged — *reactive*
on `Prompt is too long` and *proactive* on a context watermark
(see [Compaction](#compaction) below). **Rate-limit detection**:
if stdout contains `429` or `rate_limit` markers, the harness
sets the `rate_limited` sentinel (`Bus::emit_status("rate_limited")`),
sleeps `HIVE_RATE_LIMIT_SLEEP_SECS` (default 300), then retries.
on stderr the harness does a raw-line match for `429` /
`rate_limit` markers; on stdout it only fires on parsed
`{"type":"error"}` JSON events (avoiding false positives when
agents discuss `rate_limit_error` in conversation text). On
detection the harness sets the `rate_limited` sentinel
(`Bus::emit_status("rate_limited")`), sleeps
`HIVE_RATE_LIMIT_SLEEP_SECS` (default 300), then retries.
The dashboard and per-agent page show a `⊘ rate limited` badge
while the harness is parked.
7. Emit `LiveEvent::TurnEnd { ok, note }`. Sleep `poll_ms` to avoid
@ -40,11 +44,33 @@ claude --print --verbose --output-format stream-json --model <name> \
# wake prompt piped over stdin
```
`<name>` is read from `Bus::model()` on each turn, default
`haiku`. Operator can flip it at runtime with `/model <name>` in
the web terminal — the next turn picks it up. The choice is
persisted to `/state/hyperhive-model` so it survives restart;
override path: `HYPERHIVE_MODEL_FILE` env var for tests.
`<name>` is read from `Bus::model()` on each turn. The initial
default is set by `hyperhive.model` in the agent's `agent.nix`
(NixOS option; propagates via `HIVE_DEFAULT_MODEL` env var; falls
back to `"haiku"` if unset). The operator can flip it at runtime
with `/model <name>` in the web terminal — the next turn picks it
up. The choice is persisted to `/state/hyperhive-model` so it
survives restart; override path: `HYPERHIVE_MODEL_FILE` env var
for tests.
Context-window size is looked up per-model via
`events::context_window_tokens(model)`. Resolution order (first
match wins):
1. `HIVE_CONTEXT_WINDOW_TOKENS_<KEY>` env var, where `KEY`
(lowercased) is a substring of the active model name. Injected
by the meta flake from `services.hive-c0re.contextWindowTokens`
(host-level NixOS option, defaults: haiku=200k, sonnet=1M,
opus=1M). Override these for all agents at once without a
per-agent config change.
2. `HIVE_CONTEXT_WINDOW_TOKENS` — single global override for any
model (useful in dev / test).
3. Hard fallback: `200_000` (conservative; only reached outside
NixOS where the env vars aren't set).
The effective window drives watermarks and is exposed at runtime
via `/api/state.context_window_tokens` so the UI can show a
percentage-of-window ctx badge.
`--continue` keeps a persistent session per agent (claude stores
sessions in `~/.claude/projects/`, which is bind-mounted
@ -76,17 +102,20 @@ owns it explicitly in `turn::drive_turn`. There are two triggers:
persist in-flight task state, decisions, and file paths before the
conversation detail collapses into a summary.
The watermark is `HIVE_COMPACT_WATERMARK_TOKENS` (default `150_000`,
~75% of a 200k window); set it to `0` to disable proactive compaction
entirely (the reactive path always applies). The proactive path is
best-effort — a failed checkpoint turn or `/compact` is surfaced as a
`Note` but never fails the turn that already succeeded. The operator
can also force a compaction any time via `/api/compact`.
The compact watermark defaults to **75% of `context_window_tokens(model)`**
(dynamically derived — 150k for haiku, 750k for sonnet/opus). Override
with `HIVE_COMPACT_WATERMARK_TOKENS` (absolute token count); set to `0`
to disable proactive compaction entirely (the reactive path always
applies). The proactive path is best-effort — a failed checkpoint turn
or `/compact` is surfaced as a `Note` but never fails the turn that
already succeeded. The operator can also force a compaction any time
via `/api/compact`.
- **Auto session-reset** — a third path that fires when both
conditions hold: context is ≥ a watermark (`HIVE_AUTO_RESET_WATERMARK_TOKENS`,
default `100_000`) AND the time since the last turn exceeds the
assumed prompt-cache TTL (`HIVE_CACHE_TTL_SECS`, default `300`).
default **50% of `context_window_tokens(model)`**) AND the time since
the last turn exceeds the assumed prompt-cache TTL
(`HIVE_CACHE_TTL_SECS`, default `3600`).
Claude's prompt cache lives ~5 minutes; if the cache is already
cold, resuming with `--continue` pays the full re-upload cost of
the current context with no benefit over starting fresh. So: