diff --git a/CLAUDE.md b/CLAUDE.md index b88c08b..68296b2 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -139,11 +139,14 @@ hive-sh4re/ wire types (HostRequest/Response, AgentRequest/Response, nix/ modules/hive-c0re.nix systemd service + firewall + git wiring; + `contextWindowTokens` attrset (per-model, + injected as env vars into all containers); imports hive-forge.nix modules/hive-forge.nix optional in-container Forgejo (`hyperhive.forge.enable`, default on); Catppuccin Mocha theme via tmpfiles C+ copy - templates/harness-base.nix shared scaffolding for sub-agents + manager + templates/harness-base.nix shared scaffolding for sub-agents + manager; + `hyperhive.model` option (HIVE_DEFAULT_MODEL) templates/agent-base.nix sub-agent nixosConfiguration templates/manager.nix manager nixosConfiguration templates/weston-vnc.nix optional `hyperhive.gui.enable` @@ -205,3 +208,7 @@ read them à la carte. - **Rate-limit sentinel:** `{state_dir}/hyperhive-rate-limited` is written by the harness on 429 and cleared on retry. `ContainerView.rate_limited` reads it for the dashboard badge. +- **Context window:** defaults are in `services.hive-c0re.contextWindowTokens` + (host nix, affects all agents). Per-agent default model via + `hyperhive.model` in `agent.nix`. Watermarks are 75%/50% of the + effective window. diff --git a/docs/turn-loop.md b/docs/turn-loop.md index 8ad00e8..37659d6 100644 --- a/docs/turn-loop.md +++ b/docs/turn-loop.md @@ -21,9 +21,13 @@ Each agent harness (`hive-ag3nt serve` or `hive-m1nd serve`) runs: 6. Wait for claude to exit. Compaction is two-pronged — *reactive* on `Prompt is too long` and *proactive* on a context watermark (see [Compaction](#compaction) below). **Rate-limit detection**: - if stdout contains `429` or `rate_limit` markers, the harness - sets the `rate_limited` sentinel (`Bus::emit_status("rate_limited")`), - sleeps `HIVE_RATE_LIMIT_SLEEP_SECS` (default 300), then retries. + on stderr the harness does a raw-line match for `429` / + `rate_limit` markers; on stdout it only fires on parsed + `{"type":"error"}` JSON events (avoiding false positives when + agents discuss `rate_limit_error` in conversation text). On + detection the harness sets the `rate_limited` sentinel + (`Bus::emit_status("rate_limited")`), sleeps + `HIVE_RATE_LIMIT_SLEEP_SECS` (default 300), then retries. The dashboard and per-agent page show a `⊘ rate limited` badge while the harness is parked. 7. Emit `LiveEvent::TurnEnd { ok, note }`. Sleep `poll_ms` to avoid @@ -40,11 +44,33 @@ claude --print --verbose --output-format stream-json --model \ # wake prompt piped over stdin ``` -`` is read from `Bus::model()` on each turn, default -`haiku`. Operator can flip it at runtime with `/model ` in -the web terminal — the next turn picks it up. The choice is -persisted to `/state/hyperhive-model` so it survives restart; -override path: `HYPERHIVE_MODEL_FILE` env var for tests. +`` is read from `Bus::model()` on each turn. The initial +default is set by `hyperhive.model` in the agent's `agent.nix` +(NixOS option; propagates via `HIVE_DEFAULT_MODEL` env var; falls +back to `"haiku"` if unset). The operator can flip it at runtime +with `/model ` in the web terminal — the next turn picks it +up. The choice is persisted to `/state/hyperhive-model` so it +survives restart; override path: `HYPERHIVE_MODEL_FILE` env var +for tests. + +Context-window size is looked up per-model via +`events::context_window_tokens(model)`. Resolution order (first +match wins): + +1. `HIVE_CONTEXT_WINDOW_TOKENS_` env var, where `KEY` + (lowercased) is a substring of the active model name. Injected + by the meta flake from `services.hive-c0re.contextWindowTokens` + (host-level NixOS option, defaults: haiku=200k, sonnet=1M, + opus=1M). Override these for all agents at once without a + per-agent config change. +2. `HIVE_CONTEXT_WINDOW_TOKENS` — single global override for any + model (useful in dev / test). +3. Hard fallback: `200_000` (conservative; only reached outside + NixOS where the env vars aren't set). + +The effective window drives watermarks and is exposed at runtime +via `/api/state.context_window_tokens` so the UI can show a +percentage-of-window ctx badge. `--continue` keeps a persistent session per agent (claude stores sessions in `~/.claude/projects/`, which is bind-mounted @@ -76,17 +102,20 @@ owns it explicitly in `turn::drive_turn`. There are two triggers: persist in-flight task state, decisions, and file paths before the conversation detail collapses into a summary. -The watermark is `HIVE_COMPACT_WATERMARK_TOKENS` (default `150_000`, -~75% of a 200k window); set it to `0` to disable proactive compaction -entirely (the reactive path always applies). The proactive path is -best-effort — a failed checkpoint turn or `/compact` is surfaced as a -`Note` but never fails the turn that already succeeded. The operator -can also force a compaction any time via `/api/compact`. +The compact watermark defaults to **75% of `context_window_tokens(model)`** +(dynamically derived — 150k for haiku, 750k for sonnet/opus). Override +with `HIVE_COMPACT_WATERMARK_TOKENS` (absolute token count); set to `0` +to disable proactive compaction entirely (the reactive path always +applies). The proactive path is best-effort — a failed checkpoint turn +or `/compact` is surfaced as a `Note` but never fails the turn that +already succeeded. The operator can also force a compaction any time +via `/api/compact`. - **Auto session-reset** — a third path that fires when both conditions hold: context is ≥ a watermark (`HIVE_AUTO_RESET_WATERMARK_TOKENS`, - default `100_000`) AND the time since the last turn exceeds the - assumed prompt-cache TTL (`HIVE_CACHE_TTL_SECS`, default `300`). + default **50% of `context_window_tokens(model)`**) AND the time since + the last turn exceeds the assumed prompt-cache TTL + (`HIVE_CACHE_TTL_SECS`, default `3600`). Claude's prompt cache lives ~5 minutes; if the cache is already cold, resuming with `--continue` pays the full re-upload cost of the current context with no benefit over starting fresh. So: diff --git a/docs/web-ui.md b/docs/web-ui.md index 7e6b815..31b6526 100644 --- a/docs/web-ui.md +++ b/docs/web-ui.md @@ -382,7 +382,9 @@ Layout, top to bottom: (input + cache_read + cache_write of the most recent model call in the just-ended turn). This is the **actual context window utilisation** — the number to watch when - deciding whether to compact. + deciding whether to compact. When `context_window_tokens` + is available from `/api/state`, the badge tooltip shows the + percentage of window used. - Cost badge: `cost · 1.3M` — cumulative tokens billed across **every inference** in the last turn (sum of all per-call prompts). Tool-heavy turns rebill the cached @@ -407,7 +409,10 @@ Layout, top to bottom: Polling: `/api/state` is fetched **once** on cold load, and again while `status === 'needs_login_in_progress'` (login session output isn't event-shaped yet). Every other badge - updates from SSE; no periodic refresh timer runs. + updates from SSE; no periodic refresh timer runs. Snapshot + includes `context_window_tokens` (effective window size for + the agent's current model, from `events::context_window_tokens`) + used to compute percentage-of-window in the ctx badge tooltip. - Inbox `
` block (collapsed): `inbox · N` — last 30 messages addressed to this agent, fetched via `AgentRequest::Recent { limit: 30 }`. Reply messages (those