diff --git a/CLAUDE.md b/CLAUDE.md
index b88c08b..68296b2 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -139,11 +139,14 @@ hive-sh4re/        wire types (HostRequest/Response, AgentRequest/Response,
 
 nix/
   modules/hive-c0re.nix         systemd service + firewall + git wiring;
+                                 `contextWindowTokens` attrset (per-model,
+                                 injected as env vars into all containers);
                                  imports hive-forge.nix
   modules/hive-forge.nix        optional in-container Forgejo
                                  (`hyperhive.forge.enable`, default on);
                                  Catppuccin Mocha theme via tmpfiles C+ copy
-  templates/harness-base.nix    shared scaffolding for sub-agents + manager
+  templates/harness-base.nix    shared scaffolding for sub-agents + manager;
+                                 `hyperhive.model` option (HIVE_DEFAULT_MODEL)
   templates/agent-base.nix      sub-agent nixosConfiguration
   templates/manager.nix         manager nixosConfiguration
   templates/weston-vnc.nix      optional `hyperhive.gui.enable`
@@ -205,3 +208,7 @@ read them à la carte.
 - **Rate-limit sentinel:** `{state_dir}/hyperhive-rate-limited`
   is written by the harness on 429 and cleared on retry.
   `ContainerView.rate_limited` reads it for the dashboard badge.
+- **Context window:** defaults are in `services.hive-c0re.contextWindowTokens`
+  (host nix, affects all agents). Per-agent default model via
+  `hyperhive.model` in `agent.nix`. Watermarks are 75%/50% of the
+  effective window.
diff --git a/docs/turn-loop.md b/docs/turn-loop.md
index 8ad00e8..37659d6 100644
--- a/docs/turn-loop.md
+++ b/docs/turn-loop.md
@@ -21,9 +21,13 @@ Each agent harness (`hive-ag3nt serve` or `hive-m1nd serve`) runs:
 6. Wait for claude to exit. Compaction is two-pronged — *reactive*
    on `Prompt is too long` and *proactive* on a context watermark
    (see [Compaction](#compaction) below). **Rate-limit detection**:
-   if stdout contains `429` or `rate_limit` markers, the harness
-   sets the `rate_limited` sentinel (`Bus::emit_status("rate_limited")`),
-   sleeps `HIVE_RATE_LIMIT_SLEEP_SECS` (default 300), then retries.
+   on stderr the harness does a raw-line match for `429` /
+   `rate_limit` markers; on stdout it only fires on parsed
+   `{"type":"error"}` JSON events (avoiding false positives when
+   agents discuss `rate_limit_error` in conversation text). On
+   detection the harness sets the `rate_limited` sentinel
+   (`Bus::emit_status("rate_limited")`), sleeps
+   `HIVE_RATE_LIMIT_SLEEP_SECS` (default 300), then retries.
    The dashboard and per-agent page show a `⊘ rate limited` badge
    while the harness is parked.
 7. Emit `LiveEvent::TurnEnd { ok, note }`. Sleep `poll_ms` to avoid
@@ -40,11 +44,33 @@ claude --print --verbose --output-format stream-json --model <name> \
 # wake prompt piped over stdin
 ```
 
-`<name>` is read from `Bus::model()` on each turn, default
-`haiku`. Operator can flip it at runtime with `/model <name>` in
-the web terminal — the next turn picks it up.  The choice is
-persisted to `/state/hyperhive-model` so it survives restart;
-override path: `HYPERHIVE_MODEL_FILE` env var for tests.
+`<name>` is read from `Bus::model()` on each turn. The initial
+default is set by `hyperhive.model` in the agent's `agent.nix`
+(NixOS option; propagates via `HIVE_DEFAULT_MODEL` env var; falls
+back to `"haiku"` if unset). The operator can flip it at runtime
+with `/model <name>` in the web terminal — the next turn picks it
+up. The choice is persisted to `/state/hyperhive-model` so it
+survives restart; override path: `HYPERHIVE_MODEL_FILE` env var
+for tests.
+
+Context-window size is looked up per-model via
+`events::context_window_tokens(model)`. Resolution order (first
+match wins):
+
+1. `HIVE_CONTEXT_WINDOW_TOKENS_<KEY>` env var, where `KEY`
+   (lowercased) is a substring of the active model name. Injected
+   by the meta flake from `services.hive-c0re.contextWindowTokens`
+   (host-level NixOS option, defaults: haiku=200k, sonnet=1M,
+   opus=1M). Override these for all agents at once without a
+   per-agent config change.
+2. `HIVE_CONTEXT_WINDOW_TOKENS` — single global override for any
+   model (useful in dev / test).
+3. Hard fallback: `200_000` (conservative; only reached outside
+   NixOS where the env vars aren't set).
+
+The effective window drives watermarks and is exposed at runtime
+via `/api/state.context_window_tokens` so the UI can show a
+percentage-of-window ctx badge.
 
 `--continue` keeps a persistent session per agent (claude stores
 sessions in `~/.claude/projects/`, which is bind-mounted
@@ -76,17 +102,20 @@ owns it explicitly in `turn::drive_turn`. There are two triggers:
   persist in-flight task state, decisions, and file paths before the
   conversation detail collapses into a summary.
 
-The watermark is `HIVE_COMPACT_WATERMARK_TOKENS` (default `150_000`,
-~75% of a 200k window); set it to `0` to disable proactive compaction
-entirely (the reactive path always applies). The proactive path is
-best-effort — a failed checkpoint turn or `/compact` is surfaced as a
-`Note` but never fails the turn that already succeeded. The operator
-can also force a compaction any time via `/api/compact`.
+The compact watermark defaults to **75% of `context_window_tokens(model)`**
+(dynamically derived — 150k for haiku, 750k for sonnet/opus). Override
+with `HIVE_COMPACT_WATERMARK_TOKENS` (absolute token count); set to `0`
+to disable proactive compaction entirely (the reactive path always
+applies). The proactive path is best-effort — a failed checkpoint turn
+or `/compact` is surfaced as a `Note` but never fails the turn that
+already succeeded. The operator can also force a compaction any time
+via `/api/compact`.
 
 - **Auto session-reset** — a third path that fires when both
   conditions hold: context is ≥ a watermark (`HIVE_AUTO_RESET_WATERMARK_TOKENS`,
-  default `100_000`) AND the time since the last turn exceeds the
-  assumed prompt-cache TTL (`HIVE_CACHE_TTL_SECS`, default `300`).
+  default **50% of `context_window_tokens(model)`**) AND the time since
+  the last turn exceeds the assumed prompt-cache TTL
+  (`HIVE_CACHE_TTL_SECS`, default `3600`).
   Claude's prompt cache lives ~5 minutes; if the cache is already
   cold, resuming with `--continue` pays the full re-upload cost of
   the current context with no benefit over starting fresh. So:
diff --git a/docs/web-ui.md b/docs/web-ui.md
index 7e6b815..31b6526 100644
--- a/docs/web-ui.md
+++ b/docs/web-ui.md
@@ -382,7 +382,9 @@ Layout, top to bottom:
     (input + cache_read + cache_write of the most recent
     model call in the just-ended turn). This is the **actual
     context window utilisation** — the number to watch when
-    deciding whether to compact.
+    deciding whether to compact. When `context_window_tokens`
+    is available from `/api/state`, the badge tooltip shows the
+    percentage of window used.
   - Cost badge: `cost · 1.3M` — cumulative tokens billed
     across **every inference** in the last turn (sum of all
     per-call prompts). Tool-heavy turns rebill the cached
@@ -407,7 +409,10 @@ Layout, top to bottom:
   Polling: `/api/state` is fetched **once** on cold load, and
   again while `status === 'needs_login_in_progress'` (login
   session output isn't event-shaped yet). Every other badge
-  updates from SSE; no periodic refresh timer runs.
+  updates from SSE; no periodic refresh timer runs. Snapshot
+  includes `context_window_tokens` (effective window size for
+  the agent's current model, from `events::context_window_tokens`)
+  used to compute percentage-of-window in the ctx badge tooltip.
 - Inbox `<details>` block (collapsed): `inbox · N` — last 30
   messages addressed to this agent, fetched via
   `AgentRequest::Recent { limit: 30 }`. Reply messages (those