docs: document model/context-window config, dynamic watermarks, rate-limit scoping

parent bac7dd6cde
commit 939df10a61
3 changed files with 60 additions and 19 deletions

```diff
@@ -139,11 +139,14 @@ hive-sh4re/ wire types (HostRequest/Response, AgentRequest/Response,
 
 nix/
   modules/hive-c0re.nix       systemd service + firewall + git wiring;
+                              `contextWindowTokens` attrset (per-model,
+                              injected as env vars into all containers);
                               imports hive-forge.nix
   modules/hive-forge.nix      optional in-container Forgejo
                               (`hyperhive.forge.enable`, default on);
                               Catppuccin Mocha theme via tmpfiles C+ copy
-  templates/harness-base.nix  shared scaffolding for sub-agents + manager
+  templates/harness-base.nix  shared scaffolding for sub-agents + manager;
+                              `hyperhive.model` option (HIVE_DEFAULT_MODEL)
   templates/agent-base.nix    sub-agent nixosConfiguration
   templates/manager.nix       manager nixosConfiguration
   templates/weston-vnc.nix    optional `hyperhive.gui.enable`
```
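
The hunk above only says that the host-level `contextWindowTokens` attrset is "injected as env vars into all containers" and that `hyperhive.model` surfaces as `HIVE_DEFAULT_MODEL`. The Rust sketch below mirrors that env contract. The real wiring lives in the Nix modules, and `container_env` is a hypothetical helper. It also assumes something the commit does not state: each attribute name is uppercased to form the `HIVE_CONTEXT_WINDOW_TOKENS_<KEY>` suffix.

```rust
use std::collections::BTreeMap;

/// Hypothetical mirror of the Nix wiring: the env vars a container would
/// receive from the host-level `contextWindowTokens` attrset plus the
/// per-agent `hyperhive.model` option.
/// ASSUMPTION: attr names are uppercased to form the variable suffix.
fn container_env(
    context_window_tokens: &BTreeMap<&str, u64>,
    default_model: Option<&str>,
) -> Vec<(String, String)> {
    // One HIVE_CONTEXT_WINDOW_TOKENS_<KEY> variable per attrset entry.
    let mut env: Vec<(String, String)> = context_window_tokens
        .iter()
        .map(|(key, tokens)| {
            (
                format!("HIVE_CONTEXT_WINDOW_TOKENS_{}", key.to_uppercase()),
                tokens.to_string(),
            )
        })
        .collect();
    // HIVE_DEFAULT_MODEL is only exported when the option is set; the
    // harness falls back to "haiku" otherwise.
    if let Some(model) = default_model {
        env.push(("HIVE_DEFAULT_MODEL".to_string(), model.to_string()));
    }
    env
}

fn main() {
    // The per-model defaults this commit documents for contextWindowTokens.
    let defaults: BTreeMap<&str, u64> =
        BTreeMap::from([("haiku", 200_000), ("sonnet", 1_000_000), ("opus", 1_000_000)]);
    for (name, value) in container_env(&defaults, Some("sonnet")) {
        println!("{name}={value}");
    }
}
```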

```diff
@@ -205,3 +208,7 @@ read them à la carte.
 - **Rate-limit sentinel:** `{state_dir}/hyperhive-rate-limited`
   is written by the harness on 429 and cleared on retry.
   `ContainerView.rate_limited` reads it for the dashboard badge.
+- **Context window:** defaults are in `services.hive-c0re.contextWindowTokens`
+  (host nix, affects all agents). Per-agent default model via
+  `hyperhive.model` in `agent.nix`. Watermarks are 75%/50% of the
+  effective window.
```
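
Here $W$ is the effective window. The 75% watermark is the proactive-compaction trigger and the 50% watermark is the auto session-reset trigger. Using the per-model window defaults this commit documents (haiku 200k, sonnet and opus 1M), the two default watermarks work out as follows. These are defaults only. Each can be overridden through its env var (`HIVE_COMPACT_WATERMARK_TOKENS`, `HIVE_AUTO_RESET_WATERMARK_TOKENS`).

$$
\begin{aligned}
W_{\text{compact}} &= 0.75\,W, \qquad W_{\text{reset}} = 0.50\,W \\
W = 200\,000 \text{ (haiku)} &\implies W_{\text{compact}} = 150\,000,\quad W_{\text{reset}} = 100\,000 \\
W = 1\,000\,000 \text{ (sonnet, opus)} &\implies W_{\text{compact}} = 750\,000,\quad W_{\text{reset}} = 500\,000
\end{aligned}
$$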

```diff
@@ -21,9 +21,13 @@ Each agent harness (`hive-ag3nt serve` or `hive-m1nd serve`) runs:
 6. Wait for claude to exit. Compaction is two-pronged — *reactive*
    on `Prompt is too long` and *proactive* on a context watermark
    (see [Compaction](#compaction) below). **Rate-limit detection**:
-   if stdout contains `429` or `rate_limit` markers, the harness
-   sets the `rate_limited` sentinel (`Bus::emit_status("rate_limited")`),
-   sleeps `HIVE_RATE_LIMIT_SLEEP_SECS` (default 300), then retries.
+   on stderr the harness does a raw-line match for `429` /
+   `rate_limit` markers; on stdout it only fires on parsed
+   `{"type":"error"}` JSON events (avoiding false positives when
+   agents discuss `rate_limit_error` in conversation text). On
+   detection the harness sets the `rate_limited` sentinel
+   (`Bus::emit_status("rate_limited")`), sleeps
+   `HIVE_RATE_LIMIT_SLEEP_SECS` (default 300), then retries.
    The dashboard and per-agent page show a `⊘ rate limited` badge
    while the harness is parked.
 7. Emit `LiveEvent::TurnEnd { ok, note }`. Sleep `poll_ms` to avoid
```
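
The new scoping rule uses a raw-line match on stderr and accepts only parsed error events on stdout. The sketch below illustrates it; it is not the harness code. `StdoutEvent` is a hypothetical stand-in for whatever JSON decoding the harness performs. One detail is an assumption: the sketch applies the `429`/`rate_limit` marker test to the error event's payload, whereas the text only says stdout detection fires on parsed `{"type":"error"}` events.

```rust
/// Hypothetical, already-parsed view of one stdout stream-json line.
/// A real harness would decode the JSON with a proper parser first.
enum StdoutEvent<'a> {
    /// A `{"type":"error", ...}` event; `raw` is the original line.
    Error { raw: &'a str },
    /// Any other event (assistant text, tool calls, results, ...).
    /// Its text, which may quote `rate_limit_error`, is never inspected.
    Other,
}

/// The marker test shared by both streams.
fn has_rate_limit_marker(text: &str) -> bool {
    text.contains("429") || text.contains("rate_limit")
}

/// stderr: plain substring match on every raw line.
fn stderr_line_is_rate_limited(line: &str) -> bool {
    has_rate_limit_marker(line)
}

/// stdout: only error events are considered (ASSUMPTION: the markers are
/// then checked inside the error payload).
fn stdout_event_is_rate_limited(event: &StdoutEvent) -> bool {
    match event {
        StdoutEvent::Error { raw } => has_rate_limit_marker(raw),
        StdoutEvent::Other => false,
    }
}

fn main() {
    let chat_line = r#"{"type":"assistant","text":"earlier I hit a rate_limit_error"}"#;
    // A raw-line match would misfire on this conversation text...
    println!("raw match on chat line: {}", has_rate_limit_marker(chat_line));
    // ...but on stdout the line parses as a non-error event and is ignored.
    println!("stdout detector on chat: {}", stdout_event_is_rate_limited(&StdoutEvent::Other));
    let error = StdoutEvent::Error {
        raw: r#"{"type":"error","error":{"type":"rate_limit_error"}}"#,
    };
    println!("stdout detector on error: {}", stdout_event_is_rate_limited(&error));
    println!("stderr 429: {}", stderr_line_is_rate_limited("HTTP 429 Too Many Requests"));
}
```

Scoping stdout to error events trades a little recall for precision. Conversation text can mention rate limits freely, while stderr, which carries no conversation text, keeps the cheap substring check.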

````diff
@@ -40,11 +44,33 @@ claude --print --verbose --output-format stream-json --model <name> \
 # wake prompt piped over stdin
 ```
 
-`<name>` is read from `Bus::model()` on each turn, default
-`haiku`. Operator can flip it at runtime with `/model <name>` in
-the web terminal — the next turn picks it up. The choice is
-persisted to `/state/hyperhive-model` so it survives restart;
-override path: `HYPERHIVE_MODEL_FILE` env var for tests.
+`<name>` is read from `Bus::model()` on each turn. The initial
+default is set by `hyperhive.model` in the agent's `agent.nix`
+(NixOS option; propagates via `HIVE_DEFAULT_MODEL` env var; falls
+back to `"haiku"` if unset). The operator can flip it at runtime
+with `/model <name>` in the web terminal — the next turn picks it
+up. The choice is persisted to `/state/hyperhive-model` so it
+survives restart; override path: `HYPERHIVE_MODEL_FILE` env var
+for tests.
 
+Context-window size is looked up per-model via
+`events::context_window_tokens(model)`. Resolution order (first
+match wins):
+
+1. `HIVE_CONTEXT_WINDOW_TOKENS_<KEY>` env var, where `KEY`
+   (lowercased) is a substring of the active model name. Injected
+   by the meta flake from `services.hive-c0re.contextWindowTokens`
+   (host-level NixOS option, defaults: haiku=200k, sonnet=1M,
+   opus=1M). Override these for all agents at once without a
+   per-agent config change.
+2. `HIVE_CONTEXT_WINDOW_TOKENS` — single global override for any
+   model (useful in dev / test).
+3. Hard fallback: `200_000` (conservative; only reached outside
+   NixOS where the env vars aren't set).
+
+The effective window drives watermarks and is exposed at runtime
+via `/api/state.context_window_tokens` so the UI can show a
+percentage-of-window ctx badge.
+
 `--continue` keeps a persistent session per agent (claude stores
 sessions in `~/.claude/projects/`, which is bind-mounted
````
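
This hunk describes two lookups, model resolution and the three-tier context-window lookup, and both can be sketched in a few lines of Rust. The sketch is not the real `events::context_window_tokens`, and it makes several assumptions:

- The env is passed in as a map for testability.
- A persisted `/model` choice outranks `HIVE_DEFAULT_MODEL`, inferred from "survives restart".
- Model names are lowercased before matching.
- When several keys match, the longest one wins. The commit specifies only the order between tiers, not tie-breaking within tier 1.

`resolve_model` and `model_file_path` are hypothetical names.

```rust
use std::collections::HashMap;

const PER_MODEL_PREFIX: &str = "HIVE_CONTEXT_WINDOW_TOKENS_";
const GLOBAL_OVERRIDE: &str = "HIVE_CONTEXT_WINDOW_TOKENS";
const FALLBACK_WINDOW: u64 = 200_000;

/// Path of the persisted `/model` choice; tests override it via env.
fn model_file_path(env: &HashMap<String, String>) -> String {
    env.get("HYPERHIVE_MODEL_FILE")
        .cloned()
        .unwrap_or_else(|| "/state/hyperhive-model".to_string())
}

/// ASSUMPTION: persisted choice > HIVE_DEFAULT_MODEL > "haiku".
fn resolve_model(persisted: Option<&str>, env: &HashMap<String, String>) -> String {
    persisted
        .map(str::trim)
        .filter(|m| !m.is_empty())
        .map(str::to_string)
        .or_else(|| env.get("HIVE_DEFAULT_MODEL").cloned())
        .unwrap_or_else(|| "haiku".to_string())
}

/// Tier 1: HIVE_CONTEXT_WINDOW_TOKENS_<KEY> whose lowercased KEY is a
/// substring of the model name. Tier 2: HIVE_CONTEXT_WINDOW_TOKENS.
/// Tier 3: the hard-coded 200_000 fallback.
fn context_window_tokens(model: &str, env: &HashMap<String, String>) -> u64 {
    let model = model.to_lowercase();
    let per_model = env
        .iter()
        .filter_map(|(name, value)| {
            let key = name.strip_prefix(PER_MODEL_PREFIX)?.to_lowercase();
            // Unparseable values are skipped rather than treated as errors.
            let tokens = value.parse::<u64>().ok()?;
            (!key.is_empty() && model.contains(&key)).then_some((key.len(), tokens))
        })
        // ASSUMPTION: if several keys match, the longest (most specific) wins.
        .max_by_key(|&(key_len, _)| key_len)
        .map(|(_, tokens)| tokens);
    per_model
        .or_else(|| env.get(GLOBAL_OVERRIDE)?.parse().ok())
        .unwrap_or(FALLBACK_WINDOW)
}

fn main() {
    let env: HashMap<String, String> = std::env::vars().collect();
    let persisted = std::fs::read_to_string(model_file_path(&env)).ok();
    let model = resolve_model(persisted.as_deref(), &env);
    println!("model={model} window={}", context_window_tokens(&model, &env));
}
```

Outside NixOS, with none of these variables set and no model file, this prints `model=haiku window=200000`, which matches the documented fallbacks.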

```diff
@@ -76,17 +102,20 @@ owns it explicitly in `turn::drive_turn`. There are two triggers:
   persist in-flight task state, decisions, and file paths before the
   conversation detail collapses into a summary.
 
-The watermark is `HIVE_COMPACT_WATERMARK_TOKENS` (default `150_000`,
-~75% of a 200k window); set it to `0` to disable proactive compaction
-entirely (the reactive path always applies). The proactive path is
-best-effort — a failed checkpoint turn or `/compact` is surfaced as a
-`Note` but never fails the turn that already succeeded. The operator
-can also force a compaction any time via `/api/compact`.
+The compact watermark defaults to **75% of `context_window_tokens(model)`**
+(dynamically derived — 150k for haiku, 750k for sonnet/opus). Override
+with `HIVE_COMPACT_WATERMARK_TOKENS` (absolute token count); set to `0`
+to disable proactive compaction entirely (the reactive path always
+applies). The proactive path is best-effort — a failed checkpoint turn
+or `/compact` is surfaced as a `Note` but never fails the turn that
+already succeeded. The operator can also force a compaction any time
+via `/api/compact`.
 
 - **Auto session-reset** — a third path that fires when both
   conditions hold: context is ≥ a watermark (`HIVE_AUTO_RESET_WATERMARK_TOKENS`,
-  default `100_000`) AND the time since the last turn exceeds the
-  assumed prompt-cache TTL (`HIVE_CACHE_TTL_SECS`, default `300`).
+  default **50% of `context_window_tokens(model)`**) AND the time since
+  the last turn exceeds the assumed prompt-cache TTL
+  (`HIVE_CACHE_TTL_SECS`, default `3600`).
   Claude's prompt cache lives ~5 minutes; if the cache is already
   cold, resuming with `--continue` pays the full re-upload cost of
   the current context with no benefit over starting fresh. So:
```
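
The two watermark-driven decisions can be sketched as pure functions. This is a hedged reading of the text above, not the code in `turn::drive_turn`. The `Option` arguments stand in for the env vars. Comparing with `>=` for the compact watermark is an assumption, since the text states `≥` only for auto-reset. Giving `0` no special meaning for the reset watermark is also an assumption.

```rust
/// Effective proactive-compaction watermark.
/// `override_tokens` models HIVE_COMPACT_WATERMARK_TOKENS (absolute count);
/// `Some(0)` disables proactive compaction, so the result is `None`.
fn compact_watermark(window: u64, override_tokens: Option<u64>) -> Option<u64> {
    match override_tokens {
        Some(0) => None,
        Some(tokens) => Some(tokens),
        None => Some(window * 3 / 4), // default: 75% of the window
    }
}

/// Proactive compaction: context has reached the (enabled) watermark.
fn should_compact(ctx_tokens: u64, window: u64, override_tokens: Option<u64>) -> bool {
    compact_watermark(window, override_tokens).map_or(false, |w| ctx_tokens >= w)
}

/// Auto session-reset: context >= watermark AND idle longer than the
/// assumed prompt-cache TTL (HIVE_CACHE_TTL_SECS).
fn should_auto_reset(
    ctx_tokens: u64,
    window: u64,
    reset_override: Option<u64>, // models HIVE_AUTO_RESET_WATERMARK_TOKENS
    idle_secs: u64,
    cache_ttl_secs: u64,
) -> bool {
    let watermark = reset_override.unwrap_or(window / 2); // default: 50%
    ctx_tokens >= watermark && idle_secs > cache_ttl_secs
}

fn main() {
    let window = 200_000; // haiku default window
    println!("compact watermark: {:?}", compact_watermark(window, None));
    println!("compact at 160k: {}", should_compact(160_000, window, None));
    println!(
        "reset at 120k after 2h idle: {}",
        should_auto_reset(120_000, window, None, 7_200, 3_600)
    );
}
```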

```diff
@@ -382,7 +382,9 @@ Layout, top to bottom:
   (input + cache_read + cache_write of the most recent
   model call in the just-ended turn). This is the **actual
   context window utilisation** — the number to watch when
-  deciding whether to compact.
+  deciding whether to compact. When `context_window_tokens`
+  is available from `/api/state`, the badge tooltip shows the
+  percentage of window used.
 - Cost badge: `cost · 1.3M` — cumulative tokens billed
   across **every inference** in the last turn (sum of all
   per-call prompts). Tool-heavy turns rebill the cached
```
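
The ctx badge and the cost badge come from the same per-call numbers, aggregated differently. The difference is easiest to see on a turn with several model calls. The sketch assumes a hypothetical `CallUsage` record per inference that carries the three token fields the text names.

```rust
/// Token usage reported for one model call (one inference) in a turn.
#[derive(Clone, Copy)]
struct CallUsage {
    input: u64,
    cache_read: u64,
    cache_write: u64,
}

impl CallUsage {
    /// Prompt size of this call: everything the model had to read.
    fn prompt_tokens(self) -> u64 {
        self.input + self.cache_read + self.cache_write
    }
}

/// ctx badge: prompt size of the most recent call in the turn.
fn ctx_tokens(calls: &[CallUsage]) -> Option<u64> {
    calls.last().map(|c| c.prompt_tokens())
}

/// cost badge: sum of prompt sizes across every call in the turn.
fn cost_tokens(calls: &[CallUsage]) -> u64 {
    calls.iter().map(|c| c.prompt_tokens()).sum()
}

fn main() {
    // Three calls in one tool-using turn; the prompt grows each time.
    let calls = [
        CallUsage { input: 2_000, cache_read: 38_000, cache_write: 0 },
        CallUsage { input: 1_000, cache_read: 40_000, cache_write: 1_000 },
        CallUsage { input: 500, cache_read: 42_000, cache_write: 2_500 },
    ];
    println!("ctx · {}", ctx_tokens(&calls).unwrap_or(0));
    println!("cost · {}", cost_tokens(&calls));
}
```

With these illustrative numbers, ctx reports the 45k prompt of the last call while cost reports 127k. The cost figure grows with every extra tool round trip, whereas ctx stays near the size of the latest prompt.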

```diff
@@ -407,7 +409,10 @@ Layout, top to bottom:
   Polling: `/api/state` is fetched **once** on cold load, and
   again while `status === 'needs_login_in_progress'` (login
   session output isn't event-shaped yet). Every other badge
-  updates from SSE; no periodic refresh timer runs.
+  updates from SSE; no periodic refresh timer runs. Snapshot
+  includes `context_window_tokens` (effective window size for
+  the agent's current model, from `events::context_window_tokens`)
+  used to compute percentage-of-window in the ctx badge tooltip.
 - Inbox `<details>` block (collapsed): `inbox · N` — last 30
   messages addressed to this agent, fetched via
   `AgentRequest::Recent { limit: 30 }`. Reply messages (those
```
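
The snapshot's `context_window_tokens` feeds the ctx badge tooltip. The sketch below, in Rust for consistency with the other examples even though the page logic itself presumably runs in the browser, shows that computation. Several details are assumptions: `StateSnapshot` and `ctx_tooltip` are hypothetical names, the tooltip wording is invented, and a missing or zero window is treated as "show no percentage".

```rust
/// Hypothetical subset of the `/api/state` snapshot relevant here.
struct StateSnapshot {
    /// Effective window for the agent's current model, if reported.
    context_window_tokens: Option<u64>,
}

/// Tooltip text for the ctx badge: include a percentage only when the
/// window size is known (ASSUMPTION: zero counts as unknown).
fn ctx_tooltip(ctx_tokens: u64, snapshot: &StateSnapshot) -> String {
    match snapshot.context_window_tokens {
        Some(window) if window > 0 => {
            let pct = ctx_tokens as f64 * 100.0 / window as f64;
            format!("{ctx_tokens} tokens ({pct:.1}% of {window})")
        }
        _ => format!("{ctx_tokens} tokens"),
    }
}

fn main() {
    let snapshot = StateSnapshot { context_window_tokens: Some(200_000) };
    println!("{}", ctx_tooltip(150_000, &snapshot));
}
```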