# Hyperhive — deployment, ops & boundaries

Tracking the deployment-shape + operational-hardening work: container network isolation, the unifying gateway, the operator-vs-agent trust boundary, and process privilege separation.

These items interlock. Today "the operator surface" and "the agent surface" are a *convention*, not a boundary — nothing stops a container from curling the core daemon on `localhost:`, or another agent's web UI. The gateway, network isolation, and privsep together turn that convention into an enforced boundary. Sequencing matters; see the order at the bottom.

## The boundary we're building toward

Two principals, two paths:

- **Operator** — reaches every UI (the dashboard + every per-agent page) through the gateway, on one origin. Operator-authority actions (approve / deny, answer-as-operator, lifecycle POSTs) are served by the core daemon and only reachable via the gateway.
- **Agent** — speaks only for itself, only over its per-agent unix socket. The socket's identity *is* the agent (see `docs/conventions.md`, "identity = socket"). An agent must not be able to reach the core daemon's HTTP surface, another agent's socket, or another agent's web UI.

Design rule that falls out of this: **operator-authority actions never get a per-agent-socket entry point.** They live on the core backend. Worked example — answering an operator-targeted question is a `POST /answer-question/{id}` on the core dashboard, *never* an `AgentRequest` variant. If it were a per-agent-socket request, an agent could `curl` its own socket and spoof an operator answer. The per-agent web UI POSTs cross-origin to the core for these (see the inline-answer feature — the loose-ends section on each agent page).

## Workstreams

### 1. Container network isolation

Today containers share the host network namespace, so a container can reach `localhost:`, the dashboard, and every other agent's web port.
**Until this changes, nothing below is actually enforced** — the operator/agent split is on the honour system.

- Give each container a private veth / bridge with no route to the host's loopback-bound services.
- The per-agent unix socket stays the only host-bound channel (it already is the intended one).
- Open question: the per-agent web UI still needs to be reachable *by the operator's browser* — that is what the gateway is for (below). The container itself should not be able to reach the gateway or the core daemon.

### 2. Unifying gateway / reverse proxy

(Moved here from TODO.md "Dashboard".)

Today every agent's web UI is reached at `:/`, so operators juggle a port list. Stand up nginx (or similar) terminating one domain that fans requests to `/agent//...` out to each container's web port, and `/` to the main dashboard. Touches: a NixOS module on the host, the dashboard's per-agent link rendering, and the per-agent web server's base-path handling (currently assumes root). Lets bookmarks survive port reshuffles and unblocks per-agent stats links being relative URLs instead of hard-coded ports.

Boundary payoff: once the dashboard and the per-agent pages are same-origin behind the gateway, the cross-origin CORS shim on `POST /answer-question/{id}` (added with the inline-answer feature) can be deleted — the per-agent page's POST becomes a plain same-origin request. Grep for `with_cors` / `Access-Control-Allow-Origin` in `hive-c0re/src/dashboard.rs` and remove it when this lands. The gateway is also the natural home for auth, if/when the operator surface ever needs it.

### 3. Privsep the core daemon from the web UI

(Moved here from TODO.md "Security".)

hive-c0re runs as root (it has to — `nixos-container` create / start / destroy, the meta git repo, every per-agent bind mount).
The HTTP server lives in the same process, so every read-endpoint (`/api/state-file`, `/api/journal/{name}`, `/api/agent-config/{name}`) is one allow-list bug away from serving arbitrary host files.

Split it: keep the privileged daemon doing lifecycle + git + ipc, run the web UI as an unprivileged user that talks to the daemon over a unix socket with a narrow request surface (`ReadAgentStateFile { agent, rel_path }` etc.). The unprivileged process can't read `/etc/shadow` even if every check in `get_state_file` is bypassed — it doesn't have the bits. Container-lifecycle POSTs (`/restart`, `/destroy`, etc.) become forwarded RPCs the privileged side authorises on its terms.

Cheaper once the harness/state split lands (see TODO.md "Split harness-internal state from agent-visible state") — the unprivileged web server then only needs read access to `/agents//state/`, not `/agents//harness/`.

## Suggested sequencing

1. **Gateway** first — pure ergonomics win, unblocks same-origin, no behavioural risk.
2. **Network isolation** next — the step that makes the operator/agent boundary *real*. Everything before it is honour-system.
3. **Privsep** last — defence in depth on the core process itself; valuable independent of the other two, but the biggest refactor.
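
To make the "narrow request surface" from workstream 3 concrete, here is a minimal sketch of what the privileged side's validation could look like. It is not hive-c0re's actual code. Only `ReadAgentStateFile { agent, rel_path }` comes from the text above; the `WebRequest`, `Action` and `Rejected` types, the `RestartAgent` variant, the `authorise` and `resolve_state_path` helpers, the agent-name character set, and the `<state_root>/<agent>/state/` layout are all assumptions made for illustration.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical request type the unprivileged web UI sends over the unix
/// socket. `RestartAgent` stands in for the forwarded lifecycle RPCs.
#[derive(Debug, Clone, PartialEq)]
pub enum WebRequest {
    ReadAgentStateFile { agent: String, rel_path: String },
    RestartAgent { agent: String },
}

/// What the privileged daemon agrees to do once a request is validated.
#[derive(Debug, PartialEq)]
pub enum Action {
    ReadFile(PathBuf),
    Restart(String),
}

#[derive(Debug, PartialEq)]
pub enum Rejected {
    BadAgentName,
    BadPath,
}

/// Assumed rule: agent names are non-empty ASCII alphanumerics, '-' or '_'.
fn valid_agent_name(agent: &str) -> bool {
    !agent.is_empty()
        && agent
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
}

/// Map (agent, rel_path) to a path under `<state_root>/<agent>/state/`.
/// Absolute paths, `..` and a leading `.` are refused. This is a lexical
/// check only; it does not follow or detect symlinks.
pub fn resolve_state_path(
    state_root: &Path,
    agent: &str,
    rel_path: &str,
) -> Result<PathBuf, Rejected> {
    if !valid_agent_name(agent) {
        return Err(Rejected::BadAgentName);
    }
    let rel = Path::new(rel_path);
    let only_normal = rel
        .components()
        .all(|c| matches!(c, Component::Normal(_)));
    if rel_path.is_empty() || !only_normal {
        return Err(Rejected::BadPath);
    }
    Ok(state_root.join(agent).join("state").join(rel))
}

/// The privileged side decides on its own terms; the web UI only asks.
pub fn authorise(state_root: &Path, req: WebRequest) -> Result<Action, Rejected> {
    match req {
        WebRequest::ReadAgentStateFile { agent, rel_path } => {
            resolve_state_path(state_root, &agent, &rel_path).map(Action::ReadFile)
        }
        WebRequest::RestartAgent { agent } => {
            if valid_agent_name(&agent) {
                Ok(Action::Restart(agent))
            } else {
                Err(Rejected::BadAgentName)
            }
        }
    }
}
```

The point of the enum is that the web process can only name an agent and a relative path; it never hands the daemon a host path. A request such as `ReadAgentStateFile { agent: "alice", rel_path: "../harness/secret" }` is rejected before any file is opened. Even if a check like this had a bug, the privilege split still limits what the unprivileged process can read directly.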