loose-ends question rows get a textarea + send button; the operator answers as operator by POSTing to the core dashboard's /answer-question route, not the per-agent socket — keeps the operator-authority path off the agent's own socket. cross-origin POST needs a CORS shim on that route for now; drops out once the gateway makes the page same-origin. also splits deployment/ops/boundaries/gateway work into TODO-ops.md.
119 lines
5 KiB
Markdown
119 lines
5 KiB
Markdown
# Hyperhive — deployment, ops & boundaries
|
|
|
|
Tracking the deployment-shape + operational-hardening work:
|
|
container network isolation, the unifying gateway, the
|
|
operator-vs-agent trust boundary, and process privilege
|
|
separation.
|
|
|
|
These items interlock. Today "the operator surface" and "the
|
|
agent surface" are a *convention*, not a boundary — nothing
|
|
stops a container from curling the core daemon on
|
|
`localhost:<port>`, or another agent's web UI. The gateway,
|
|
network isolation, and privsep together turn that convention
|
|
into an enforced boundary. Sequencing matters; see the order at
|
|
the bottom.
|
|
|
|
## The boundary we're building toward
|
|
|
|
Two principals, two paths:
|
|
|
|
- **Operator** — reaches every UI (the dashboard + every
|
|
per-agent page) through the gateway, on one origin.
|
|
Operator-authority actions (approve / deny, answer-as-operator,
|
|
lifecycle POSTs) are served by the core daemon and only
|
|
reachable via the gateway.
|
|
- **Agent** — speaks only for itself, only over its per-agent
|
|
unix socket. The socket's identity *is* the agent (see
|
|
`docs/conventions.md`, "identity = socket"). An agent must not
|
|
be able to reach the core daemon's HTTP surface, another
|
|
agent's socket, or another agent's web UI.
|
|
|
|
Design rule that falls out of this: **operator-authority
|
|
actions never get a per-agent-socket entry point.** They live on
|
|
the core backend. Worked example — answering an
|
|
operator-targeted question is a `POST /answer-question/{id}` on
|
|
the core dashboard, *never* an `AgentRequest` variant. If it
|
|
were a per-agent-socket request, an agent could `curl` its own
|
|
socket and spoof an operator answer. The per-agent web UI POSTs
|
|
cross-origin to the core for these (see the inline-answer
|
|
feature — the loose-ends section on each agent page).
|
|
|
|
## Workstreams
|
|
|
|
### 1. Container network isolation
|
|
|
|
Today containers share the host network namespace, so a
|
|
container can reach `localhost:<core-port>`, the dashboard, and
|
|
every other agent's web port. **Until this changes, nothing
|
|
below is actually enforced** — the operator/agent split is on
|
|
the honour system.
|
|
|
|
- Give each container a private veth / bridge with no route to
|
|
the host's loopback-bound services.
|
|
- The per-agent unix socket stays the only host-bound channel
|
|
(it already is the intended one).
|
|
- Open question: the per-agent web UI still needs to be
|
|
reachable *by the operator's browser* — that is what the
|
|
gateway is for (below). The container itself should not be
|
|
able to reach the gateway or the core daemon.
|
|
|
|
### 2. Unifying gateway / reverse proxy
|
|
|
|
(Moved here from TODO.md "Dashboard".)
|
|
|
|
Today every agent's web UI is reached at
|
|
`<host>:<per-agent-port>/`, so operators juggle a port list.
|
|
Stand up nginx (or similar) terminating one domain that fans
|
|
requests to `/agent/<name>/...` out to each container's web
|
|
port, and `/` to the main dashboard. Touches: a NixOS module on
|
|
the host, the dashboard's per-agent link rendering, and the
|
|
per-agent web server's base-path handling (currently assumes
|
|
root). Lets bookmarks survive port reshuffles and unblocks
|
|
per-agent stats links being relative URLs instead of hard-coded
|
|
ports.
|
|
|
|
Boundary payoff: once the dashboard and the per-agent pages are
|
|
same-origin behind the gateway, the cross-origin CORS shim on
|
|
`POST /answer-question/{id}` (added with the inline-answer
|
|
feature) can be deleted — the per-agent page's POST becomes a
|
|
plain same-origin request. Grep for `with_cors` /
|
|
`Access-Control-Allow-Origin` in `hive-c0re/src/dashboard.rs`
|
|
and remove it when this lands.
|
|
|
|
The gateway is also the natural home for auth, if/when the
|
|
operator surface ever needs it.
|
|
|
|
### 3. Privsep the core daemon from the web UI
|
|
|
|
(Moved here from TODO.md "Security".)
|
|
|
|
hive-c0re runs as root (it has to — `nixos-container` create /
|
|
start / destroy, the meta git repo, every per-agent bind
|
|
mount). The HTTP server lives in the same process, so every
|
|
read-endpoint (`/api/state-file`, `/api/journal/{name}`,
|
|
`/api/agent-config/{name}`) is one allow-list bug away from
|
|
serving arbitrary host files. Split it: keep the privileged
|
|
daemon doing lifecycle + git + ipc, run the web UI as an
|
|
unprivileged user that talks to the daemon over a unix socket
|
|
with a narrow request surface (`ReadAgentStateFile { agent,
|
|
rel_path }` etc.). The unprivileged process can't read
|
|
`/etc/shadow` even if every check in `get_state_file` is
|
|
bypassed — it doesn't have the bits. Container-lifecycle POSTs
|
|
(`/restart`, `/destroy`, etc.) become forwarded RPCs the
|
|
privileged side authorises on its terms.
|
|
|
|
Cheaper once the harness/state split lands (see TODO.md "Split
|
|
harness-internal state from agent-visible state") — the
|
|
unprivileged web server then only needs read access to
|
|
`/agents/<n>/state/`, not `/agents/<n>/harness/`.
|
|
|
|
## Suggested sequencing
|
|
|
|
1. **Gateway** first — pure ergonomics win, unblocks
|
|
same-origin, no behavioural risk.
|
|
2. **Network isolation** next — the step that makes the
|
|
operator/agent boundary *real*. Everything before it is
|
|
honour-system.
|
|
3. **Privsep** last — defence in depth on the core process
|
|
itself; valuable independent of the other two, but the
|
|
biggest refactor.
|