Operational notes
Config files
Everything arbiter persists lives under ~/.arbiter/:
| Path | Purpose |
|---|---|
api_key | Anthropic API key (mode 0600). Read if $ANTHROPIC_API_KEY is unset. |
openai_api_key | OpenAI API key (mode 0600). Read if $OPENAI_API_KEY is unset. |
gemini_api_key | Google Gemini API key (mode 0600). Read if $GEMINI_API_KEY is unset. |
admin_token | Admin bearer token (mode 0600). Read if $ARBITER_ADMIN_TOKEN is unset. |
tenants.db | SQLite ledger. WAL mode + SQLITE_OPEN_FULLMUTEX (serialised threading), foreign keys enforced. Schema migrates on open. |
agents/*.json | Local example agent constitutions (CLI-mode only — the API path doesn't read these). |
memory/t<tenant_id>/*.md | Legacy per-tenant agent file scratchpads. The agent file scratchpad now also has a DB-backed implementation (see structured memory + artifacts). |
workspaces/t<tenant_id>/ | Per-tenant sandbox workspace (mode 0700). Created on demand when the sandbox is enabled. See Sandbox. |
master_model | Override for the master agent's model. |
mcp_servers.json | Optional MCP server registry (see MCP servers). |
SQLite concurrency
The ledger DB is opened with SQLITE_OPEN_FULLMUTEX, which forces SQLite into serialized threading mode regardless of the underlying library's build defaults. This matters because:
- The API server's accept loop spawns one thread per connection.
- Without serialization, sharing a single
sqlite3*across threads is undefined behaviour and surfaces as either garbage error messages ("no such table: <…>") orSIGSEGVinsqlite3RunParser. SQLITE_OPEN_FULLMUTEXadds a per-connection mutex; the cost is invisible at our scale (hundreds of req/min ceiling on this tier).
If you build against a SQLite that already defaults to serialized mode (Linux distros typically do; macOS's system SQLite typically does not), the flag is a no-op.
CORS
Every response includes permissive CORS headers (Access-Control-Allow-Origin: *, methods GET, POST, PATCH, DELETE, OPTIONS, headers Authorization, Content-Type, Accept). OPTIONS preflights short-circuit before auth and return 204 so a SPA on a different origin can hit the API in dev with no proxy.
Bearer auth carries in the Authorization header — no cookies — so credentials are never sent. To restrict origins in production, terminate at a reverse proxy and override Access-Control-Allow-Origin there, or extend kCorsHeaders in src/api_server.cpp to read an allowlist from ARBITER_CORS_ORIGINS.
Per-tenant rate / concurrency limiting
The runtime keeps a per-tenant in-flight counter and a token-bucket rate limiter in front of the expensive routes (POST/v1/orchestrate, POST/v1/conversations/:id/messages, POST/v1/agents/:id/chat, POST/v1/a2a/agents/:id). Cheap reads (health, GET memory, GET schedules) are unaffected — the bucket isn't consumed.
| Env var | Meaning | Default |
|---|---|---|
ARBITER_TENANT_MAX_CONCURRENT | Max concurrent in-flight LLM requests per tenant. | 0 (unlimited) |
ARBITER_TENANT_RATE_PER_MIN | Token-bucket refill rate per tenant. | 0 (unlimited) |
ARBITER_TENANT_RATE_BURST | Token-bucket capacity (max burst above refill rate). | rate_per_min |
A 0 on either axis disables that axis (the limiter grants every acquire without taking the lock; zero overhead for operators not using the surface). Surplus requests get 429 Too Many Requests with Retry-After set to a best-guess wait. The body distinguishes the two failure modes:
{ "error": "rate limit exceeded",
"reason": "concurrent_request_limit",
"retry_after_seconds": 1 }
reason is one of concurrent_request_limit or rate_limit; clients can branch on it. Defaults are catch-the-runaway-loop, not fairness — operators wanting precise per-tenant SLAs should set explicit values via env or per-tenant overrides on the admin tenant row.
Deployment
Run behind a reverse proxy. TLS, rate limiting, and DDoS protection are out of scope for arbiter itself — it binds 127.0.0.1 by default specifically to encourage this layout.
Example nginx location block
location /v1/ {
proxy_pass http://127.0.0.1:8080;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header Authorization $http_authorization;
proxy_buffering off; # critical for SSE
proxy_read_timeout 3600; # long LLM calls
add_header X-Accel-Buffering no;
}
X-Accel-Buffering: no is already set by arbiter on its SSE responses; the nginx directive here is belt-and-suspenders.
Scaling characteristics
- One thread per connection. Arbiter doesn't pool; each
/v1/orchestrategets a freshOrchestratorwith fresh agent history. Sub-agents spawned by/parallelare additional threads within that request. - SQLite on the write path. Fine for single-node deployments up to a few hundred req/min. Multi-node = swap for Postgres (schema ports cleanly).
- Per-request MCP subprocesses. Each request that invokes
/mcpor/browsespawns its own subprocess, killed at request end. Cold starts cost real wall-clock; cachenpx-installed packages on a tmpfs or a persistent volume for production.
Crash diagnostics
ApiServer::start() installs a SIGSEGV / SIGABRT / SIGBUS / SIGFPE handler that prints a backtrace to stderr before re-raising. Combined with the connection-level try/catch (which catches std::exception and turns it into a 500 with the what()), runaway throws don't take down the daemon — they get logged with the request method + path and surface as a clean 500 to the client.
Per-handler [memory] tenant=<id> entry.patch.* … breadcrumbs are emitted along the entry mutation paths so the last line printed before a crash localises the failing step.
Metrics
GET/v1/metrics exposes Prometheus-format counters and gauges covering request flow (per tenant + route), provider call health (per provider), sandbox container lifecycle, idempotency cache hits/misses, and rate-limit rejections. The endpoint is unauthenticated by design — restrict it at the reverse proxy if you don't want every client on the metrics network to scrape it.
See the metrics endpoint reference for the full metric list and a sample Prometheus scrape config.
Structured logging
Operational events (startup, recovery sweep, shutdown drain, sandbox enable/disable, sandbox idle reaping) are routed through a single logger that emits either human-readable or JSON lines on stderr. Switch with ARBITER_LOG_FORMAT:
| Value | Output |
|---|---|
human | Default. [HH:MM:SS] [level] event key=value key=value. |
json | One JSON object per line: {"ts":"...","level":"...","event":"...",...}. |
JSON mode is intended for log aggregators (Loki, ELK, Datadog) that index by field rather than parse free-form text. Per-request stderr logs from --verbose SSE mirroring keep their existing human format in v1 — only the operational-event stream is structured.
{"ts":"2026-05-11T14:23:01.412Z","level":"info","event":"sandbox_enabled","image":"arbiter/sandbox:latest","network":"none","memory_mb":"512","cpus":"1.00","pids":"256","timeout_s":"30"}
{"ts":"2026-05-11T14:23:01.413Z","level":"info","event":"recovery_sweep","orphaned_count":"2","new_state":"failed"}
{"ts":"2026-05-11T18:42:55.808Z","level":"info","event":"drain_started","in_flight":"4","deadline_s":"30"}
{"ts":"2026-05-11T18:42:58.221Z","level":"info","event":"drain_complete"}
Provider circuit breaker
A per-provider circuit breaker sits in front of the per-request retry loop. After 5 consecutive failures (5xx or 429 past the retry budget) against the same provider, the breaker opens for a 30 s cooldown. Calls while open return a structured circuit_open error from the done event's error_code field — agents see a fast-fail instead of every parallel request burning four retries against a clearly-unhealthy upstream. When the cooldown elapses, the next call admits as a half-open probe; success closes the breaker, failure reopens it with a fresh cooldown.
State per provider is exposed via the arbiter_provider_circuit_open_total counter on /v1/metrics. Defaults are tuned conservatively for v1 — operator-tunable thresholds (env vars) are a Phase 5 follow-up.
Admin audit log
Every mutation through /v1/admin/* (create tenant, disable tenant) writes an append-only row to the admin_audit table in tenants.db. Operators read the log via GET/v1/admin/audit. The runtime never edits or deletes audit rows; retention policy is the operator's call.
SSE heartbeat
Every primary SSE response (/v1/orchestrate, /v1/conversations/:id/messages, /v1/agents/:id/chat, /v1/requests/:id/events) emits a : SSE comment frame every 30 seconds when the stream is otherwise idle. Comment frames are ignored by spec-compliant clients but keep TCP alive past reverse proxies that drop idle connections (nginx defaults to 60s, Cloudflare to 100s). Long-running LLM calls survive these limits without per-deployment tuning.
The heartbeat is implementation-detail: clients don't subscribe to or rely on it; resumability is handled by since_seq on the resubscribe path, not by counting heartbeats.
Graceful shutdown
On SIGINT or SIGTERM the server:
- Stops accepting new connections (closes the listen socket).
- Issues
Orchestrator::cancel()against every in-flight orchestration via the in-flight registry, so streaming LLM calls wake up and the SSE handler can write a final frame. - Waits up to
ARBITER_DRAIN_SECONDS(default30) for in-flight connection threads to finish. - Tears down sandbox containers via
SandboxManager::stop_all(). - Exits.
Connections still active past the drain deadline are abandoned with a stderr warning — the OS closes their sockets when the process exits. Set ARBITER_DRAIN_SECONDS=0 to skip the wait entirely (useful for tests and tight CI shutdowns); set it higher than 30 if your longest expected LLM call is over half a minute.
File cap
Agent-generated files (via /write, ephemeral path) are captured in memory and forwarded as file SSE events. A per-response cap (default 10 MiB across all files in the same request) kicks in if an agent tries to flood the stream. Beyond the cap, /write attempts return an ERR: tool result and the file is dropped. The persistent path (/write --persist) has its own quota structure — see Artifacts.
Sandboxed /exec
By default the API server keeps /exec disabled and surfaces a clean ERR: to the agent. To enable shell execution for agents safely, configure the per-tenant Docker sandbox — see Sandbox for setup, env vars, container lifecycle, resource caps, and failure modes. When the sandbox is wired, /exec, /write, and /read all share the tenant's /workspace bind mount so agents can build cross-turn workflows that produce and re-read files.
Error response codes (cross-cutting)
| Code | Meaning |
|---|---|
| 200 | Normal — for JSON admin responses, and the opening frame of an SSE orchestrate stream. |
| 201 | Resource created (POST endpoints). |
| 304 | If-None-Match matched a strong ETag (artifact /raw reads). |
| 400 | Malformed request (bad JSON, missing required field, bad tenant id, sanitiser rejection). |
| 401 | Missing / invalid / mismatched bearer token, or tenant disabled=true. |
| 404 | Unknown endpoint, or id not found / wrong tenant on a GET / PATCH / DELETE. |
| 405 | Method not allowed on an existing route. |
| 409 | Conflict (duplicate id). |
| 410 | The row vanished mid-request (concurrent DELETE). |
| 413 | Quota exceeded (artifact stores). |
| 500 | Uncaught exception during handling. The body includes what(). |
| 503 | Admin endpoint called while the server has no admin token configured. |