The clawker control plane (CP) is a long-lived, privileged Go service that runs as cmd/clawker-cp (PID 1) inside the clawker-controlplane Docker container. It is the authoritative supervisor for every clawker-managed agent on the host — it owns the agent identity registry, the egress firewall lifecycle, the eBPF program lifetime, and the CP↔agent command channel.
You normally won’t think about the control plane. The first time any clawker command needs it (clawker firewall status, clawker run, clawker controlplane agents, …), the CLI brings it up transparently. The clawker controlplane verb group exists for debugging, upgrades, and recovery — not day-to-day use.
The control plane is not the firewall. The firewall (Envoy + CoreDNS + eBPF) is one of several subsystems CP manages. Disabling the firewall via settings.yaml does not disable the control plane — CP, mTLS, and the agent registry continue to run for any other clawker container. See the Firewall guide for the firewall itself.
What CP Does
The CP container is a single binary, clawker-cp, running as PID 1. Inside it:
- Ory auth stack: Hydra (OAuth2 token issuer, client_credentials + private_key_jwt ES256), Kratos (identity), and Oathkeeper (HTTP auth proxy) are subprocess-managed by the same PID. Token validation is fail-closed.
- AdminService gRPC (mTLS + OAuth2 JWT, default port 7443 on host loopback): the 13-method firewall control surface (FirewallInit, FirewallEnable, FirewallAddRules, FirewallSyncRoutes, FirewallBypass, …) plus ListAgents and GetSystemTime (public-scope, no bearer token required; used by the clock-sync readiness gate). Every CLI clawker firewall * and clawker controlplane agents call goes through this service.
- AgentService gRPC (mTLS, default in-container port 7444, reachable only over clawker-net): the surface clawkerd uses to register itself with CP and hold open a long-lived Session.
- Agent registry: a SQLite database persisted in the host XDG data directory, keyed by SHA-256 of the agent’s mTLS leaf cert thumbprint plus container ID. CP is the sole writer; reads go through ListAgents. The registry survives CP restarts.
- Overseer event bus + worldview: an in-process typed pub/sub that serializes container lifecycle (start/stop/destroy/rename), agent session lifecycle (connecting/connected/failed/broken), and trust verdict events into a deep-copyable State snapshot.
- Docker events feeder: subscribes to the local Docker daemon’s event stream (with reconnect) and projects managed-label-filtered events onto the overseer bus.
- Agent watcher + clean self-shutdown: polls Docker every 30s for purpose=agent, managed=true containers. After drain-to-zero (60s grace period elapsed AND 2 consecutive zero-count polls), it fires an ordered drain callback: actionQueue.Close → graceful gRPC stop → cancel bypass timers → Stack stop → netlogger.Stop (drains the eBPF egress event pipeline and flushes the OTLP BatchProcessor BEFORE BPF maps go away) → DNS GC stop → eBPF FlushAll → exit code 0. Because the exit code is 0, the on-failure restart policy does not restart the container.
- eBPF egress event emitter (netlogger): drains a BPF ringbuf populated at every cgroup/connect/sendmsg/sock_create decision and emits OTLP log records on the same mTLS-gated infra lane the CP zerolog bridge uses. It sets a distinct service.name=ebpf-egress so OpenSearch routes the stream to its own index. It degrades to event=netlogger_unavailable (no panics) when the collector is unreachable; firewall enforcement is unaffected. See Egress Observability for the record shape.
- Aggregate /healthz: host-loopback HTTP on HealthPort (default 7080) that probes every internal service port before returning 200. Used by both clawker controlplane status and the host-side bootstrap to confirm readiness.
Container Privileges
The CP container runs with elevated permissions because it is the host-side supervisor that loads kernel-attached eBPF programs and brings up the sibling firewall containers (Envoy, CoreDNS) on your behalf. The full privilege set is:
| Privilege | Why CP needs it |
|---|---|
| CAP_SYS_ADMIN | Required to attach cgroup-bound eBPF programs to agent containers’ cgroups and to mount the bpffs pin path. |
| CAP_BPF | Required to load BPF programs and create/update pinned BPF maps. |
| /sys/fs/bpf (RW bind mount) | Where BPF programs and maps are pinned. Pins survive across CP restarts (they’re attached to cgroups, not CP’s process). CP needs RW to mount the clawker subdirectory at boot, sync routes on rule changes, and flush state during clean shutdown. |
| /sys/fs/cgroup (RO bind mount) | Required to enumerate agent containers’ cgroup paths so CP can attach BPF programs to the right cgroup. |
| /var/run/docker.sock (bind mount, RO file flag) | CP subscribes to Docker events (container start/stop/destroy) and brings up the Envoy/CoreDNS sibling containers via the Docker API. The RO flag only prevents the socket file itself from being replaced inside the container; once the socket is open, the Docker daemon honors any API call over it, so CP has full Docker control regardless of the mount flag. |
| apparmor=unconfined | Docker’s docker-default AppArmor profile denies writes under /sys/fs/** (except /sys/fs/cgroup/**), which blocks mkdir /sys/fs/bpf/clawker at eBPF load even with CAP_BPF + CAP_SYS_ADMIN. CP runs unconfined so the bpffs pin path is writable. This mirrors the upstream cilium-agent posture (appArmorProfile.type: Unconfined). Defense-in-depth on the CP container relies on the docker-default seccomp profile, namespaces, masked /proc paths, and no-new-privileges, all of which are still applied. |
These privileges are not extended to agent containers. The agent container itself runs fully unprivileged: cap_add: [], the docker-default seccomp and AppArmor profiles, no /sys/fs/bpf or /sys/fs/cgroup mount, no Docker socket. The agent’s blast radius is bounded by its own container; CP’s privileges exist only to enforce that boundary, not to relax it.
CP is the privileged side of the security boundary. Anything with write access to the CP binary, the embedded ebpf-manager, the mounted CA, or the AdminService server certs can subvert firewall enforcement for every agent container on the host. Apply the release verification procedure to any clawker upgrade, and do not bind-mount additional host paths into the CP container via local modifications.
Guarantees
- eBPF programs have a deterministic owner. BPF cgroup programs and pinned maps survive the CP container’s death (they’re under /sys/fs/bpf). Without a supervisor, rule changes would silently fail and bypass timers would never expire. CP is the single owner: its drain callback is the only clean exit path that detaches and flushes eBPF state.
- Agent identity is auditable. Every clawkerd instance binds itself to CP via mTLS Register before any privileged operation. The cert thumbprint is captured server-side from the live TLS handshake, so agents cannot self-attest. clawker controlplane agents lists every binding, including which container holds which identity.
- Containment is real. Because CP holds a long-lived Session to every agent’s clawkerd, it can dispatch commands (init steps, MCP setup, shutdown signals) into a compromised container without re-authenticating each time.
- Auth is centralized. Hydra issues short-lived OAuth2 tokens for every CLI↔CP gRPC call, signed by the CLI-issued auth material. The CLI is the root of trust; CP only validates.
CP crashing is a security incident, not an availability one. If the CP container panics or exits uncleanly, the eBPF programs it attached remain pinned to your agent containers’ cgroups: traffic keeps getting filtered by whatever rules were loaded at crash time, but no new rules can be applied, no bypass timers can expire, and no CP↔agent dispatch is available. Run clawker controlplane status if you suspect something is wrong; a Container: stopped result with agents still up means you should run clawker controlplane up to re-establish supervision.
How CP Boots
Two paths bring CP up:
- Transparent bootstrap: the first CLI call that needs CP (most firewall commands, container creation, anything that opens an AdminClient) runs cpboot.EnsureRunning under a host-side mutex. Steps: ensure the CP image exists with a content-derived tag (clawker-controlplane:bin-<sha>, built on demand from the embedded binaries), ContainerCreate on clawker-net with a static IP, ContainerStart, then poll http://127.0.0.1:<HealthPort>/healthz until 200 or timeout. Idempotent: re-runs are no-ops once /healthz is green.
- Break-glass: clawker controlplane up calls the same EnsureRunning path explicitly, useful when you want to bring CP up without triggering a side-effect command.
Either way, the CP image is built from binaries embedded in the clawker CLI itself (clawker-cp, ebpf-manager). There’s no separate image to pull. See Installation for the BPF toolchain requirements when building from source.
On every boot, CP reads firewall.enable from settings and, when it is enabled (the default), starts the Envoy + CoreDNS firewall stack before reporting ready, so a green /healthz means the firewall is actually enforcing. This covers boots that no CLI command observes, such as Docker’s restart policy resurrecting a crashed CP. A failed stack bring-up fails CP startup (the container exits non-zero): running half-protected would leave agents either unusable (their egress redirected to a dead proxy) or, worse, silently unenforced while you believe the firewall is on. The CLI surfaces the exit and points you to docker logs clawker-controlplane; fix the cause and rerun, or disable the firewall in settings to run unprotected.
Networking
CP joins clawker-net with a deterministic static IP computed by replacing the last octet of the gateway address with 202: for example, 192.168.215.202 on a default Docker bridge with gateway 192.168.215.1. The CLI talks to it over host loopback for AdminClient (mTLS gRPC on port 7443) and /healthz (plain HTTP on port 7080). The agent listener (7444) is only reachable from other containers on clawker-net.
When CP brings up the firewall, it places Envoy at <network>.200 and CoreDNS at <network>.201 on the same network (last-octet replacement, same scheme). Agent containers join clawker-net with --dns pointing at CoreDNS so DNS resolution is filtered from the very first lookup.
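The last-octet replacement scheme is simple enough to show directly. The following is a minimal sketch using Go's net/netip package; withLastOctet is a hypothetical helper name, and the sketch assumes an IPv4 gateway, as in the example addresses above.

```go
package main

import (
	"fmt"
	"net/netip"
)

// withLastOctet returns the IPv4 address formed by replacing the final
// octet of the gateway address with the given value.
func withLastOctet(gateway string, octet byte) (netip.Addr, error) {
	addr, err := netip.ParseAddr(gateway)
	if err != nil {
		return netip.Addr{}, err
	}
	if !addr.Is4() {
		return netip.Addr{}, fmt.Errorf("gateway %s is not an IPv4 address", gateway)
	}
	b := addr.As4()
	b[3] = octet
	return netip.AddrFrom4(b), nil
}
```

For a gateway of 192.168.215.1, calling withLastOctet with 200, 201, and 202 yields the Envoy, CoreDNS, and CP addresses described above: 192.168.215.200, 192.168.215.201, and 192.168.215.202.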
CLI Surface
All clawker controlplane subcommands are break-glass — useful for debugging, upgrades, and recovery, not normal use.
| Command | Purpose |
|---|---|
| clawker controlplane up | Idempotent EnsureRunning. Brings CP up if it isn’t already; no-op if /healthz is green. When the firewall is enabled in settings (firewall.enable, the default), also brings the Envoy + CoreDNS firewall stack up and waits until it’s healthy. |
| clawker controlplane down | Stops the CP container. clawker-cp’s SIGTERM handler runs the clean drain (actionQueue.Close → graceful gRPC stop → bypass timer cancel → Stack stop → netlogger stop → eBPF flush → exit 0). |
| clawker controlplane status | Probes /healthz; if up, also fetches firewall subsystem state via the AdminService. Output via --format json for scripts. |
| clawker controlplane agents | Lists every agent currently registered with CP: composite (project, agent_name) plus container ID, cert thumbprint, registration time, and last-seen time. Output via --format json for scripts. |
The clawker auth group manages the CLI-side auth material CP depends on:
| Command | Purpose |
|---|---|
| clawker auth rotate | Regenerates the CA, server certs, and OAuth2 signing key bind-mounted into CP. Use when rotating keys, after a key compromise, or when reinstalling. |
See the full reference: clawker controlplane, clawker auth.
Verifying CP Is Up
clawker controlplane status
Container: running
Healthz: ✓
Firewall: ✓
Rules: 12 active
If the CP container is not running, the CLI reports Container: stopped and the health and firewall fields are omitted. To bring CP back up, run clawker controlplane up.
To list the agents currently bound to CP:
clawker controlplane agents
AGENT PROJECT CONTAINER THUMBPRINT REGISTERED LAST SEEN
dev myapp a1b2c3d4e5f6 9f8e7d6c5b4a 2026-05-12T09:14:02Z 2026-05-12T09:42:18Z
review myapp 7890abcdef12 1234567890ab 2026-05-12T09:14:05Z 2026-05-12T09:42:18Z
Settings
CP-related ports and behavior live under control_plane: in settings.yaml (~/.config/clawker/settings.yaml). See Configuration → control_plane for the schema. The defaults work out of the box; override only if a port conflicts:
control_plane:
  admin_port: 7443  # CLI ↔ CP gRPC (host loopback, mTLS + OAuth2)
  health_port: 7080  # CLI ↔ CP /healthz (host loopback, plain HTTP)
  agent_port: 7444  # clawkerd ↔ CP gRPC (clawker-net only, mTLS)
  hydra_public_port: 4444
  hydra_admin_port: 4445
  oathkeeper_port: 4456
  oathkeeper_api_port: 4457
  kratos_public_port: 4433
  kratos_admin_port: 4434
Four of the Ory ports (hydra_admin_port, kratos_public_port, kratos_admin_port, oathkeeper_api_port) are container-internal and are not published to the host. hydra_public_port and oathkeeper_port are published to 127.0.0.1 on the host. All ports appear in settings so the in-container subprocesses agree on their port assignments.
Troubleshooting
CP container won’t start.
Run docker logs clawker-controlplane (CP panic traces and Ory subprocess output land here, not in clawker’s rotating logs). The most common causes: stale port bindings from a half-killed previous run (clawker controlplane down then retry), or auth material out of sync (try clawker auth rotate).
clawker run / clawker container start fails with cp clock sync deadline exceeded (Docker Desktop).
Before starting a container, the CLI brings the control plane to full readiness — which includes waiting until CP’s clock (the Docker Desktop LinuxKit VM clock, where CP runs) has caught up to the host clock. The gate requires full convergence (zero leeway): CP’s clock must reach the host instant. If it doesn’t converge within ~30s the command fails with:
starting container: bootstrapping services: ensuring control plane is running: cp clock sync deadline exceeded
This almost always happens after the host sleeps and wakes: the VM clock lags behind real time until its NTP source re-syncs. The bootstrap assertion is minted against the host clock, while Hydra validates its iat against CP’s clock with zero leeway, so exchanging it against a still-lagging CP clock would fail with a Hydra Token used before issued 500. Rather than re-mint or skew-correct, the gate waits for the CP clock to reach the host clock before letting the container start, so nothing unusable is baked in: the baked assertion stays valid, and once the VM clock catches up a plain retry of clawker run / clawker container start succeeds, with no need to delete and recreate the container. Wait a few seconds for the VM clock to catch up and retry, or restart Docker Desktop to force a resync. (This pre-start gate now catches what used to be the earlier symptom of the same drift: a Hydra Token used before issued 500 at container start.)
clawker firewall * commands hang or fail with connection refused.
CP isn’t running or /healthz is not green. Run clawker controlplane status to confirm, then clawker controlplane up to bring it back.
Agents appear in clawker ps but not in clawker controlplane agents.
The agent’s clawkerd hasn’t completed the Register handshake with CP: either CP wasn’t running when the container started, or the agent’s mTLS material is invalid. Check docker logs clawker.<project>.<agent> (look for event=register_failed or TLS handshake errors); if the mTLS material is at fault, clawker auth rotate is the typical recovery step.
Want to know what CP is up to in real time.
The host-side CP log file is ~/.local/state/clawker/logs/clawker-controlplane.log (rotated). Stack traces from a CP panic land on the CP container’s stderr (docker logs clawker-controlplane), not in this file — so if the rotating log is silent but agents are misbehaving, check docker logs first.
See Also