---
title: "WWT AI Technology Center — Cubie Deployment Playbook"
subtitle: "30-day pilot from infrastructure provisioning to ROI report"
date: "2026-05-30"
---

# WWT ATC Deployment Playbook

**Target:** WWT AI Technology Center.

**Cluster:** 8-16x NVIDIA H100 (or H200, or AMD MI300X — Cubie is architecture-agnostic).

**Duration:** 30 days.

**Outcome:** Customer-deliverable ROI report + WWT case-study draft + go/no-go decision for M3 friendly-customer engagement.

---

## Day 0 — Provisioning (T-3 days)

**WWT side:**

| Item | Spec | Owner |
|---|---|---|
| H100 cluster slot | 8x H100 (80GB) minimum; 16x preferred | ATC infrastructure |
| Inference-gateway VM | Ubuntu 22.04 LTS, 16 vCPU, 64 GB RAM, 500 GB SSD | ATC platform |
| Observability stack | Prometheus + Grafana + Loki (existing ATC) | ATC platform |
| Networking | 25/100 Gbps inbound to gateway; standard ATC routing | ATC network |
| Service account | Read access to DCGM, NVML, gateway logs | ATC platform |

**Centillion side:**

| Item | Spec | Owner |
|---|---|---|
| Cubie sidecar container | `centillion/cubie-admit-gate:v1.2.0` (pulled to ATC registry) | Centillion eng |
| Cubie control-plane container | `centillion/cubie-control:v1.2.0` | Centillion eng |
| Dual-ledger PostgreSQL + OpenBao | Standard HA pair | Centillion eng |
| Telemetry exporter | `centillion/cubie-projector:v1.2.0` (Prometheus format) | Centillion eng |
| Synthetic-traffic generator | `centillion/cubie-fuzzer:v1.2.0` (clean + adversarial + retry-storm mixes) | Centillion eng |
| Deploy script | `wwt/integrations/atc-deploy.sh` | Centillion eng |

---

## Day 1 — Deploy + observe (T+1 day)

### Deploy sequence (executes in ~30 min)

```bash
# 1. Pull images to ATC registry
docker pull centillion/cubie-admit-gate:v1.2.0
docker pull centillion/cubie-control:v1.2.0
docker pull centillion/cubie-projector:v1.2.0

# 2. Provision dual-ledger
docker-compose -f wwt/integrations/atc-stack.yml up -d \
  cubie-ledger-postgres cubie-ledger-openbao

# 3. Start Cubie control plane (observe-only mode)
docker run -d --name cubie-control \
  -e CUBIE_MODE=observe \
  -e CUBIE_LEDGER_PG=postgres://... \
  -e CUBIE_LEDGER_VAULT=https://... \
  -p 8910:8910 \
  centillion/cubie-control:v1.2.0

# 4. Deploy admit-gate sidecar next to inference gateway
docker run -d --name cubie-admit-gate \
  --network host \
  -e CUBIE_UPSTREAM=http://:8080 \
  -e CUBIE_CONTROL=http://localhost:8910 \
  -e CUBIE_SHADOW_MODE=1 \
  centillion/cubie-admit-gate:v1.2.0

# 5. Wire Prometheus scrape config
# Note: /-/reload only re-reads the config file already on disk and requires
# Prometheus to run with --web.enable-lifecycle; the request body is ignored,
# so the scrape config must be merged into the server's config file first.
curl -X POST http://:9090/-/reload \
  --data @wwt/integrations/atc-prom-config.yml

# 6. Verify decisions flowing
curl http://localhost:8910/v1/admit/recent | jq '.[:5]'
```

### Verify deploy

- ✅ Cubie health endpoint: `curl http://localhost:8910/health` returns `{"status":"ok","mode":"observe"}`
- ✅ Decision pulses: ≥1 admit decision per second logged in shadow mode
- ✅ Prometheus shows `cubie_admit_decisions_total` counter incrementing
- ✅ Dual-ledger writing: `psql -c "SELECT count(*) FROM consent_log WHERE created_at > NOW() - interval '5 minutes';"` returns non-zero

---

## Day 1-7 — Shadow-mode observation

**Goal:** Capture baseline. Cubie observes; nothing is enforced; no traffic blocked.
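
Because nothing is enforced this week, it may be worth confirming that the control plane really is in observe mode before the baseline clock starts. The sketch below is one way to script the health check from the Day 1 verify list. It assumes the endpoint returns the compact JSON shape shown there; the function name `check_health` is hypothetical and not part of any Cubie tooling.

```bash
#!/usr/bin/env bash
# check_health: succeed only if the payload reports status "ok" AND mode "observe".
# Assumption: the payload is compact JSON as in the verify list. A production
# script would more likely parse it with jq than match substrings.
check_health() {
  local payload="$1"
  case "$payload" in
    *'"status":"ok"'*) ;;          # status is fine, keep checking
    *) return 1 ;;
  esac
  case "$payload" in
    *'"mode":"observe"'*) return 0 ;;
    *) return 1 ;;                 # healthy, but not in shadow/observe mode
  esac
}

# Usage against a live control plane (not executed here):
#   check_health "$(curl -fsS http://localhost:8910/health)" \
#     && echo "observe mode confirmed"
```

Keeping the check as a pure function over the payload string makes it easy to exercise without a running cluster.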

**Metric collection per day:**

| Metric | Source | Target capture cadence |
|---|---|---|
| `cubie_admit_decisions_total{verdict="PASS"}` | Prometheus | 15s |
| `cubie_admit_decisions_total{verdict="FAIL"}` | Prometheus | 15s |
| `cubie_admit_decisions_total{verdict="FLUID"}` | Prometheus | 15s |
| `cubie_admit_decisions_total{verdict="TAMPER"}` | Prometheus | 15s |
| `cubie_admit_latency_seconds` (histogram) | Prometheus | 15s |
| Gateway requests/sec | Existing gateway metric | 15s |
| DCGM GPU utilization | DCGM exporter | 15s |
| DCGM GPU memory used | DCGM exporter | 15s |
| DCGM GPU power draw | DCGM exporter | 15s |

**Day 7 checkpoint:**

- Decision distribution (PASS vs FAIL vs FLUID) profile of the customer's real traffic
- Latency overhead (p50, p95, p99 of `cubie_admit_latency_seconds`)
- Confirm: ≥99.9% of decisions complete in <1 ms (target: <100 µs)

---

## Day 8-14 — Active admit on 5% slice

**Goal:** Turn on enforcement for 5% of inbound traffic; measure denied-request behavior; confirm no false positives.

**Activation:**

```bash
# Update control plane to active-admit on 5% slice (header-tagged)
curl -X PATCH http://:8910/v1/policy \
  -d '{"mode":"active","slice":{"by":"header","key":"x-cubie-pilot","value":"on","pct":5}}'
```

Customer's gateway adds the `x-cubie-pilot: on` header to 5% of requests deterministically (hash of session ID or similar). Cubie's denials are enforced ONLY on those tagged requests.
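
The deterministic tagging described above can be sketched as a hash-and-bucket check. This is an illustration only, not the customer gateway's actual logic: it assumes a session ID string is available, uses the POSIX `cksum` CRC as the hash, and the helper names `pilot_bucket` and `in_pilot_slice` are hypothetical.

```bash
#!/usr/bin/env bash
# pilot_bucket: map a session ID to a stable bucket in 0..99.
# cksum reading stdin prints "<crc> <byte-count>"; only the CRC is used.
pilot_bucket() {
  local crc
  crc=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  echo $(( crc % 100 ))
}

# in_pilot_slice: true when the session falls in the first <pct> buckets.
# The same session ID always gets the same answer, so a session is either
# tagged for the whole pilot or never tagged.
in_pilot_slice() {
  local sid="$1" pct="${2:-5}"
  [ "$(pilot_bucket "$sid")" -lt "$pct" ]
}

# Usage (hypothetical gateway hook):
#   if in_pilot_slice "$SESSION_ID" 5; then
#     add the "x-cubie-pilot: on" header before forwarding
#   fi
```

Bucketing by session rather than by request keeps a given user's traffic consistently inside or outside the enforced slice, which makes the tagged-versus-untagged comparison cleaner. The realized share is only approximately 5% and depends on how the session IDs distribute under the hash.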

**Daily measurements (Day 8-14):**

| Metric | Capture |
|---|---|
| Tagged requests passed | counter, Prometheus |
| Tagged requests denied | counter, Prometheus |
| Untagged requests (control) | same metrics, for comparison |
| Inference latency, tagged vs untagged | histogram diff |
| GPU utilization, tagged vs untagged | DCGM, sliced by tag |
| GPU memory, tagged vs untagged | DCGM |
| Energy/request, tagged vs untagged | PDU + DCGM power, divided by useful-req count |
| Denied-request denial-reason histogram | denial-cert log |

**Pass criteria (Day 14 checkpoint):**

- ✅ False-positive rate on tagged slice < 0.5% (denials that, on review, were valid requests)
- ✅ p99 latency on tagged slice within +200 µs of untagged (Cubie overhead absorbed in network noise)
- ✅ GPU utilization on tagged slice ≥ untagged (capacity returned)
- ✅ ≥1 denial class observed (proves Cubie is doing something, not just passing everything)

If pass criteria are not met → halt, investigate, escalate to Centillion engineering. Reversible to shadow mode in seconds.

---

## Day 15-30 — Full active-admit + ROI report

**Goal:** Full enforcement, multi-day soak, customer-facing ROI report.

**Activation:**

```bash
curl -X PATCH http://:8910/v1/policy \
  -d '{"mode":"active","slice":{"all":true}}'
```

Cubie now decides admit/deny for 100% of inbound traffic. Bypass mode available with `curl -X POST .../bypass` (operator-pulled, audit-logged, time-bound).
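
Before the 100% activation above is issued, the four Day 14 pass criteria can be evaluated mechanically so the go/no-go call is reproducible. The following is a minimal sketch, assuming the five inputs have already been extracted from Prometheus, DCGM, and the denial-cert log; the function name `day14_gate` and its argument order are hypothetical.

```bash
#!/usr/bin/env bash
# day14_gate FP_RATE_PCT P99_DELTA_US UTIL_TAGGED UTIL_UNTAGGED DENIAL_CLASSES
# Exits 0 only when every Day 14 pass criterion holds:
#   false-positive rate < 0.5%, p99 overhead <= 200 µs,
#   tagged GPU utilization >= untagged, and at least one denial class seen.
day14_gate() {
  awk -v fp="$1" -v dp99="$2" -v ut="$3" -v uu="$4" -v classes="$5" 'BEGIN {
    ok = (fp < 0.5) && (dp99 <= 200) && (ut >= uu) && (classes >= 1)
    exit (ok ? 0 : 1)
  }'
}

# Usage with illustrative placeholder inputs (not measured results):
#   day14_gate 0.2 150 71.5 68.0 3 && echo "proceed to full active-admit"
```

A failing gate maps directly onto the halt-and-escalate path described for the Day 14 checkpoint.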

**Day 15-29 daily measurements:**

| Metric | Capture |
|---|---|
| Total requests | counter |
| Denials by class (LOCK, JAM, SHATTER, DEFLECT, DENY) | counter, sliced |
| Useful requests (passed + actually got a useful response) | counter |
| Wasted-cycle ratio (1 − useful / total) | derived |
| GPU utilization | DCGM |
| Effective GPU power (W) | PDU + DCGM |
| Energy per useful request (Wh/req) | derived |
| End-to-end latency (p50/p95/p99) | gateway |
| Cost per useful request (USD/req) | billing × usage |

**Day 30 — ROI report deliverable:**

```markdown
# Cubie Pilot — ROI Report — — 2026-MM-DD

## Headline numbers (30-day window)

| Metric | Baseline (before Cubie) | With Cubie | Delta |
|---|---|---|---|
| GPU utilization (avg) | X% | Y% | +Zpp |
| Wasted-cycle ratio | X% | Y% | −Zpp |
| Energy per useful request | X Wh | Y Wh | −Z% |
| End-to-end p99 latency | X ms | Y ms | +Z ms (within noise floor) |
| Cost per useful request | $X | $Y | −Z% |
| Denial volume | n/a | X classified by class | n/a |

## Top denial classes (by volume)

[insert from denial-cert log]

## Pen-test pass status

[insert: shadow-mode + active-admit weeks both clean]

## Recommendation

[Pass → proceed to M3 friendly-customer engagement]
[Hold → identified caveats; fix in v1.2.1 patch]
```

---

## Risks + bypass procedure

**Risks tracked throughout pilot:**

- Customer's traffic mix differs from synthetic — handled: observation Days 1-7 build the customer-specific baseline before enforcement
- False-positive volume too high — handled: shadow-mode reveals it before active mode; active mode is reversible
- Latency budget exceeded — handled: Cubie's decision is sub-microsecond on hot cache; if budget exceeded, log + alert + auto-pause
- Customer ops team uncomfortable with active-admit — handled: bypass mode is always available; operator-pulled, audit-logged

**Bypass procedure (operator-callable, 30 sec to invoke):**

```bash
# Pause all enforcement; observe mode resumes
curl -X POST http://:8910/v1/bypass \
  -d '{"reason":"operator-paused","duration":"30m","operator":""}'

# Resume enforcement
curl -X POST http://:8910/v1/resume
```

Both operations write to the dual ledger. Customer's audit team can replay the full timeline.

---

## After pilot success (M2 → M3 transition)

| Deliverable | Owner |
|---|---|
| ATC case-study draft | WWT marketing + Centillion engineering |
| WWT internal demo | Joint |
| Friendly-customer identification (Texas A&M class) | WWT channel |
| MSA + NDA template | Legal |
| Customer-specific shadow-mode protocol | Centillion (instantiate from this playbook) |
| Pen-test engagement signed | Centillion |
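
For whoever instantiates this playbook for a friendly customer, the two derived rows in the Day 15-29 measurement table reduce to simple arithmetic. The sketch below assumes that energy per useful request is computed as average effective power multiplied by the window length in hours, divided by the useful-request count; the playbook does not spell that formula out, so treat it as one plausible reading. The function name `derive_metrics` is hypothetical.

```bash
#!/usr/bin/env bash
# derive_metrics TOTAL_REQ USEFUL_REQ AVG_POWER_W WINDOW_HOURS
# Prints the wasted-cycle ratio (1 - useful/total) and Wh per useful request.
# Assumption: energy = average power (W) x window (h); not confirmed by the playbook.
derive_metrics() {
  awk -v total="$1" -v useful="$2" -v watts="$3" -v hours="$4" 'BEGIN {
    if (total <= 0 || useful <= 0) exit 1   # avoid division by zero
    printf "wasted_cycle_ratio=%.4f\n", 1 - useful / total
    printf "wh_per_useful_req=%.6f\n", watts * hours / useful
  }'
}

# Usage with illustrative placeholder inputs (not measured results):
#   derive_metrics 1000 800 5000 24
```

Running the same function over the shadow-mode baseline and the active-admit window gives the two columns needed for the corresponding rows of the ROI report.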