---
title: "Cubie PoC Metrics + ROI Framework"
subtitle: "Customer-deliverable measurement framework for any 30-day pilot"
date: "2026-05-30"
status: "WWT-channel ready"
---

# Cubie PoC Metrics + ROI Framework

**Purpose.** A reusable measurement framework that any WWT-channel
customer pilot can instantiate to produce a defensible ROI report at
the end of a 30-day Cubie deployment. It mirrors the dashboard set in
`03_Day2_Operations_Runbook.md` and the activation cadence in
`01_ATC_Deployment_Playbook.md`.

---

## Measurement objectives (what we are proving)

1. **Cubie does not break the workload.** Added latency stays within measurement noise; no useful requests are blocked.
2. **Cubie recovers GPU capacity.** Wasted-cycle ratio falls; useful-request throughput rises per GPU-hour.
3. **Cubie classifies denials accurately.** Denial-class distribution matches the threat-model expectations from the customer's known traffic mix.
4. **Cubie is operationally safe.** Zero unscheduled bypasses; zero ledger write failures; all denials replay-auditable.
5. **Cubie ROI is positive.** Capacity recovered + GPU-hour cost saved >> Cubie license + ops cost.

Each objective maps to specific metrics below and is reported as Pass / Hold / Fail at Day 30.

---

## Metric catalog

### Tier A — Correctness (must-haves)

| ID | Metric | Source | Target |
|---|---|---|---|
| A1 | False-positive rate on tagged slice | denial-cert review (sample 100 denials/day) | < 0.5% |
| A2 | False-negative rate (escaped attack) | red-team simulator | < 0.1% per known-bad class |
| A3 | Denial-class distribution shift from baseline | Prometheus `cubie_admit_decisions_total{denial_class}` | no class jumps > 3σ vs baseline week |
| A4 | Decision-replay consistency | hash-of-decisions matches between ledger + projector | 100% |
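
The A3 check (no class moving more than 3σ from its baseline) can be sketched as below. This is a minimal illustration, not Cubie tooling: the class names, the choice of daily share as the compared quantity, and the function name `drifted_classes` are all assumptions made for the example.

```python
from statistics import mean, stdev


def drifted_classes(baseline_daily, today, sigmas=3.0):
    """Return {denial_class: z_score} for classes outside the sigma band.

    baseline_daily: {class: [daily share, ...]} from the baseline period
                    (at least two days per class, so a std dev exists).
    today:          {class: share observed today}.
    """
    flagged = {}
    for cls, history in baseline_daily.items():
        mu = mean(history)
        sd = stdev(history)  # sample standard deviation
        observed = today.get(cls, 0.0)
        if sd == 0:
            # Flat baseline: any change at all counts as drift.
            if observed != mu:
                flagged[cls] = float("inf")
            continue
        z = (observed - mu) / sd
        if abs(z) > sigmas:
            flagged[cls] = z
    # Classes never seen in the baseline are always worth a look.
    for cls in today:
        if cls not in baseline_daily:
            flagged[cls] = float("inf")
    return flagged


# Hypothetical class names and shares, for illustration only.
baseline = {
    "rate_limit": [0.40, 0.42, 0.38, 0.41, 0.39, 0.40, 0.40],
    "malformed": [0.10, 0.11, 0.09, 0.10, 0.10, 0.11, 0.09],
}
today = {"rate_limit": 0.41, "malformed": 0.25}
print(sorted(drifted_classes(baseline, today)))
```

In this made-up data only `malformed` leaves the band; `rate_limit` moves by less than one standard deviation.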

### Tier B — Performance (load-bearing)

| ID | Metric | Source | Target |
|---|---|---|---|
| B1 | Admit-decision p50 latency | Prometheus histogram `cubie_admit_latency_seconds` | < 100 µs |
| B2 | Admit-decision p99 latency | Prometheus histogram | < 1 ms |
| B3 | Gateway end-to-end p99 latency, with vs without Cubie | gateway-side histogram | Δ < +200 µs |
| B4 | Admit-gate throughput sustained | Cubie request counter | > 100K decisions/sec on Tier 4 host |
| B5 | Cubie sidecar memory footprint | container metric | < 8 GB RSS |
| B6 | Cubie sidecar CPU steady-state | container metric | < 2 vCPU at 50K rps |
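
For B1 and B2, the percentile is normally read from the histogram with PromQL's `histogram_quantile`. The sketch below reproduces the same linear-interpolation idea offline, for example when checking exported bucket counts in the spreadsheet. The bucket bounds and counts are invented for illustration; the real bucket layout of `cubie_admit_latency_seconds` is not specified here.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound_seconds, cumulative_count); the last
    bucket must have upper bound +Inf, as in a Prometheus histogram.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("last bucket must have upper bound +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Quantile falls in the overflow bucket: report the
                # highest finite bound rather than infinity.
                return lower_bound
            in_bucket = count - lower_count
            if in_bucket == 0:
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical bucket layout (seconds) and cumulative counts.
sample = [(50e-6, 600), (100e-6, 900), (250e-6, 980), (1e-3, 999), (math.inf, 1000)]
p50 = histogram_quantile(0.50, sample)
p99 = histogram_quantile(0.99, sample)
print(f"p50={p50 * 1e6:.1f} us, p99={p99 * 1e6:.1f} us")
```

Because the estimate interpolates inside a bucket, its resolution is limited by the bucket bounds, so a bound at or near each target (100 µs and 1 ms) makes the pass/fail call less ambiguous.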

### Tier C — Business impact (the ROI numbers)

| ID | Metric | Source | Target / formula |
|---|---|---|---|
| C1 | GPU utilization (avg) | DCGM `DCGM_FI_DEV_GPU_UTIL` | Δ ≥ +5 percentage points vs baseline |
| C2 | Wasted-cycle ratio | `1 − useful_requests / total_requests` | Δ ≤ −15 pp (a drop of at least 15 percentage points) |
| C3 | Useful requests per GPU-hour | gateway success counter / DCGM GPU-hours | Δ ≥ +10% |
| C4 | Energy per useful request | PDU + DCGM power, divided by useful-req count | Δ ≤ −10% (Wh per useful req; a reduction of at least 10%) |
| C5 | Cost per useful request | (GPU-hour cost × GPU-hours) / useful_req | Δ ≤ −10% (a reduction of at least 10%) |
| C6 | Mean time between cluster-wide denial spikes | denial-cert log | trend stable or improving |
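
The C2 to C5 formulas mix two kinds of delta: percentage points for ratios (C1, C2) and relative percent for everything else. The sketch below keeps the two apart. All input figures are invented, and `PeriodTotals`, `delta_pp`, and `delta_pct` are illustrative names rather than part of any Cubie tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodTotals:
    total_requests: int
    useful_requests: int
    gpu_hours: float
    energy_wh: float
    gpu_hour_cost: float  # dollars per GPU-hour

    @property
    def wasted_cycle_ratio(self) -> float:  # C2
        return 1.0 - self.useful_requests / self.total_requests

    @property
    def useful_per_gpu_hour(self) -> float:  # C3
        return self.useful_requests / self.gpu_hours

    @property
    def wh_per_useful(self) -> float:  # C4
        return self.energy_wh / self.useful_requests

    @property
    def cost_per_useful(self) -> float:  # C5
        return self.gpu_hour_cost * self.gpu_hours / self.useful_requests


def delta_pp(base_ratio: float, new_ratio: float) -> float:
    """Absolute change between two ratios, in percentage points."""
    return (new_ratio - base_ratio) * 100.0


def delta_pct(base: float, new: float) -> float:
    """Relative change versus the baseline, in percent."""
    return (new - base) / base * 100.0


# Hypothetical one-day totals, before and with Cubie.
baseline = PeriodTotals(1_000_000, 700_000, 240.0, 168_000.0, 2.50)
cubie = PeriodTotals(1_000_000, 850_000, 240.0, 170_000.0, 2.50)

print(f"C2 {delta_pp(baseline.wasted_cycle_ratio, cubie.wasted_cycle_ratio):+.1f} pp")
print(f"C3 {delta_pct(baseline.useful_per_gpu_hour, cubie.useful_per_gpu_hour):+.1f} %")
print(f"C4 {delta_pct(baseline.wh_per_useful, cubie.wh_per_useful):+.1f} %")
print(f"C5 {delta_pct(baseline.cost_per_useful, cubie.cost_per_useful):+.1f} %")
```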

### Tier D — Operational (Day-2 readiness)

| ID | Metric | Source | Target |
|---|---|---|---|
| D1 | Dual-ledger write success rate | `cubie_dual_ledger_writes_total{status="ok"}` / `total` | ≥ 99.999% |
| D2 | Attestation freshness violations | counter | 0 |
| D3 | Unscheduled bypass invocations | bypass audit log | 0 |
| D4 | Cubie sidecar uptime | container metric | ≥ 99.95% |
| D5 | Telemetry pipeline lag | scrape timestamp − exporter timestamp | < 30 sec |
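
The D1 and D4 targets translate into small absolute budgets, which are often easier to discuss with the customer than percentages. The following sketch does that arithmetic with exact fractions to avoid floating-point surprises near five nines. Function names and the write volume are assumptions for illustration.

```python
from fractions import Fraction
from math import floor


def meets_target(ok_count: int, total_count: int, target: str = "0.99999") -> bool:
    """True if ok/total is at or above the target ratio (D1 style)."""
    return Fraction(ok_count, total_count) >= Fraction(target)


def failure_budget(total_count: int, target: str = "0.99999") -> int:
    """Largest number of failures that still meets the target."""
    return floor(total_count * (1 - Fraction(target)))


def downtime_budget_minutes(days: int = 30, target: str = "0.9995") -> float:
    """Minutes of downtime allowed over the window (D4 style)."""
    return float(days * 24 * 60 * (1 - Fraction(target)))


# Hypothetical volume: 10 million ledger writes over the pilot.
print(failure_budget(10_000_000))   # failed writes tolerated by D1
print(downtime_budget_minutes())    # sidecar downtime tolerated by D4
```

With these inputs the D1 budget is 100 failed writes and the D4 budget is 21.6 minutes over 30 days.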

---

## Spreadsheet layout (rows = days, columns = metrics)

We recommend a simple Sheets/Excel layout that the customer maintains in parallel with the dashboard:

```
| Day | A1 % | A2 % | B1 µs | B2 µs | B3 µs | C1 %util | C2 % | C3 req/h | C4 Wh | C5 $/req | D1 % | D3 cnt |
|-----|------|------|-------|-------|-------|----------|------|----------|-------|----------|------|--------|
|  0  | bsln | bsln | n/a   | n/a   | n/a   | bsln     | bsln | bsln     | bsln  | bsln     | n/a  | n/a    |
|  1  |  .   |  .   |  .    |  .    |  .    |  .       |  .   |  .       |  .    |  .       |  .   |   0    |
|  …  |
| 30  | aggr | aggr | p99   | p99   | p99   | avg      | avg  | avg      | avg   | avg      | avg  | sum    |
```

`bsln` = day-0 baseline collected from the gateway BEFORE Cubie was deployed (so Cubie's contribution is measurable).

A starter CSV is in `04_PoC_Metrics_starter.csv` (next to this file).
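
If the starter file is missing or needs regenerating, an equivalent empty log can be produced with a few lines of Python. This is a sketch only: the column names below are derived from the layout above and are an assumption, so they may not match the headers in the shipped `04_PoC_Metrics_starter.csv`.

```python
import csv
import io

# Assumed column names, one per column in the layout table above.
COLUMNS = [
    "day", "A1_fp_pct", "A2_fn_pct", "B1_p50_us", "B2_p99_us",
    "B3_delta_p99_us", "C1_gpu_util_pct", "C2_wasted_pct",
    "C3_useful_req_per_gpu_hour", "C4_wh_per_useful_req",
    "C5_cost_per_useful_req", "D1_ledger_ok_pct", "D3_bypass_count",
]


def starter_csv(days: int = 30) -> str:
    """Return CSV text with a header row and one empty row per day 0..days."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(COLUMNS)
    for day in range(days + 1):
        writer.writerow([day] + [""] * (len(COLUMNS) - 1))
    return buf.getvalue()


if __name__ == "__main__":
    print(starter_csv(), end="")
```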

---

## ROI calculation worksheet

```
Baseline week (pre-Cubie):
  total_requests_per_day       = R_base
  useful_requests_per_day      = U_base
  GPU_hours_per_day            = G_base
  GPU_hour_cost                = $/hour
  daily_GPU_cost               = G_base × $/hour
  daily_useful_throughput      = U_base / G_base    (req per GPU-hour)
  daily_cost_per_useful_req    = (G_base × $/hour) / U_base

Cubie week (active enforcement):
  total_requests_per_day       = R_cubie
  useful_requests_per_day      = U_cubie
  GPU_hours_per_day            = G_cubie   (typically ≤ G_base for same R)
  daily_useful_throughput      = U_cubie / G_cubie
  daily_cost_per_useful_req    = (G_cubie × $/hour) / U_cubie

Headline ROI (per day):
  capacity_recovered_useful_req_per_day = U_cubie − U_base
  cost_saved_per_day                    = (daily_cost_per_useful_req[base] − daily_cost_per_useful_req[cubie]) × U_cubie
  annual_cost_saved                     = cost_saved_per_day × 365

Cubie cost (annual):
  Tier 4 license (8-32 vCPU sidecar)    = $20K-$80K
  Ops + Day-2 (Silver tier)              = $15K-$30K
  Cubie incremental power (< 0.1% GPU)   = ~0
  annual_Cubie_cost                      = license + ops + incremental power

ROI multiple = annual_cost_saved / annual_Cubie_cost
```

A pilot is **GO** for paid expansion if `ROI multiple ≥ 3×` at Day 30. The recommended pilot target is ≥ 5× given the 30-day window is the worst case for amortization.
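
The worksheet and the GO threshold can be expressed as a short function, which is convenient for rerunning the calculation as daily figures come in. The inputs below are invented (a 100-GPU cluster at a hypothetical $2.50 per GPU-hour, with an annual Cubie cost of $70K chosen from inside the ranges above); none of them is a measured or quoted number.

```python
def cost_per_useful_request(gpu_hours, gpu_hour_cost, useful_requests):
    """Daily GPU spend divided by useful requests served that day."""
    return gpu_hours * gpu_hour_cost / useful_requests


def roi_multiple(u_base, g_base, u_cubie, g_cubie, gpu_hour_cost, annual_cubie_cost):
    """ROI multiple as defined in the worksheet.

    u_*: useful requests per day; g_*: GPU-hours per day.
    """
    base = cost_per_useful_request(g_base, gpu_hour_cost, u_base)
    cubie = cost_per_useful_request(g_cubie, gpu_hour_cost, u_cubie)
    cost_saved_per_day = (base - cubie) * u_cubie
    annual_cost_saved = cost_saved_per_day * 365
    return annual_cost_saved / annual_cubie_cost


def is_go(multiple, threshold=3.0):
    """GO for paid expansion when the multiple reaches the threshold."""
    return multiple >= threshold


# Hypothetical inputs for illustration only.
multiple = roi_multiple(
    u_base=7_000_000, g_base=2400.0,
    u_cubie=8_500_000, g_cubie=2400.0,
    gpu_hour_cost=2.50, annual_cubie_cost=70_000.0,
)
print(f"ROI multiple: {multiple:.2f}x, GO: {is_go(multiple)}")
```

Note that the multiple scales with cluster size: the same relative improvement on a cluster one tenth the size would give roughly one tenth the multiple against the same annual Cubie cost.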

---

## Per-week milestones

### Week 1 — baseline + shadow mode
- Day 0: collect 24h baseline (all Tier C metrics, plus the gateway end-to-end p99 that B3 compares against) before Cubie is enabled
- Day 1-7: Cubie in shadow mode; collect denial-class distribution; validate p50/p99 latency
- Week 1 checkpoint: sample of 100 shadow-mode denials reviewed to estimate the false-positive rate (A1); A2 is estimated separately from the red-team simulator

### Week 2 — 5% slice enforcement
- Day 8: activate enforcement on tagged 5% slice
- Day 8-14: collect tagged vs untagged Tier C deltas
- Week 2 checkpoint: A1 < 0.5%, B3 Δ < +200 µs, C1 Δ ≥ +2pp

### Week 3 — 100% enforcement, soak
- Day 15: activate enforcement on 100% of traffic
- Day 15-21: continuous Tier B/C/D collection; daily Sev 3+ review
- Week 3 checkpoint: D1 ≥ 99.999%, D3 = 0, C2 Δ ≤ −10pp

### Week 4 — soak + ROI report
- Day 22-29: continued soak; pen-test rerun on tagged slice (if scoped)
- Day 30: generate ROI report, customer review, GO/HOLD/FAIL decision
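
The Week 2 and Week 3 checkpoints are simple threshold gates, so they can be encoded as data and evaluated mechanically. The sketch below assumes the observed values arrive as a dictionary keyed by hypothetical metric names; the thresholds themselves are the ones listed in the milestones above.

```python
import operator

# (metric key, comparison, threshold) per checkpoint week.
# Metric keys are illustrative names, not an existing schema.
CHECKPOINTS = {
    2: [
        ("A1_fp_pct", operator.lt, 0.5),
        ("B3_delta_p99_us", operator.lt, 200.0),
        ("C1_util_delta_pp", operator.ge, 2.0),
    ],
    3: [
        ("D1_ledger_ok_pct", operator.ge, 99.999),
        ("D3_bypass_count", operator.eq, 0),
        ("C2_wasted_delta_pp", operator.le, -10.0),
    ],
}


def failed_gates(week, observed):
    """Return the metric keys that miss their checkpoint threshold."""
    return [
        name
        for name, compare, threshold in CHECKPOINTS[week]
        if not compare(observed[name], threshold)
    ]


# Hypothetical Week 2 readings: C1 is short of the +2 pp gate.
week2 = {"A1_fp_pct": 0.3, "B3_delta_p99_us": 150.0, "C1_util_delta_pp": 1.5}
print(failed_gates(2, week2))
```

An empty list means the checkpoint passed; anything else names the metrics to discuss before moving to the next enforcement stage.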

---

## Day-30 ROI report shape

```markdown
# Cubie Pilot — ROI Report — <Customer> — 30-day window <YYYY-MM-DD → YYYY-MM-DD>

## Headline ROI
- Annual cost saved (extrapolated):  $<X>
- Annual Cubie cost:                  $<Y>
- ROI multiple:                       <X/Y>×

## Tier A — Correctness (Pass / Hold / Fail per metric)
- A1 False-positive rate:    <%> [PASS/HOLD/FAIL]
- A2 False-negative rate:    <%> per class [PASS/HOLD/FAIL]
- A3 Denial-class drift:     [PASS/HOLD/FAIL]
- A4 Replay consistency:     [PASS/HOLD/FAIL]

## Tier B — Performance
- B1 admit p50:              <µs>
- B2 admit p99:              <µs>
- B3 e2e Δp99:               <µs>
- B4 throughput sustained:   <decisions/sec>

## Tier C — Business impact
- C1 GPU util Δ:             <±pp>
- C2 Wasted-cycle Δ:         <±pp>
- C3 Useful req/GPU-hour Δ:  <±%>
- C4 Energy/useful-req Δ:    <±%>
- C5 Cost/useful-req Δ:      <±%>

## Tier D — Operational
- D1 Ledger write success:   <%>
- D2 Attestation violations: <count>
- D3 Unscheduled bypasses:   <count>
- D4 Uptime:                 <%>

## Top 10 denial classes (volume + sample reasons)
[insert from denial-cert log]

## Pen-test result
[insert: clean / findings filed]

## Recommendation
[ ] GO — proceed to paid expansion / next vertical
[ ] HOLD — fix items in v1.2.x patch, re-pilot 2 weeks
[ ] FAIL — root-cause analysis and joint go/no-go review

## Customer sign-off
__________________________   __________________________
Customer ops lead             Customer security lead
__________________________   __________________________
WWT account team              Centillion engineering lead
```

---

## Open questions for the customer (collect BEFORE Day 0)

1. **What is the GPU-hour cost?** ($/hour per H100; needed for C5)
2. **What is the baseline useful-request definition?** (Is a successful 200 from the gateway enough, or must the response also have non-empty content? This is customer-specific.)
3. **What is the threat-model expectation for denial-class distribution?** (helps grade A3)
4. **What false-positive rate is acceptable given the customer's risk appetite?** (sets the A1 PASS bar; default 0.5% but some customers want < 0.1%)
5. **Is pen-test in scope at Day 22?** (sets dependency on Trail of Bits/Cure53)

---

## Companion files

- `04_PoC_Metrics_starter.csv` — empty spreadsheet header for the daily collection log
- `01_ATC_Deployment_Playbook.md` — referenced for activation sequence
- `03_Day2_Operations_Runbook.md` — referenced for dashboard set + telemetry pipeline

---

**This is the customer-deliverable.** It does not commit Centillion or WWT to any particular ROI number — it commits us to the measurement protocol. The number that comes out is the customer's number.