---
title: "Cubie Architecture + Sizing Reference"
subtitle: "Per-tier hardware specs, throughput, power, cost class"
date: "2026-05-30"
---

# Cubie Architecture + Sizing Reference

**Purpose.** Answer the WWT/Intel field question "what does this run on, at what scale, and what does it cost?" with concrete numbers per deployment tier. The same `CubeObjectV1` byte layout (192 bytes) runs on every tier, from pacemaker to Sapphire Rapids: the hot-path implementation changes per tier, but the data model does not.

---

## Tier matrix (one byte layout, six runtime profiles)

### Tier 1 — Bare-metal medical / embedded (pacemaker class)

| Spec | Value |
|---|---|
| Reference silicon | Cortex-M0+ (e.g. STM32L011) / RV32I (RISC-V no MMU) |
| Cubie code size | 40-60 KB no_std, no alloc, no panic-unwind |
| Memory footprint | 64 KB RAM peak |
| Throughput | ~500-2000 admit decisions/sec |
| Per-admit latency | 10-30 µs (no SIMD; pure popcount + compare) |
| Power envelope | <100 mW total |
| Compliance | IEC 62304 Class C; ISO 14971 risk management |
| What's stripped | HMAC (use PUF/eFuse for identity); full Belnap classify (bitboard projection only); thinking-style cold paths |
| Pricing class | per-device license, no per-call charging |

**Use cases:** pacemaker command channels, infusion pumps, surgical robots, IEC-62304-regulated implantables.

### Tier 2 — Industrial control / automotive ECU

| Spec | Value |
|---|---|
| Reference silicon | Cortex-M4F / RV32IMC |
| Cubie code size | 200-400 KB no_std, fixed-capacity `heapless` collections (no heap allocator) |
| Memory footprint | 256 KB RAM |
| Throughput | ~5K-20K admit decisions/sec |
| Per-admit latency | 2-8 µs |
| Power envelope | 1-2 W |
| Compliance | AUTOSAR Adaptive Platform; ISO 26262 ASIL-D; IEC 61508 SIL-3 |
| Crypto | HMAC-SHA256 with hardware accelerator if present |
| Pricing class | per-ECU license OR per-vehicle license |

**Use cases:** ABS controllers, drive-by-wire, autonomous-vehicle perception trust gates, factory floor robotics.

### Tier 3 — Edge AI accelerator

| Spec | Value |
|---|---|
| Reference silicon | Arm Cortex-A class (Cortex-A53/A72 SoCs; NVIDIA Jetson Orin Nano, which uses Cortex-A78AE cores) **or** Intel Atom Edge x6000E (Elkhart Lake) |
| Cubie code size | 1-2 MB Linux ELF or no_std bare-metal |
| Memory footprint | 64 MB RAM |
| Throughput | ~50K-200K admit decisions/sec (NEON SIMD path) |
| Per-admit latency | ~0.5-2 µs |
| Power envelope | 5-15 W |
| Compliance | Common Criteria EAL 4 (PP-Computer Lite) |
| Crypto | aws-lc-rs (FIPS 140-3 if certified config) or in-tree HMAC |
| Use case | AI-camera SmartNIC, edge inference admission, retail computer-vision admit |
| Pricing class | per-device or per-edge-node |

### Tier 4 — Datacenter inference gateway (the most common WWT deployment shape)

| Spec | Value |
|---|---|
| Reference silicon | Intel Xeon Sapphire Rapids (8470+ class), Intel Emerald Rapids (8580+ class) |
| Cores allocated to Cubie | 8-16 (2-4 for admit gate, 2-4 for control plane, 2-4 for projector, 2-4 reserve) |
| Cubie binary size | ~30 MB (admit gate) + ~50 MB (control plane) |
| Memory footprint | 4-8 GB total (admit gate + control plane + ledger + projector) |
| Throughput | **500K-1M admit decisions/sec** (AVX-512 + AMX-int8 hot path) |
| Per-admit latency | **16.57 ns canonical** (cache-hot); ~50 ns full-path including dual-ledger write |
| Power envelope | ~80-150 W (incremental on the gateway host) |
| Network | 25-100 Gbps inbound; Cubie adds <1 ms p99 to gateway path |
| Compliance | SOC 2 Type II; ISO 27001 |
| Crypto | aws-lc-rs FIPS 140-3 module mode |
| Pricing class | per-GPU-month metered OR per-cluster license |
| Cost class | $20K-$80K/year depending on cluster size |

**This is the recommended WWT-channel default deployment.** Sapphire Rapids or Emerald Rapids reference host with 8-32 vCPU allocated; one container per admit gate; horizontal scale via additional containers.
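
The horizontal-scaling note above can be turned into a rough container count. The sketch below is a minimal illustration, not a sizing tool: `gates_needed` and `max_rate_from_latency_ns` are hypothetical helper names, and the per-gate rate is an input you would measure yourself, since the table does not state whether its throughput range applies per container or per host.

```rust
/// Number of admit-gate containers needed to cover `target_per_sec`,
/// given a measured per-container rate. Rounds up.
/// Both inputs are assumed to come from your own benchmark.
pub fn gates_needed(target_per_sec: u64, per_gate_per_sec: u64) -> u64 {
    assert!(per_gate_per_sec > 0, "per-gate rate must be non-zero");
    (target_per_sec + per_gate_per_sec - 1) / per_gate_per_sec
}

/// Upper bound on serial decisions/sec implied by a per-admit latency.
/// This ignores batching and pipelining, so treat it as a ceiling only.
pub fn max_rate_from_latency_ns(latency_ns: f64) -> f64 {
    1.0e9 / latency_ns
}
```

For example, a 2.5M decisions/sec target against a gate measured at 500K decisions/sec gives `gates_needed(2_500_000, 500_000) == 5`. The ~50 ns full-path latency bounds a single serial path at 20M decisions/sec, so the quoted throughput range sits well inside what the latency figure allows.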

### Tier 5 — SmartNIC IPU (line-rate admission)

| Spec | Value |
|---|---|
| Reference silicon | **Intel Mount Evans IPU** (E2000 series) **or** NVIDIA BlueField-3 |
| Throughput | **1M-3M admit decisions/sec at 200 Gbps line-rate** |
| Per-admit latency | <1 µs (in-NIC, before packet hits host) |
| Power envelope | included in NIC power budget (~25 W incremental) |
| Use case | bump-in-the-wire deployment in front of an existing AI gateway; pre-host filtering |
| Pricing class | per-NIC license |
| Cost class | $5K-$15K per NIC |

### Tier 6 — Confidential VM (Intel TDX)

| Spec | Value |
|---|---|
| Reference silicon | Intel Sapphire Rapids or Emerald Rapids with TDX enabled |
| TDX configuration | 4th-gen or later Xeon Gold/Platinum required; TDX module enabled |
| Cubie binary | same as Tier 4 + TDX shim layer |
| Memory footprint | 8-16 GB (TDX has per-VM memory overhead) |
| Throughput | ~50K-200K admit decisions/sec (TDX page-table walks add overhead) |
| Per-admit latency | ~100-300 ns (TDX guest-to-module transition overhead, i.e. TDCALL, per cryptographic primitive) |
| Attestation | Intel ITA v2 composite attestation (sgx/tdx/sevsnp/nvgpu/gcpcs/tpm) |
| Compliance | EU AI Act Article 13/14/15 + GDPR + SOC 2 + Common Criteria EAL 5+ path |
| Crypto | aws-lc-rs FIPS attested via TDX MRTD |
| Pricing class | per-VM-month metered OR per-tenant license |

**Use case:** sovereign cloud, confidential AI workloads, regulated data plane that must prove silicon root of trust.

---

## Cross-tier byte-layout invariance

Same `CubeObjectV1` struct across every tier:

```rust
#[repr(C, align(64))]
pub struct CubeFlit64 {
    pub manifold: [u64; 4],       // 32B — topology + flags + metadata
    pub auth_hash: [u8; 32],      // 32B — HMAC-SHA256 or PUF projection
}

#[repr(C)]
pub struct CubeObjectV1 {
    pub flit: CubeFlit64,         // 64B
    pub capability_hmac: [u8; 32], // 32B
    pub lease_nonce: [u8; 32],    // 32B
    pub efuse_raw_bits: [u8; 32], // 32B (silicon identity binding)
    pub request_hash: u64,        // 8B
    pub payload_hash: u64,        // 8B
    pub adapter_id: u64,          // 8B
    pub sequence_id: u32,         // 4B
    pub reserved_padding: [u8; 4], // 4B
}  // total: 192 bytes; compile-time-asserted
```
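
The "compile-time-asserted" comment can be backed by checks like the following. This is a minimal sketch, assuming the field layout shown above and a toolchain where `assert!` is usable in `const` context (Rust 1.57 or later); `cube_object_layout` is a hypothetical helper added only so the result can be inspected at run time.

```rust
use core::mem::{align_of, size_of};

// Field layout copied from the struct definitions above.
#[repr(C, align(64))]
pub struct CubeFlit64 {
    pub manifold: [u64; 4],
    pub auth_hash: [u8; 32],
}

#[repr(C)]
pub struct CubeObjectV1 {
    pub flit: CubeFlit64,
    pub capability_hmac: [u8; 32],
    pub lease_nonce: [u8; 32],
    pub efuse_raw_bits: [u8; 32],
    pub request_hash: u64,
    pub payload_hash: u64,
    pub adapter_id: u64,
    pub sequence_id: u32,
    pub reserved_padding: [u8; 4],
}

// Compile-time checks: the build fails if the layout ever drifts.
const _: () = assert!(size_of::<CubeFlit64>() == 64);
const _: () = assert!(align_of::<CubeFlit64>() == 64);
const _: () = assert!(size_of::<CubeObjectV1>() == 192);
const _: () = assert!(align_of::<CubeObjectV1>() == 64);

/// Hypothetical helper: reports (size, alignment) for logging or tests.
pub fn cube_object_layout() -> (usize, usize) {
    (size_of::<CubeObjectV1>(), align_of::<CubeObjectV1>())
}
```

The 64-byte alignment of `CubeFlit64` propagates to `CubeObjectV1`, and the fields sum to exactly 192 bytes (three 64-byte lines), so the compiler inserts no tail padding.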

**A pacemaker's 40-60 KB binary and a Xeon AVX-512 build operate on bit-identical 192-byte request envelopes.** The hot-path optimizations vary by tier (popcount on M0+, NEON on Cortex-A, VPTERNLOGQ and batched AMX-int8 on Xeon); the data model is invariant.

---

## Per-tier hot-path operation cycle budgets

Reference numbers for the four hot-path operations (Q3 in the topology design doc); L2 equality is listed in two variants:

| Operation | Xeon SPR (AVX-512/AMX) | Cortex-A53 (NEON) | Cortex-M0+ (no SIMD) |
|---|---|---|---|
| L2 equality (bitboard projection) | 1 cyc XOR+TEST | 1-2 cyc | ~3 cyc (u64 held as two 32-bit words) |
| L2 equality (full ternary) | 1 cyc VPTERNLOGQ+KORTESTQ | 3-5 cyc | 10-20 cyc (software u128) |
| Bloom probe | 1-2 cyc (VPGATHERDD) | 4-6 cyc | 12-18 cyc |
| HMAC fast-path | 4-6 cyc/block amortized (SHA-NI) | NEON SHA2 ext if present | strip — use PUF projection |
| Epoch derotation | 1-2 cyc (rotate intrinsic) | 2-3 cyc | 8-15 cyc |
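
For orientation, the operations in the table can be written as portable scalar Rust. This is a sketch under stated assumptions, not the shipped hot path: this document does not define the ternary compare, the Bloom filter geometry, or the epoch derotation, so the `care` mask, the 256-bit filter with two byte-derived probes, and the plain bit rotation are all hypothetical choices made for illustration. The HMAC row is omitted because it would need a hash implementation.

```rust
/// Bitboard-projection equality: XOR the two boards and test for zero.
#[inline]
pub fn l2_eq_bitboard(a: u64, b: u64) -> bool {
    (a ^ b) == 0
}

/// Masked equality over four 64-bit lanes: equal where (a ^ b) & care == 0.
/// `(a ^ b) & care` is a three-input boolean function, the kind of
/// expression a single ternary-logic instruction can evaluate per lane.
/// The `care` mask is an assumption for illustration.
#[inline]
pub fn l2_eq_masked(a: &[u64; 4], b: &[u64; 4], care: &[u64; 4]) -> bool {
    let mut diff = 0u64;
    for i in 0..4 {
        diff |= (a[i] ^ b[i]) & care[i];
    }
    diff == 0
}

/// Epoch derotation modeled as a plain right rotation (assumed meaning).
#[inline]
pub fn derotate(word: u64, epoch: u32) -> u64 {
    word.rotate_right(epoch % 64)
}

/// Two-probe Bloom membership test over a 256-bit filter.
/// Deriving the two bit indices from the low two bytes of `key`
/// is hypothetical; a real filter would use proper hash functions.
#[inline]
pub fn bloom_probe(filter: &[u64; 4], key: u64) -> bool {
    let i1 = (key & 0xff) as usize;
    let i2 = ((key >> 8) & 0xff) as usize;
    let bit = |i: usize| (filter[i / 64] >> (i % 64)) & 1 == 1;
    bit(i1) && bit(i2)
}
```

A scalar version like this is useful as a reference oracle: a SIMD or accelerator path on any tier can be tested against it, since every tier is meant to produce the same answer on the same 192-byte envelope.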

---

## Reference deployment shapes (the appliance SKUs WWT can co-sell)

### "Cubie Edge Trust Appliance" — Tier 3 white-box

| Component | Spec | Vendor option |
|---|---|---|
| Chassis | 1U short-depth | Supermicro SYS-220H, Dell PowerEdge R250, HPE ProLiant DL20 Gen11 |
| CPU | Intel Atom C7000-series OR Xeon Bronze 3500 | — |
| RAM | 16-32 GB DDR5 | — |
| Storage | 240-480 GB NVMe | — |
| Network | 2x 25 Gbps SFP28 | — |
| Pre-load | Cubie v1.2.0 image, signed + attested | Centillion BSP |
| Price class | $5K-$15K | per appliance |

### "Cubie Datacenter Trust Gateway" — Tier 4 reference

| Component | Spec |
|---|---|
| Chassis | 1U or 2U; standard datacenter |
| CPU | Intel Xeon Gold 6526Y+ (16 cores, Emerald Rapids) OR Xeon Platinum 8480+ (Sapphire Rapids) |
| RAM | 64-128 GB DDR5 |
| Storage | 1.92 TB NVMe (dual for HA) |
| Network | 2x 100 Gbps QSFP28 |
| Pre-load | Cubie v1.2.0 datacenter image + Intel TDX prereqs |
| Price class | $30K-$80K per appliance |

### "Cubie SmartNIC Trust Module" — Tier 5 (line-rate)

| Component | Spec |
|---|---|
| Form factor | 200 Gbps SmartNIC (full-height, dual-slot PCIe Gen 5) |
| Silicon | Intel Mount Evans E2000 OR NVIDIA BlueField-3 |
| Cubie image | embedded in NIC firmware (signed) |
| Price class | $5K-$15K per NIC + per-NIC license |

### "Cubie TDX Confidential Module" — Tier 6 (sovereign cloud)

| Component | Spec |
|---|---|
| Chassis | Standard 2U Xeon SPR/EMR + TDX-enabled BIOS |
| Cubie | TDX-VM-deployed; Intel ITA v2 attestation |
| Price class | $50K-$100K per host + TDX licensing |

---

## Power + space envelope per cluster size

| Cluster GPUs | Recommended Cubie deployment | Incremental power | Incremental rack-U | Cost class |
|---|---|---|---|---|
| 8-16 H100 | Tier 4 software-only sidecar on existing gateway host | <50 W (just an extra container) | 0 | $20K-$40K license |
| 16-64 H100/H200 | Tier 4 dedicated appliance (1U) | ~150 W | 1U | $40K-$80K |
| 64-256 H100/MI300X | Tier 4 + Tier 5 (in-NIC) | ~300 W | 1U + per-NIC | $80K-$200K |
| 256+ (multi-rack) | Tier 4 + Tier 5 + Tier 6 (TDX for sovereign tenants) | ~500-1000 W | 2-4U | $200K-$500K+ |

These envelopes are small compared to the GPU power budget being protected (up to 700 W per H100 SXM5 or H200 SXM; up to 1000 W per Blackwell B200). Cubie's incremental power is on the order of 1% or less of the protected envelope in every row above.
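
The ratio can be checked for any row of the table with a one-line calculation. The sketch below is illustrative: `overhead_pct` is a hypothetical helper, and the per-GPU wattage is a parameter you supply (700 W is used in the example as an assumed board power for an H100-class SXM part).

```rust
/// Cubie incremental power as a percentage of total GPU board power.
/// `watts_per_gpu` is an assumption supplied by the caller.
pub fn overhead_pct(cubie_watts: f64, gpu_count: u32, watts_per_gpu: f64) -> f64 {
    100.0 * cubie_watts / (gpu_count as f64 * watts_per_gpu)
}
```

With these inputs, 150 W against 16 GPUs at 700 W each comes to about 1.3%, and 300 W against 256 GPUs comes to about 0.17%. The smallest cluster in each row is the worst case, because the Cubie power figure is fixed per row while GPU power grows with the cluster.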

---

## Cross-tier deployment-mode matrix

| Mode | Tier 1-2 | Tier 3 | Tier 4 | Tier 5 | Tier 6 |
|---|---|---|---|---|---|
| Sidecar to existing gateway | n/a | ✅ | ✅ (default) | n/a | ✅ |
| In-process library | ✅ | ✅ | ✅ | n/a | ✅ |
| Bump-in-the-wire | n/a | ✅ | ✅ | ✅ (best fit) | ✅ |
| Reverse-proxy front | n/a | ✅ | ✅ | n/a | ✅ |

---

## Compliance certification path per tier

| Tier | Initial cert | Year-1 cert | Year-2 cert |
|---|---|---|---|
| 1 (pacemaker) | IEC 62304 Class C | FDA 510(k) | IEC 60601-1-2 EMC |
| 2 (industrial) | ISO 26262 ASIL-D | AUTOSAR Adaptive | IEC 61508 SIL-3 |
| 3 (edge AI) | SOC 2 Type II | ISO 27001 | Common Criteria EAL 4 |
| 4 (datacenter) | SOC 2 Type II | ISO 27001 | EU AI Act conformity |
| 5 (SmartNIC) | + NIC-vendor cert | — | — |
| 6 (TDX) | + Intel TDX validation | + Common Criteria EAL 5+ | + FedRAMP High readiness |

---

## What WWT customers should buy first

**Default starting point: Tier 4 software-only sidecar.** Lowest risk, lowest deployment cost, fastest ROI proof. Run a 30-day pilot; if it passes, expand to a dedicated appliance plus Tier 5 SmartNICs for higher throughput.

**If sovereign / confidential-compute is required:** Tier 6 from day 1, deployed inside an Intel TDX-enabled host. The initial cost is higher, but it proves the regulatory and IP-isolation story that other detectors can't.

**If edge-deployed:** Tier 3 white-box per cell site or per branch office. Standardize on Intel Atom Edge for the lowest-power reference, or on Cortex-A53 for customers who prefer Arm.
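
The three recommendations above reduce to a small decision function. This is a minimal sketch: the `StartingTier` enum and `starting_tier` function are hypothetical names, and checking the confidential-compute requirement before the edge case is an assumption, since the guidance does not say which wins when both apply.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum StartingTier {
    Tier3EdgeWhiteBox,
    Tier4SoftwareSidecar,
    Tier6Tdx,
}

/// Picks a first purchase from two yes/no requirements.
/// Precedence (confidential compute first) is an assumption.
pub fn starting_tier(needs_confidential_compute: bool, edge_deployed: bool) -> StartingTier {
    if needs_confidential_compute {
        StartingTier::Tier6Tdx
    } else if edge_deployed {
        StartingTier::Tier3EdgeWhiteBox
    } else {
        StartingTier::Tier4SoftwareSidecar
    }
}
```

With neither requirement set, the function returns the Tier 4 software-only sidecar, matching the stated default.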
