Calculator · Internal

Inference economics model.

Capex, opex, throughput, margin, and payback for a GPU inference fleet, parametrized by model size, utilization, electricity price, and facility cost. Tweak inputs on the left to feel the curve.

20 MW · 19,040 × H100 80GB · Llama 4 Maverick 400B/17B MoE (INT4) · 6,346 instances (3 GPUs each) · batch 64

GPU

Model

Inputs

Fleet size20 MW

In-flight batch size64

Concurrent requests per instance. vLLM defaults max_num_seqs=1024, but real production sees 32-128 effective batch — KV cache caps it. b* (compute saturation) for this combo: 148.

Duty cycle75%

Fraction of time GPUs have batches to run. Cloud providers: 30-60% avg. 24/7 saturated: 90%+

Facility capex5M/MW

Industry avg $10–12M. Hyperscaler AI: $20–25M

H100 80GB cost18000

Default secondary $18,000. Slide to override.

Avg electricity price15/MWh

ERCOT avg ~$27. Negative = paid to consume

Token price discount30%

Discount vs current market rate for this model

Ops cost600K/MW/yr

Labor, networking, maintenance, insurance

GPU lifespan3 yrs

PUE1.2

1.0 = perfect. Air-cooled inference GPUs typically 1.2–1.4

Total capex

$442.72M

$100.00M facility + $342.72M GPUs

Annual revenue

$288.22M

@ $0.210/M tokens

Annual profit

$149.35M

51.8% margin

Payback

3.0 yrs

Strong

Throughput physics

Memory-bound (batch=1)1182 tok/s

Compute-bound (batch=∞)174618 tok/s

Saturation batch (b*)148

At batch=64 (with overhead)9144 tok/s/instance

Annual cost breakdown

GPU depreciation$114.24M/yr

Operations$12.00M/yr

Electricity (2% of total)$2.63M/yr

Facility depreciation$10.00M/yr

Total annual cost$138.87M/yr

Key insights

Batch curve. At batch=1 you get 1182 tok/s/instance (memory-bound). Compute-bound ceiling is 174618 tok/s/instance. The transition (b*) sits at batch ≈ 148. You're running at batch=64, which delivers 9144 tok/s/instance after real-world overhead (efficiency ~30% vs theoretical, calibrated to vLLM benchmarks).

Electricity is 2% of costs. GPU depreciation dominates. The electricity arbitrage helps but isn't the main driver.

Cost per million tokens: $0.1012. Selling at $0.210/M. That's a 108% markup.

Capex comparison. A 20 MW fleet costs $442.72M vs $500.00M at hyperscaler rates ($25M/MW). That's $57.28M in savings, or 11% cheaper.

Throughput. 1372.5T tokens/year at 75% duty cycle across 6,346 instances of Llama 4 Maverick 400B/17B MoE (INT4).

Throughput model: per-instance aggregate t/s = T_compute · b/(b+b*) · η · 1/√N, where T_compute = (GPU_TFLOPS · N) / (2·active_params), b* = T_compute / T_memory, T_memory = (GPU_BW · N) / (active_params · 0.5 bytes), η = 0.3 (real-world overhead — KV reads, sampling, dequant, framework, calibrated against vLLM Llama-70B INT4 H100 ≈ 2,200 tok/s @ b=128), N = ceil(model_vram / gpu_vram). Assumes continuous batching + INT4 quant + tensor-parallel sharding when N>1. 80% of IT power to GPUs. Facility depreciated 10y. Excludes financing, tax, land. Real production batch sizes are KV-cache-limited; observed effective batch is typically 32–128 even with max_num_seqs set higher.

Facility capex benchmarks

What to put in the facility-capex slider

Four realistic build profiles, $0.5M/MW (crypto shed) to $30M/MW (hyperscale AI). Each tier shows the cost band and 3–4 cited comps so you can pick a number that matches the actual deployment you're modeling, not a hand-waved average.

Crypto-grade mining shed (minimal viable)

$0.5–2M/MW

Industrial sheds built for ASICs. Single grid feed, no UPS, evaporative or pure outdoor air cooling. Not suitable for sustained AI inference without retrofit, but the land + shell + interconnect are sunk costs you can recover.

Pre-conversion mining facility baseline~$0.5M/MWCleanSpark CEO via BitcoinWorld (2026)
Riot Whinstone (Rockdale, TX) — 700 MWPurpose-built mining campusOperator disclosures
Crypto → AI conversion delta (CEO commentary)+$10–11M/MWCleanSpark / BitcoinWorld

Air-cooled inference shed, TX greenfield

$3–6M/MW

Lower-tier purpose-built, single grid feed, CRAH-based air cooling at 25–35 kW/rack, basic gen backup, modest UPS. Inference SLAs (99–99.9%) tolerate single-feed risk. This is the realistic tier for an air-cooled A100 / H100 deployment in West Texas.

TX industrial energy cost$0.072/kWhelectricchoice.com (2026)
Standard shell-and-core lower bound$5–8M/MWTrueLook construction breakdown
Texas / Mississippi / Louisiana cost advantage5 states = 74% of YTD spendPrograms.com data center stats (2026)
Lancium Clean Campus (Abilene) — flex-load wedgeCo-located w/ wind, BYOP economicsLancium / Crusoe newsroom

Standard enterprise Tier-II/III

$10–13M/MW

Conventional colo / wholesale build. N+1 to 2N power redundancy, full UPS, water-cooled chillers, certified to Uptime Tier III. The default reference build for an Equinix / Digital Realty / CyrusOne wholesale deal.

2026 industry forecast (standard enterprise)$11.3M/MWiRecruit Construction Trends (2026, +6% over 2025)
Standard shell-and-core typical range$10–12M/MWConstruct Elements / Cushman & Wakefield Cost Guide
CleanSpark Texas IT capacity valuation~$13M/MWJPMorgan via The Block (2026)
Northern VA wholesale lease$103–143/kW/monthCushman & Wakefield NoVA outlook

Hyperscale AI campus (liquid-cooled, GB200-ready)

$20–30M/MW

Direct-to-chip liquid cooling, 80–130 kW/rack rack density, dedicated substation, 2N power, multi-100 MW campus. The build profile for a Crusoe / Microsoft / Meta AI factory designed to host GB200 NVL72 racks.

AI-optimized facility forecast$20M+/MWiRecruit / Construct Elements (2026)
Crusoe × Blue Owl × Primary Digital JV (Abilene Phase 1)$3.4B / 206 MW = ~$16.5M/MWCrusoe / Lancium press release
Headline hyperscale GW campuses$45–55B / GWCushman & Wakefield Americas DC Update H2 2025
IREN × Microsoft 200 MW liquid-cooled$9.7B / 5 yr (revenue, not capex)Hut 8 / IREN deal coverage

Numbers are facility capex only — building shell, MEP, power infrastructure, cooling, fire/security, gen backup. GPU hardware, interconnect upgrades, and land are separate. The hyperscale band excludes GPUs (GB200 racks add ~$3M/rack on top, financed separately). For an air-cooled A100 inference build in West Texas, the $3–6M/MW band is the right starting point; drop to $1.5–3M/MW if you're acquiring a converted crypto shell with usable power delivery already in place.