Research note · 2026-05-18

AMD chips at 1/2 capex — am I missing something?

The hook is real on the surface, in two directions. MI300X rents for $1.50/GPU-hr at TensorWave while H100 sits at $2.99 on Lambda, almost exactly half. And AMD chips are clickable on-demand today at 5+ neoclouds, while NVIDIA H100 is 36–52 weeks direct purchase and B200 is allocated through H2 2027 to Microsoft / Google / Meta / Amazon. But load in three years of power, cooling, and ops and MI300X ends up 3% more expensive than H100 at ownership: the rental discount is mostly neocloud silicon depreciation, not a per-node TCO advantage. Where AMD actually wins is HBM capacity per dollar: measured against 3-year TCO, MI300X delivers HBM at $498/GB vs H100's $1,162/GB. That 2.3× edge comes from carrying 2.4× the HBM per GPU at nearly the same TCO, and the extra capacity is what makes Llama 3.1 405B fit on a single node instead of two. This page walks through what's real, what's marketing, and where the workload line actually falls.

The surface claim

The 1/2 capex hook is real — at five neoclouds, on one chip

MI300X rents for $1.50/GPU-hr at TensorWave, $1.71 on Crusoe spot, $1.85 at Vultr. H100 SXM rents for $2.99 at Lambda and $3.49 at RunPod. The discount lives at five providers — TensorWave, Hot Aisle, Vultr, Crusoe spot, AMD Dev Cloud — and disappears at Azure ($6/hr) and Oracle ($6/hr) where MI300X is listed at H200 prices.

Cheapest MI300X

$1.50 / GPU-hr

TensorWave, 8-GPU on-demand

Cheapest H100 SXM

$2.99 / GPU-hr

Lambda, on-demand

Spread at the cheapest tier

−50%

MI300X discount vs cheapest H100 — the LinkedIn-post claim, verified

The SemiAnalysis breakeven

SemiAnalysis InferenceMAX finds MI300X needs to rent under $1.90/GPU-hr to win $/M tokens on Llama 3 70B chat against H200 + TensorRT-LLM, and MI325X under $2.50/GPU-hr. TensorWave ($1.50), Crusoe spot ($1.71) and Vultr ($1.85) clear the MI300X bar; TensorWave MI325X at $1.95 clears the MI325X bar. Median MI300X is $1.99 vs H100 median $3.49 (43% lower) — but median is pulled up by Azure / Oracle at $6/hr, where the AMD discount has been competed away.
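The breakeven test above is a simple threshold comparison, and a short sketch makes it easy to rerun as prices move. The ceilings and quotes below are the figures from this note; the function and variable names are illustrative only, and the ceilings apply to the specific Llama 3 70B chat comparison against H200 + TensorRT-LLM, not to other workloads.

```python
# Breakeven ceilings quoted above, in USD per GPU-hour.
BREAKEVEN = {"MI300X": 1.90, "MI325X": 2.50}

# Rental quotes from this note: (provider, chip, USD per GPU-hour).
QUOTES = [
    ("TensorWave", "MI300X", 1.50),
    ("Crusoe spot", "MI300X", 1.71),
    ("Vultr", "MI300X", 1.85),
    ("Azure", "MI300X", 6.00),
    ("Oracle", "MI300X", 6.00),
    ("TensorWave", "MI325X", 1.95),
]


def clears_breakeven(chip, usd_per_gpu_hr):
    """True if the rental price is strictly under the chip's breakeven ceiling."""
    return usd_per_gpu_hr < BREAKEVEN[chip]


def discount_vs(reference, price):
    """Fractional discount of `price` relative to `reference` (0.5 means 50% cheaper)."""
    return 1 - price / reference


for provider, chip, price in QUOTES:
    verdict = "clears" if clears_breakeven(chip, price) else "misses"
    headroom = BREAKEVEN[chip] - price
    print(f"{provider:12s} {chip:7s} ${price:.2f}  {verdict} (headroom ${headroom:+.2f})")

print(f"median MI300X vs median H100: {discount_vs(3.49, 1.99):.0%} lower")
```

Only the neocloud quotes clear the bar; the hyperscaler listings miss it by several dollars per GPU-hour, which is the same split the text describes.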

The other surface claim

And AMD chips are massively easier to actually get

The rental discount is one half of the story. The other half — bigger, structurally — is supply. NVIDIA H100 is 36–52 weeks direct purchase, H200 reserved pools are sold out, and B200 is allocated through H2 2027 to Microsoft / Google / Meta / Amazon. AMD MI300X / MI325X / MI355X are clickable on-demand today at 5+ neoclouds. Same TSMC CoWoS bottleneck; different allocation policy.

AMD on-demand today

3 of 6 chips

MI300X, MI325X, MI355X clickable now at 5+ neoclouds

NVIDIA H100 direct

36–52 weeks

Direct purchase lead time, hyperscaler-gated

B200 allocation

H2 2027

Blackwell pre-committed to Microsoft / Google / Meta / Amazon

Why this gap exists

Both AMD and NVIDIA run through the same TSMC CoWoS packaging bottleneck for HBM-stacked accelerators — capacity is fully allocated through mid-2027 (Spheron). What's different is allocation policy: Microsoft, Google, Meta, and Amazon placed multi-billion-dollar forward orders for Blackwell in 2025 and consumed most of NVIDIA's 2026-27 supply (Lyceum). AMD has no equivalent pre-commitment regime — Meta's 6 GW MI450 deal lands H2 2026, but MI300X / MI325X are not blocked by it. That's the structural reason a startup can swipe a card for 8 MI300X at Crusoe today but waits 9+ months for an H200.

Sources: lead times via Spheron GPU shortage 2026 brief; MI300X availability at Crusoe; MI325X at TensorWave; MI355X via Phoronix and Tom's Hardware on TensorWave's 8,192-GPU cluster. NVIDIA allocation reporting via Lyceum and Spheron, summarizing SemiAnalysis / Bloomberg coverage.

What you're missing — part 1

The 50% rental discount becomes a 3% TCO premium

Silicon capex is one column of the bar. Load in three years of power, the cooling stack the rack actually requires, and a flat $150K/node-yr ops envelope, and the five configurations land inside a $137K spread, about 17% of the $806K average. MI300X comes out 3% more expensive than H100 at ownership. The AMD pitch is not lower TCO. It's much more HBM per TCO dollar.

[Chart: 3-yr TCO by configuration. Components: silicon capex, power (3 yr), cooling delta (3 yr), ops & labor (3 yr). Marker: best $/HBM-GB.]

the rental hook

−50%

MI300X rental vs H100 ($1.50 vs $2.99) — surface hook

the TCO reality

+3%

MI300X TCO vs H100 over 3 yr ($765K vs $744K) — the discount disappears at ownership

the right axis

−57%

MI300X $/HBM-GB vs H100 ($498 vs $1,162) — the real AMD edge
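The three headline figures can be reproduced from numbers already in this note. The sketch below assumes that $/HBM-GB means 3-year node TCO divided by total node HBM (8 GPUs at 192 GB for MI300X and 80 GB for H100, per vendor datasheets); that reading is an inference, but it does recover the $498 and $1,162 values. All names are illustrative.

```python
GPUS_PER_NODE = 8

# Figures quoted in this note.
rent = {"MI300X": 1.50, "H100": 2.99}            # USD per GPU-hour, cheapest tier
tco_3yr = {"MI300X": 765_000, "H100": 744_000}   # USD per 8-GPU node over 3 years

# Per-GPU HBM capacity in GB, from vendor datasheets.
hbm_gb = {"MI300X": 192, "H100": 80}


def pct_delta(amd, nvidia):
    """Signed fractional difference of the AMD figure relative to the NVIDIA one."""
    return amd / nvidia - 1


def tco_per_hbm_gb(chip):
    """Assumed definition: 3-year node TCO divided by total HBM in the node."""
    return tco_3yr[chip] / (hbm_gb[chip] * GPUS_PER_NODE)


print(f"rental:   {pct_delta(rent['MI300X'], rent['H100']):+.0%}")
print(f"TCO:      {pct_delta(tco_3yr['MI300X'], tco_3yr['H100']):+.0%}")
print(f"$/HBM-GB: {pct_delta(tco_per_hbm_gb('MI300X'), tco_per_hbm_gb('H100')):+.0%}")
```

The sign flips between the first two lines because the rental figure prices only the GPU-hour, while the TCO figure adds costs that are nearly identical across both nodes.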

3-yr window, 85% duty cycle, $0.07/kWh blended industrial. Silicon capex from channel listings (Supermicro AS-8125GS-TNMR2 for AMD; HGX list for NVIDIA). Cooling delta scales $50/GPU/yr per 100W over a 700W baseline. Ops fixed at $150K/node/yr. TCO totals: MI300X $765K, H100 $744K, H200 $789K, MI355X $851K, B200 $881K — $137K spread across all five.
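A minimal sketch of the non-silicon components under the parameters just listed. It assumes power is billed on GPU TDP alone (host CPUs, NICs, and fans excluded) and uses datasheet TDPs of 700 W for H100 and 750 W for MI300X; neither assumption is confirmed as the exact method behind the totals. Node capex is not modeled because the channel listing prices are not given here.

```python
HOURS_PER_YEAR = 8760


def power_cost_usd(gpu_tdp_w, gpus=8, years=3, duty=0.85, usd_per_kwh=0.07):
    """Electricity for the GPUs alone (assumption: rest of the node excluded)."""
    kw = gpu_tdp_w * gpus / 1000
    return kw * HOURS_PER_YEAR * years * duty * usd_per_kwh


def cooling_delta_usd(gpu_tdp_w, gpus=8, years=3, usd_per_100w_yr=50, baseline_w=700):
    """$50 per GPU per year for every 100 W of TDP above the 700 W baseline."""
    excess_w = max(0, gpu_tdp_w - baseline_w)
    return excess_w / 100 * usd_per_100w_yr * gpus * years


def ops_usd(years=3, usd_per_node_yr=150_000):
    """Flat ops and labor envelope per node."""
    return usd_per_node_yr * years


# Published 3-year totals per node in USD, as listed above.
TCO = {"MI300X": 765_000, "H100": 744_000, "H200": 789_000,
       "MI355X": 851_000, "B200": 881_000}

spread = max(TCO.values()) - min(TCO.values())
mean_tco = sum(TCO.values()) / len(TCO)

print(f"H100 power, 3 yr:     ${power_cost_usd(700):,.0f}")
print(f"MI300X cooling delta: ${cooling_delta_usd(750):,.0f}")
print(f"ops, 3 yr:            ${ops_usd():,.0f}")
print(f"spread ${spread:,.0f} = {spread / mean_tco:.0%} of the ${mean_tco:,.0f} mean")
```

Under these assumptions the flat ops envelope is more than half of every total, which goes a long way toward explaining why five very different chips land so close together.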

What you're missing — part 2

Where AMD actually wins is HBM per dollar — not per hour

The 1/2 capex framing flattens the wrong axis. The right axis is HBM capacity per dollar of rental, where MI300X / MI325X / MI355X each sit ahead of every NVIDIA part on the page. That capacity is what makes Llama 3.1 405B fit on a single node, not two — and it's what drives the off-grid economics deeper in the page.

HBM per dollar — the AMD value frontier

Silicon list price against HBM capacity. Up and to the left is better. AMD's three Instinct parts all sit above every NVIDIA isoprice contour — MI300X delivers HBM at $94/GB of silicon list, compared to $375/GB for H100. AMD mean is $106/GB vs NVIDIA mean $291/GB — a 64% edge that compounds at every level above silicon.

Silicon unit price is channel / street, May 2026: MI300X $18K, MI325X $22K, MI355X $40K, H100 $30K, H200 $35K, B200 $45K. Dashed contours show equal-$/HBM-GB curves at the silicon level. AMD's three parts all land on the cheaper side of the $150/GB contour (under $150 per GB of HBM); NVIDIA's sit at roughly $250/GB and up.
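The per-GB figures follow directly from the channel prices above divided by datasheet HBM capacity. In this sketch the capacities are an assumption layered on the note's prices: 192 / 256 / 288 GB for the AMD parts, 80 / 141 GB for H100 / H200, and 180 GB for B200 (consistent with the 1.44 TB per 8-GPU node quoted elsewhere in this note).

```python
from statistics import mean

# part: (vendor, channel price in USD, HBM capacity in GB)
PARTS = {
    "MI300X": ("AMD", 18_000, 192),
    "MI325X": ("AMD", 22_000, 256),
    "MI355X": ("AMD", 40_000, 288),
    "H100":   ("NVIDIA", 30_000, 80),
    "H200":   ("NVIDIA", 35_000, 141),
    "B200":   ("NVIDIA", 45_000, 180),
}


def usd_per_gb(part):
    """Silicon list price per GB of HBM for one part."""
    _, price, gb = PARTS[part]
    return price / gb


def vendor_mean(vendor):
    """Unweighted mean of $/GB across one vendor's parts."""
    return mean(usd_per_gb(p) for p, (v, _, _) in PARTS.items() if v == vendor)


for part in PARTS:
    print(f"{part:7s} ${usd_per_gb(part):4.0f}/GB")

amd, nvidia = vendor_mean("AMD"), vendor_mean("NVIDIA")
print(f"AMD mean ${amd:.0f}/GB, NVIDIA mean ${nvidia:.0f}/GB, edge {1 - amd / nvidia:.0%}")
```

The means are unweighted across three parts per vendor, so a single expensive part (H100 here) moves the NVIDIA average noticeably.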

Spec head-to-head — AMD wins capacity & bandwidth, NVIDIA wins compute (until MI355X)

May 2026 datasheet head-to-head. MI300X has 2.4× the HBM of an H100 and 58% more bandwidth, but trails on FP8 compute by ~32%. MI355X is the first AMD part that closes the compute gap on paper: 2.2× B200 on peak FP8 and FP4, at +40% TDP.

HBM capacity (higher is better): MI300X 2.4× H100 · MI355X 3.6× H100

HBM bandwidth (higher is better): MI300X +58% vs H100 · MI355X +4% vs B200

FP8 dense compute (higher is better): MI355X 2.2× B200 · MI300X 32% under H100

FP4 dense, native (higher is better): MI355X 2.2× B200 · only Blackwell + MI355X support FP4

TDP (higher is worse): MI300X drop-in air-cool with H100 · MI355X needs liquid

Unit price, channel (higher is worse): MI300X $18K vs H100 $30K · the silicon side of the hook

The capacity arithmetic that actually matters

Llama 3.1 405B in FP16 (~810 GB of weights) fits on a single 8-way MI300X node. The equivalent H100 deployment needs 16 GPUs (two HGX nodes plus cross-node NVLink / InfiniBand) — twice the chassis, twice the rack, twice the network.
DeepSeek-R1 671B in FP16 (~1.34 TB) fits on a single 8-way MI355X node (2.3 TB HBM). On B200 (1.44 TB / node) it barely fits; H100 (0.64 TB / node) needs three nodes, since two hold only 1.28 TB.
Bandwidth headroom scales the memory-bound throughput ceiling at batch=1 by the same ratio: MI300X's 58% bandwidth lead over H100 implies up to ~58% higher single-stream tok/s on the same model, before any AITER kernel tuning.
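The capacity arithmetic above can be sketched as a weights-only fit check. This is deliberately simplified: it counts 2 bytes per parameter for FP16 and ignores KV cache, activations, and framework overhead, so a real deployment needs more headroom than it reports. Per-GPU HBM and peak bandwidth are datasheet values; the function names are illustrative.

```python
import math

GPUS_PER_NODE = 8
HBM_GB = {"MI300X": 192, "MI355X": 288, "H100": 80, "B200": 180}   # per GPU
HBM_TBPS = {"MI300X": 5.3, "H100": 3.35}                           # per GPU, peak


def weights_gb(params_billion, bytes_per_param=2):
    """Weight footprint only; FP16/BF16 is 2 bytes per parameter."""
    return params_billion * bytes_per_param


def nodes_needed(params_billion, chip, bytes_per_param=2):
    """Minimum 8-GPU nodes whose pooled HBM holds the weights (no KV-cache headroom)."""
    node_gb = HBM_GB[chip] * GPUS_PER_NODE
    return math.ceil(weights_gb(params_billion, bytes_per_param) / node_gb)


def bandwidth_ratio(chip_a, chip_b):
    """Peak HBM bandwidth ratio, an upper bound on memory-bound batch=1 speedup."""
    return HBM_TBPS[chip_a] / HBM_TBPS[chip_b]


CASES = [("Llama 3.1 405B", 405, "MI300X"), ("Llama 3.1 405B", 405, "H100"),
         ("DeepSeek-R1 671B", 671, "MI355X"), ("DeepSeek-R1 671B", 671, "B200")]

for model, params, chip in CASES:
    nodes = nodes_needed(params, chip)
    fill = weights_gb(params) / (nodes * HBM_GB[chip] * GPUS_PER_NODE)
    print(f"{model} on {chip}: {nodes} node(s), weights fill {fill:.0%} of HBM")

print(f"MI300X / H100 peak bandwidth: {bandwidth_ratio('MI300X', 'H100'):.2f}x")
```

The fill percentage is the useful part: a model that fits with most of the HBM already consumed by weights leaves little room for KV cache at long context or high concurrency.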

Sources: AMD product pages for MI300X, MI325X, MI355X; NVIDIA H100 / H200 / B200 datasheets; channel pricing via IntuitionLabs and Silicon Analysts. Dense numbers, no sparsity.

Three years from launch to second-source

MI300X shipped in late 2023. By late 2025, AMD had silicon (MI300X → MI325X → MI355X), software (ROCm 7 + AITER), and marquee buyers (Meta serving Llama 3.1 405B, OpenAI's 6 GW MI450 deal). Three tracks, one trajectory.

Workload-by-workload, where AMD beats NVIDIA on $/M tokens

AMD wins when HBM capacity or bandwidth is the binding constraint (large dense models, single-stream latency-bound chat). NVIDIA wins when software composability binds (MoE expert-parallel, disaggregated prefill/decode, FP4 with TensorRT-LLM). MI355X is the first AMD part that closes the compute gap on paper — early benchmarks land within 10% of B200 on gpt-oss-class workloads.

Llama 3.1 405B FP16, single 8-GPU node

Workload type: dense, capacity-binding

Verdict: AMD wins

Why it wins: 1×8 MI300X fits the model; at least competitive on $/Mtok when rented under $1.99/GPU-hr

What the loser brings: H100 needs TP=16 (2 nodes); H200 fits at TP=8 but with tighter HBM headroom

AMD chips at 1/2 capex — am I missing something?

The 1/2 capex hook is real — at five neoclouds, on one chip

The SemiAnalysis breakeven

And AMD chips are massively easier to actually get

Why this gap exists

The 50% rental discount becomes a 3% TCO premium

Where AMD actually wins is HBM per dollar — not per hour

HBM per dollar — the AMD value frontier

Spec head-to-head — AMD wins capacity & bandwidth, NVIDIA wins compute (until MI355X)

The capacity arithmetic that actually matters

Three years from launch to second-source

Workload-by-workload, where AMD beats NVIDIA on $/M tokens

Llama 3.1 405B FP16, single 8-GPU node

Llama 3 70B chat, 1k in / 1k out, low latency

DeepSeek-R1 671B MoE, batched serving

gpt-oss-120B, FP4 disaggregated

<30B dense, mid-batch latency

MLPerf Inference v5.0, Llama-3.1-70B/405B

Where AMD wins is a quadrant, not a vibe

Read this as the answer to "am I missing something?"

The compatibility tax shrank from 2–3× to 10–20% — for the right workloads

ROCm stack parity vs the NVIDIA equivalent

MI300X is the cleanest off-grid drop-in

What MI300X/MI325X/MI355X do to off-grid power budgets

GPUs deployed per MW (PUE 1.2, 80% IT → GPU)

Total HBM capacity per MW (TB)

Llama 3.1 405B FP16 instances on a 100 MW off-grid campus

Cooling stack required → facility capex band

Off-grid pairing call

Inference payoff calculator, AMD-defaulted

The buyer list shifted decisively in 2025–2026

Meta

Microsoft Azure

Oracle OCI

OpenAI

Meta (Feb 2026)

Crusoe / TensorWave / Vultr

What to do with this
