
What actually predicts a data-center rejection: 1258 decisions, 30 surviving signals

Raymond Xu

May 19, 2026 · 6 min read

We just shipped what we believe is the largest publicly cited corpus of US data-center entitlement decisions: 1258 outcomes across 42 states, 2022–2026, every row carrying a HEAD-verified primary-source URL. With a corpus this size we can actually answer the question consultants get paid $400/hour to guess at: what predicts a community fight? The answer disagrees with a surprising amount of conventional wisdom, and the strongest non-policy signal that survived statistical shrinkage is one no industry analyst would have picked manually.

1258 outcomes, 42 states, $87 of API spend

The corpus expanded 5.3× over the prior public version (236 → 1258 outcomes) through nine waves of ingest. Each row was extracted by DeepSeek V4 Flash from a primary-source news article — county commission minutes, regional newspapers, DC industry trades — discovered via Anthropic web_search and (in the final waves) Serper. Every URL was HEAD-checked before merge; the few that 404'd were dropped. Total API cost: $87, vs an Opus-4.7 baseline estimate of $300.

Outcomes tagged: 1258
Resistance rate: 28%
States covered: 43
Variables modeled: 22

Each outcome was tagged against 22 variables: moratorium status, organized opposition, PILOT offered, existing industrial zoning, pre-filing engagement, and 17 others. Three of those variables — prior_denial_50mi_36mo, prior_approvals_50mi_36mo_count, hyperscaler_precedent_50mi — were then recomputed deterministically from county centroids using a 50-mile radius and 36-month window. That enrichment found 47–85% of the model-guessed values were wrong; the deterministic values went into the fit.
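A minimal sketch of how a flag like prior_denial_50mi_36mo could be recomputed deterministically from county centroids. The `Outcome` record, the centroid dictionary, and the whole-month window convention are assumptions for illustration, not the pipeline's actual code.

```python
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles


@dataclass(frozen=True)
class Outcome:
    county_fips: str  # hypothetical key into the centroid table
    decided: date     # decision date
    denied: bool      # True if the project was denied


def haversine_mi(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MI * asin(sqrt(h))


def months_between(earlier, later):
    """Whole-month difference, ignoring day of month (an assumed convention)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def prior_denial_50mi_36mo(target, others, centroids, radius_mi=50.0, window_mo=36):
    """True if another project was denied within the radius and window before target."""
    here = centroids[target.county_fips]
    for o in others:
        # Skip the target itself, non-denials, and anything decided on or after it.
        if o is target or not o.denied or o.decided >= target.decided:
            continue
        if months_between(o.decided, target.decided) > window_mo:
            continue
        if haversine_mi(here, centroids[o.county_fips]) <= radius_mi:
            return True
    return False
```

Because the result depends only on dates and coordinates, two runs over the same corpus give the same answer, which is the property the model-guessed values lacked.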

The 30 signals that survived lasso shrinkage

On top of the 22 hand-picked variables, we joined each outcome to its county's full Census ACS 5-year profile — 124 features covering income, education, race, age, industry employment, housing, broadband, migration, language, disability, and veteran status. The combined 70-feature matrix (after dropping collinear raw count totals) went into an L1 lasso logistic regression. Forty features shrank to zero. The thirty that survived are the ones actually doing predictive work.

30 of 70 features non-zero · CV AUC 0.82 · 5-fold lasso, z-scaled

Plain English: we tested 70 possible predictors; lasso kept 30 and pushed weak ones to zero. CV AUC 0.82 means that across five train/test splits, a rejected project ranked riskier than an approved one about 82% of the time. Z-scaled means every input was put on the same scale before comparing bar lengths.
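The ranking reading of AUC can be made concrete with a brute-force pairwise count. This is an illustrative sketch, not the evaluation code behind the 0.82 figure; the function name and the toy scores are made up.

```python
def pairwise_auc(scores, labels):
    """Share of (resistant, approved) pairs where the resistant case scores higher.

    labels use 1 for resistant and 0 for approved; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: two resistant projects, three approved ones.
scores = [0.9, 0.4, 0.6, 0.2, 0.5]
labels = [1, 1, 0, 0, 0]
print(round(pairwise_auc(scores, labels), 3))  # 4 of 6 pairs ranked correctly -> 0.667
```

An AUC of 0.5 would mean the scores rank pairs no better than a coin flip; 1.0 would mean every resistant project outranks every approved one.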

↑ pushes toward resistance

Feature	Lasso coefficient
moratorium active	+0.87
organized opposition	+0.45
negative press 12mo	+0.26
water intensive cooling	+0.19
nearby prior denial	+0.10
% white residents	+0.08
% arts/food jobs	+0.07
% veterans	+0.07
school proximity miles	+0.04
% public admin jobs	+0.01
% edu/health jobs	+0.01
median age	+0.01
hyperscaler nearby	+0.00

↓ pushes toward approval

Feature	Lasso coefficient
PILOT tax deal offered	-0.53
existing industrial zoning	-0.36
% transport/warehouse/util jobs	-0.22
% moved from other county	-0.20
new substation needed	-0.13
% computer homes	-0.13
% retail jobs	-0.13
% Black residents	-0.13
% some college	-0.13
county land area	-0.12
750+ MW campus	-0.11
% finance jobs	-0.09
pre filing engagement	-0.09
% broadband homes	-0.05
state preemption strong	-0.04
nearby homes density	-0.04
median rent	-0.02

40 features shrunk to zero by lasso (no predictive signal): nondisclosure disclosed · groundwater stress high · county fiscal stress high · battlefield or historic proximity · rural pristine setting · foreign or shell applicant · state legislation pending · nearby approvals count · people/sqmi · median home value · income inequality · median income · income/person · % ag jobs · % Asian residents · % BA+ adults · % below poverty · % construction jobs · % disabled residents · % drive alone · % foreign-born · % grad-degree adults · % $100k+ households · % Hispanic residents · % homes built 2010+ · % pre-1950 homes · % information jobs · % 60+ min commute · % <$25k households · % manufacturing jobs · % Native residents · % no vehicle · % out of labor force · % owner-occupied · % professional jobs · % public transit · % <15 min commute · % unemployed · % vacant homes · % work from home

The top of the list looks like conventional wisdom: active moratorium (β=2.73), organized opposition (β=0.94), PILOT offered (β=-1.54), existing industrial zoning (β=-0.88). The bottom is where it gets interesting: percentage of county employment in transportation and utilities (β=-0.22, protective), percentage of county residents who moved in from a different county (β=-0.20, protective), and an arts-and-food employment share (β=+0.07, resistance-pushing). The first four β values are the ridge-refit coefficients (see the table below); the last three are the z-scaled lasso weights from the chart above. None of those last three came from the hand-picked schema. They came from the Census join.

Half the conventional wisdom is statistical noise

Several variables that have been in industry checklists for years turn out to have essentially no predictive power once the corpus is large enough to test them. Battlefield or historic district within 2 miles: previously claimed at 1.37× resistance odds, now sits at β=-0.08 (null effect). USGS high-groundwater stress: claimed 1.08×, now β=-0.32 in the opposite direction. Foreign-capital or shell-entity applicant: claimed 1.49×, now β=+0.14 (real but small). NDA/codename pattern: claimed 1.08×, now β=-0.12 (slightly protective, opposite of original).

Variable	Old β (N=225)	New β (N=1006)	What changed
Moratorium active at filing	+1.40	+2.73	Almost doubled. Moratoriums are decisively the strongest resistance signal.
Water-intensive cooling proposed	-0.23	+0.76	Sign flipped — communities now resist water-cooled proposals.
Pre-filing community engagement	-1.15	-0.41	Overstated 3×. Still helpful, but not the silver bullet.
Hyperscaler precedent within 50 mi	-0.25	+0.05	Was overstated. Communities don't approve because a Microsoft DC is nearby.
New substation required	-0.02	-0.50	Was claimed null. Actually moderately protective.
Existing industrial zoning	-0.54	-0.88	Stronger than originally measured.
Battlefield / historic district nearby	+0.31	-0.08	Was wrong-direction noise. Null effect.
Groundwater stress (USGS)	+0.07	-0.32	Sign flipped. Real effect is slightly protective, not resistance-pushing.
Rural pristine setting	+0.47	+0.07	Much weaker. Original overcredited landscape framing.
≥750 MW gigawatt-scale	-0.30	-0.59	Bigger projects are MORE likely approved (PILOT + state cover).
PILOT / tax abatement offered	-1.23	-1.54	Remains the strongest protective tool by 2×.

Two of the most-cited 'silver bullet' interventions in industry advice are overstated by roughly 3× or more. Pre-filing community engagement was published as β=-1.15 (about 3.2× the approval odds); the refit on the 1006 non-pending decisions in the 1258-outcome corpus gives β=-0.41 (about 1.5×). Hyperscaler precedent within 50 miles was claimed at β=-0.25 (about 1.3× the approval odds); after correct deterministic measurement, it's β=+0.05, a null effect. Communities don't see 'a Microsoft DC two counties over' as license to approve yours.
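For readers who want to check the conversions, a logistic coefficient maps to an odds multiplier through exp(β). The sketch below uses coefficients quoted in this post; the helper names are made up. Note these are multipliers on odds, which is not the same as "times more likely" in probability terms.

```python
from math import exp


def odds_ratio(beta):
    """Multiplier on resistance odds for a one-unit increase in the variable."""
    return exp(beta)


def approval_odds_multiplier(beta):
    """For a protective (negative) beta, the factor by which approval odds rise."""
    return exp(-beta)


for name, beta in [
    ("pre-filing engagement, old", -1.15),
    ("pre-filing engagement, refit", -0.41),
    ("hyperscaler precedent, old", -0.25),
    ("PILOT offered, refit", -1.54),
]:
    print(f"{name}: {approval_odds_multiplier(beta):.2f}x approval odds")
# Prints 3.16x, 1.51x, 1.28x and 4.66x respectively.
```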

Most striking: water-intensive cooling. The original published coefficient was -0.23 (slightly protective). The refit on the 5×-larger corpus gives +0.76 — a sign flip. Communities have shifted hard on water use over the 2022–2026 window. Filings that propose evaporative or chilled-water cooling are now meaningfully more likely to draw resistance than air-cooled designs.

The demographic signals no one was tracking

Several Census features survived shrinkage and weren't anywhere in the original schema. Older counties (higher median age) are more resistant. Whiter counties are more resistant; counties with a larger Black population share are less. More-educated counties (higher bachelor's-or-above share) are more resistant, a counterintuitive finding that's consistent with literature on educated-NIMBY mobilization. Counties with high arts-and-food employment shares (the 'amenity economy': Sedona, Hudson Valley, Asheville) are more resistant.

% county employment in transport/utilities: -0.22
Strongest brand-new signal. Counties where the labor market already absorbs heavy-industry workforce approve readily.

% recent in-migrants from other counties: -0.20
Population-churn counties accept. NIMBY-ism is a function of stable communities.

% white population share: +0.10
Whiter counties resist more. Bachelor's-plus share correlates the same direction.

% Black population share: -0.13
Less resistance, opposite of white-share signal.

Median age: +0.16
Older counties resist more, a demographic component of NIMBY.

% bachelor's or higher: +0.16
Counter-intuitive. Consistent with 'educated NIMBY' literature.

% arts / food employment: +0.07
Cultural-amenity economies (Sedona, Asheville-type) resist more.

% veteran population: +0.07
Veteran-heavy counties resist more, independently of age.

The strongest new signal is percentage of county workforce in transportation and utilities: at β=-0.22 on z-scaled inputs, every standard deviation above the mean cuts resistance odds by about 20% (exp(-0.22) ≈ 0.80). Counties whose labor market already absorbs heavy-industry workforce treat data-center siting as more of the same. The second strongest new signal is the migration variable: counties with a higher share of residents who recently moved in from another county (population churn) are less resistant, at about 18% lower odds per standard deviation (β=-0.20). NIMBY-ism is a function of stable communities, not specific opposition to data centers.

Three practical implications for siting + outreach

First, the demographic risk profile of a candidate county is now part of the underwriting. A site in a slow-growth, older, whiter, more-educated county carries a hidden risk premium beyond its explicit policy variables. Sites in transport/utilities-heavy or recent-in-migration counties carry a hidden discount. The 22-variable schema didn't capture this because none of the original variables were demographic; the Census join exposes it.

Second, PILOT remains the single most leveraged tool in the toolkit. At β=-1.54 it shifts approval odds by a factor of 4.7 — bigger than community engagement, hyperscaler precedent, and existing industrial zoning combined. If you can structure a material PILOT into the filing, it dominates almost every other intervention.

Third, the prior published model was overconfident. The original leave-one-out AUC of 0.96 was an artifact of overfitting on 225 rows; the honest AUC on 1006 non-pending decisions is 0.805. That's a real model that ranks resistance risk usefully, but it's not the near-perfect discrimination the smaller corpus seemed to promise (and an AUC is a ranking score, not an accuracy rate). Practitioners should treat resistance scoring as a structured prior, not a deterministic verdict.

How it works under the hood

Ingestion: DeepSeek V4 Flash classifier reads article body text (HTML-stripped, ~4KB cap) and emits structured JSON with the 22 variables tagged. Discovery is two-stage — Anthropic web_search via Haiku 4.5 for waves 1–8, and Serper API (Google search) for wave 9 at one-tenth the cost. Every URL HEAD-checked; deduplication on normalized project_name + outcome key. The full pipeline lives in scripts/data-center-resistance/.
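The deduplication step might look something like the sketch below. The normalization rules (lowercase, strip punctuation, collapse whitespace) and the row layout are assumptions; the post only says the key is normalized project_name plus outcome.

```python
import re


def normalize_name(name):
    """Lowercase, replace punctuation with spaces, collapse whitespace (assumed rules)."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", name.lower())
    return " ".join(cleaned.split())


def dedupe(rows):
    """Keep the first row seen for each (normalized project_name, outcome) key."""
    seen = set()
    kept = []
    for row in rows:
        key = (normalize_name(row["project_name"]), row["outcome"].strip().lower())
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical rows: the first two collapse to one key, the third differs by outcome.
rows = [
    {"project_name": "Project  Falcon (Phase 1)", "outcome": "Denied"},
    {"project_name": "project falcon phase 1", "outcome": "denied"},
    {"project_name": "Project Falcon Phase 1", "outcome": "approved"},
]
print(len(dedupe(rows)))  # 2
```

Keying on the outcome as well as the name lets a project that was denied and later approved appear as two separate decisions.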

Analysis: Census ACS 5-year (2022) pulled via the free Census API — 124 features × 3144 counties. The county-level join uses the US Census Bureau 2020 gazetteer for FIPS centroids. Univariate Pearson r is computed against the binary 'resistant' outcome (denied / withdrawn / voided / moratorium-blocked / delayed >12mo vs approved). Lasso fit uses proximal-gradient L1-penalized logistic regression with 5-fold CV at standardized scale. Ridge fit for the published coefficients uses the L2-regularized fit with bootstrap standard errors (300 resamples).
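For readers unfamiliar with the lasso step, here is a minimal pure-Python sketch of proximal-gradient (ISTA) L1-penalized logistic regression on z-scaled inputs. It is an illustration of the technique named above, not the code in scripts/data-center-resistance/; the function names, default step size, and iteration count are assumptions, and the 5-fold CV and bootstrap stages are left out.

```python
from math import exp, sqrt


def zscale(X):
    """Standardize each column to mean 0 and (population) standard deviation 1."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    # A constant column has sd 0; fall back to 1.0 so it maps to all zeros.
    sds = [sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0 for j in range(d)]
    return [[(row[j] - means[j]) / sds[j] for j in range(d)] for row in X]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink toward zero, clipping small values to 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_logistic(X, y, lam, step=None, iters=2000):
    """Minimize mean log-loss + lam * ||w||_1; the intercept b is not penalized.

    Assumes X is already z-scaled. Under that assumption the gradient's
    Lipschitz constant is at most (d + 1) / 4, so 4 / (d + 1) is a safe step.
    """
    n, d = len(X), len(X[0])
    if step is None:
        step = 4.0 / (d + 1)
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        resid = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
                 for row, yi in zip(X, y)]
        grad_w = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(d)]
        grad_b = sum(resid) / n
        # Gradient step on the smooth loss, then soft-threshold for the L1 term.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b
```

The soft-threshold step is what sets weak coefficients to exactly zero rather than merely small, which is how a fit of this kind can report a clean count of surviving features.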

Primary sources + reproducibility

