What actually predicts a data-center rejection: 1258 decisions, 30 surviving signals
Raymond Xu
May 19, 2026 · 6 min read
When a community fights a proposed data center, the same forces are usually at work — but for years, the industry has been guessing at which ones actually matter. We built a database of every US data-center proposal that was approved, denied, or withdrawn between 2022 and 2026 (1258 decisions in 42 states), then ran statistical tests to find out what actually predicts whether a community pushes back. Half of the conventional wisdom turns out to be wrong, and a few things nobody talks about turn out to matter more than anything else.
1258 decisions, every one cited to a news article
We expanded our public database 5.3× over the prior version by reading thousands of local news articles. An AI model (DeepSeek V4 Flash) reads each article and extracts the key facts: what project was being proposed, who the developer was, how big it was, what the community board voted, and what the article tells us about local conditions. Every fact links back to its source article; every URL was tested to make sure it still loads before being added to the database. The whole pipeline cost $87 in AI compute — about 30 minutes of one consultant's billable rate.
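The "does it still load?" gate can be sketched in a few lines of TypeScript. This is a minimal illustration, not the project's code. The `ExtractedFact` shape, the HEAD-then-GET fallback, and the injected `Fetcher` type are all assumptions, chosen so the check can be tested without network access. In Node 18 or later you can pass the built-in `fetch` as the fetcher.

```typescript
// Minimal sketch of a "source URL still loads" gate (assumed design, not the project's code).
// Fetcher is a hypothetical injected type so the check can run offline in tests;
// Node 18+'s built-in fetch satisfies it.
type FetchResult = { ok: boolean; status: number };
type Fetcher = (url: string, init: { method: string }) => Promise<FetchResult>;

// Hypothetical shape of one extracted fact; field names are illustrative.
interface ExtractedFact {
  project: string;
  developer: string;
  outcome: "approved" | "denied" | "withdrawn";
  sourceUrl: string;
}

async function urlStillLoads(url: string, fetcher: Fetcher): Promise<boolean> {
  // Only http(s) links can be checked this way.
  if (!/^https?:\/\//i.test(url)) return false;
  // Try HEAD first (cheap), then GET, since some news sites reject HEAD requests.
  for (const method of ["HEAD", "GET"]) {
    try {
      const res = await fetcher(url, { method });
      if (res.ok) return true; // ok means a 2xx status
    } catch {
      // Network error: fall through to the next method, then give up.
    }
  }
  return false;
}

// Keep only facts whose source URL still loads; checks run one at a time.
async function keepVerified(facts: ExtractedFact[], fetcher: Fetcher): Promise<ExtractedFact[]> {
  const kept: ExtractedFact[] = [];
  for (const fact of facts) {
    if (await urlStillLoads(fact.sourceUrl, fetcher)) kept.push(fact);
  }
  return kept;
}
```

Checking sequentially is slower than firing all requests at once, but it keeps the crawler polite toward small local news sites.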
- Outcomes tagged: 1258
- Resistance rate: 28%
- States covered: 43
- Variables modeled: 22
Each decision was tagged against 22 attributes describing the local conditions: was there a building moratorium in effect? Were organized community groups opposing? Did the developer offer a tax-abatement deal (a 'PILOT')? Did the parcel already have industrial zoning? And so on. We then joined every decision to its county's full demographic profile from the US Census — over 100 features covering income, education, age, race, employment by industry, housing, broadband, and migration patterns.
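Once each decision is tagged, the yes/no attributes have to become numbers before any model can use them. Here is a minimal sketch. It assumes a fixed column order and uses four hypothetical attribute names in place of the full list of 22. Booleans become 1 or 0, and the county's numeric Census features are appended after them.

```typescript
// Hypothetical subset of the 22 tagged attributes; the real names may differ.
const POLICY_COLUMNS = [
  "moratoriumActive",
  "organizedOpposition",
  "pilotOffered",
  "industrialZoning",
] as const;

type PolicyColumn = (typeof POLICY_COLUMNS)[number];
type PolicyTags = Record<PolicyColumn, boolean>;

// Booleans become 1/0 in a fixed column order so every row lines up with the coefficients.
function encodeTags(tags: PolicyTags): number[] {
  return POLICY_COLUMNS.map((col) => (tags[col] ? 1 : 0));
}

// One design-matrix row: policy columns first, then the county's Census features.
function buildRow(tags: PolicyTags, censusFeatures: number[]): number[] {
  return encodeTags(tags).concat(censusFeatures);
}
```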
What actually predicts a community fight
We then fit a lasso regression, a statistical method that models why some communities approve data centers and others reject them while shrinking the coefficients of weak predictors all the way to zero. Of the 70+ features we tested, lasso zeroed out 40: once the other features were in the model, they added no predictive value beyond noise. The 30 that survived are the ones doing the actual predictive work.
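To make "pushes weak coefficients to zero" concrete, here is a small lasso fit by cyclic coordinate descent. It is a sketch of the technique, not the article's code. For readability it uses a squared-error loss on a centered outcome. An approve/reject model would normally use a logistic loss instead, typically solved by repeating the same soft-thresholded update inside an iteratively reweighted least-squares loop. The soft-threshold step sets a coefficient exactly to zero whenever its correlation with the residual is smaller than the penalty λ.

```typescript
// Sketch only: squared-error lasso for illustration. The article's model likely
// used a logistic loss (assumption); the zeroing mechanism is the same.

// Soft-thresholding: shrink z toward 0 by lambda, and return exactly 0 inside [-lambda, lambda].
function softThreshold(z: number, lambda: number): number {
  if (z > lambda) return z - lambda;
  if (z < -lambda) return z + lambda;
  return 0;
}

// Z-score one column (population standard deviation) so the penalty treats features equally.
function zScore(col: number[]): number[] {
  const n = col.length;
  const mean = col.reduce((a, b) => a + b, 0) / n;
  const sd = Math.sqrt(col.reduce((a, b) => a + (b - mean) * (b - mean), 0) / n);
  return col.map((v) => (sd === 0 ? 0 : (v - mean) / sd));
}

// Lasso by cyclic coordinate descent, minimizing
//   (1 / 2n) * sum_i (y_i - x_i . beta)^2 + lambda * sum_j |beta_j|
// Assumes X's columns are z-scored and y is centered, so no intercept is fit.
function lassoFit(X: number[][], y: number[], lambda: number, sweeps = 200): number[] {
  const n = X.length;
  const p = X[0].length;
  const beta = X[0].map(() => 0);
  const resid = y.slice(); // y - X.beta, with beta starting at 0
  for (let s = 0; s < sweeps; s++) {
    for (let j = 0; j < p; j++) {
      // rho: correlation of feature j with the residual that excludes feature j's own contribution.
      let rho = 0;
      let norm = 0;
      for (let i = 0; i < n; i++) {
        rho += X[i][j] * (resid[i] + X[i][j] * beta[j]);
        norm += X[i][j] * X[i][j];
      }
      rho /= n;
      norm /= n;
      const next = norm === 0 ? 0 : softThreshold(rho, lambda) / norm;
      const delta = next - beta[j];
      // Keep the residual in sync with the updated coefficient.
      for (let i = 0; i < n; i++) resid[i] -= X[i][j] * delta;
      beta[j] = next;
    }
  }
  return beta;
}
```

Raising λ zeroes out more features. In practice λ is chosen by cross-validation, which is how a run can end up with some number of survivors, such as 30.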
Plain English: we tested 70 possible predictors; lasso kept 30 and pushed the weak ones to exactly zero. CV AUC 0.82 means that across five cross-validation train/test splits, a randomly chosen rejected project was ranked riskier than a randomly chosen approved one about 82% of the time. Z-scaled means every input was rescaled to mean 0 and standard deviation 1 before the bar lengths in the coefficient chart were compared.
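The AUC reading above can be computed directly. Go through every (rejected, approved) pair, count how often the rejected project received the higher risk score, and count ties as half. This brute-force version is a sketch for clarity. It is quadratic in the number of rows, which is fine at roughly a thousand decisions. It assumes the risk scores are held-out predictions, and it is not necessarily how the reported 0.82 was computed.

```typescript
// AUC as a ranking probability: the share of (rejected, approved) pairs in which
// the rejected project received the higher risk score. Ties count as half a win.
// Assumes scores are held-out (out-of-fold) predictions. Brute force, O(rejected x approved).
function aucByPairs(scores: number[], rejected: boolean[]): number {
  if (scores.length !== rejected.length) throw new Error("scores and labels must align");
  let wins = 0;
  let pairs = 0;
  for (let i = 0; i < scores.length; i++) {
    if (!rejected[i]) continue; // i ranges over rejected projects
    for (let j = 0; j < scores.length; j++) {
      if (rejected[j]) continue; // j ranges over approved projects
      pairs++;
      if (scores[i] > scores[j]) wins += 1;
      else if (scores[i] === scores[j]) wins += 0.5;
    }
  }
  if (pairs === 0) throw new Error("need at least one rejected and one approved project");
  return wins / pairs;
}
```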
At the top: an active building moratorium is by far the strongest predictor of rejection, with about 15× higher odds. Organized community opposition is the second strongest. The strongest predictor of *approval* is whether the developer offered a tax-abatement deal (a PILOT): projects with a PILOT offer have nearly 5× the odds of approval. Existing industrial zoning on the parcel is the third-strongest approval signal. None of these is surprising.
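The "15×" and "nearly 5×" figures follow from exponentiating the new-model coefficients in the table below (β = +2.73 for a moratorium, -1.54 for a PILOT). This is a minimal sketch, assuming, as the text does, that each β is the log-odds shift toward rejection when the feature is present. If the inputs were z-scaled, a β strictly describes a one-standard-deviation change, so treat these multipliers as approximate readings rather than exact effects.

```typescript
// exp(beta) turns a logistic-regression coefficient into an odds multiplier.
// Assumption: beta is the log-odds shift toward rejection when the feature is present.
function oddsMultiplier(beta: number): number {
  return Math.exp(beta);
}

// Positive beta pushes toward rejection; negative beta pushes toward approval.
function describeEffect(name: string, beta: number): string {
  const m = oddsMultiplier(beta);
  return beta >= 0
    ? `${name}: ${m.toFixed(1)}x the odds of rejection`
    : `${name}: ${(1 / m).toFixed(1)}x the odds of approval`;
}
```

For example, `describeEffect("Moratorium active at filing", 2.73)` reports 15.3x the odds of rejection, and `describeEffect("PILOT / tax abatement offered", -1.54)` reports 4.7x the odds of approval.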
What conventional wisdom got wrong
Several things that have been on industry checklists for years turn out to have little or no predictive power, or to point the other way. Proximity to a Civil War battlefield or historic district was thought to make rejection 37% more likely; the data shows essentially no effect. High groundwater stress was thought to push toward rejection; the data shows a slight effect in the opposite direction. Foreign capital or shell-entity ownership was thought to be a 49% resistance booster; the actual effect is much smaller. In the table below, β is the model coefficient on the log-odds of rejection: positive values push toward rejection, negative values toward approval.
| Variable | Old β (N=225) | New β (N=1006) | What changed |
|---|---|---|---|
| Moratorium active at filing | +1.40 | +2.73 | Almost doubled. Moratoriums are decisively the strongest resistance signal. |
| Water-intensive cooling proposed | -0.23 | +0.76 | Sign flipped — communities now resist water-cooled proposals. |
| Pre-filing community engagement | -1.15 | -0.41 | Overstated 3×. Still helpful, but not the silver bullet. |
| Hyperscaler precedent within 50 mi | -0.25 | +0.05 | Was overstated. Communities don't approve because a Microsoft DC is nearby. |
| New substation required | -0.02 | -0.50 | Was claimed null. Actually moderately protective. |
| Existing industrial zoning | -0.54 | -0.88 | Stronger than originally measured. |
| Battlefield / historic district nearby | +0.31 | -0.08 | Was wrong-direction noise. Null effect. |
| Groundwater stress (USGS) | +0.07 | -0.32 | Sign flipped. Real effect is slightly protective, not resistance-pushing. |
| Rural pristine setting | +0.47 | +0.07 | Much weaker. Original overcredited landscape framing. |
| ≥750 MW gigawatt-scale | -0.30 | -0.59 | Bigger projects are MORE likely approved (PILOT + state cover). |
| PILOT / tax abatement offered | -1.23 | -1.54 | Remains the strongest protective tool by 2×. |
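The "What changed" column can be checked mechanically. This small sketch encodes the table's coefficients, flags sign flips, and reports how much larger the old estimate was in magnitude than the new one. The field names are hypothetical; the numbers are copied from the table.

```typescript
// Field names are hypothetical; coefficients are copied from the table above.
interface CoefRow {
  variable: string;
  oldBeta: number; // N=225 model
  newBeta: number; // N=1006 model
}

const TABLE: CoefRow[] = [
  { variable: "Moratorium active at filing", oldBeta: 1.4, newBeta: 2.73 },
  { variable: "Water-intensive cooling proposed", oldBeta: -0.23, newBeta: 0.76 },
  { variable: "Pre-filing community engagement", oldBeta: -1.15, newBeta: -0.41 },
  { variable: "Hyperscaler precedent within 50 mi", oldBeta: -0.25, newBeta: 0.05 },
  { variable: "New substation required", oldBeta: -0.02, newBeta: -0.5 },
  { variable: "Existing industrial zoning", oldBeta: -0.54, newBeta: -0.88 },
  { variable: "Battlefield / historic district nearby", oldBeta: 0.31, newBeta: -0.08 },
  { variable: "Groundwater stress (USGS)", oldBeta: 0.07, newBeta: -0.32 },
  { variable: "Rural pristine setting", oldBeta: 0.47, newBeta: 0.07 },
  { variable: "≥750 MW gigawatt-scale", oldBeta: -0.3, newBeta: -0.59 },
  { variable: "PILOT / tax abatement offered", oldBeta: -1.23, newBeta: -1.54 },
];

// A sign flip means the old and new coefficients point in opposite directions.
function signFlipped(row: CoefRow): boolean {
  return row.oldBeta * row.newBeta < 0;
}

// Ratio > 1: the old model overstated the effect's size; < 1: it understated it.
function magnitudeRatio(row: CoefRow): number {
  return Math.abs(row.oldBeta) / Math.abs(row.newBeta);
}
```

Run over the table, `signFlipped` picks out four rows: water cooling, hyperscaler precedent, battlefield proximity, and groundwater stress. `magnitudeRatio` for pre-filing engagement is about 2.8, which is the "overstated 3×" in the table.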
Two 'silver bullets' the industry has been recommending for years are overstated by 3× or more. Pre-filing community engagement (running open houses before the formal vote) was claimed to triple approval odds; the actual effect is about 1.5× (roughly 50% higher odds of approval), still positive but much smaller. Having a Microsoft, Google, or AWS data center within 50 miles of the proposed site was claimed to boost approval; with corrected measurement, it has essentially no effect. Communities don't approve a new data center because there's already one two counties over.
The most striking finding is water-intensive cooling. The original published model said water cooling slightly helped approval. The new data shows the opposite: water cooling is now meaningfully predictive of rejection (β = +0.76, roughly 2× the odds). This suggests communities have shifted hard on water use between 2022 and 2026, and proposals that mention evaporative cooling now face significantly higher resistance than air-cooled designs.
What demographics actually tell us
The findings that no one was tracking come from the Census data. Older counties resist data centers more. Counties with higher percentages of white residents resist more; counties with higher percentages of Black residents resist less. More-educated counties (a larger share of residents with bachelor's degrees) resist more, a finding consistent with the sociology literature on 'educated NIMBY' mobilization. Counties with strong arts-and-food economies (think Sedona, Asheville, or the Hudson Valley) resist more.
The strongest brand-new signal is the share of the county workforce employed in transportation and utilities. Counties whose labor market already absorbs heavy-industry workers treat data centers as 'more of the same' and approve at much higher rates. The second strongest is migration: counties where many residents recently moved in from other counties approve more readily. Together, these suggest that anti-development sentiment tracks stable, established communities in general rather than anything specific to data centers.
What this means in practice
First, the demographic profile of a candidate county now belongs in the diligence process. A site in a slow-growth, older, whiter, more-educated county carries a hidden risk premium that doesn't show up in any of the explicit policy variables. Conversely, sites in counties with transportation/utilities-heavy labor markets or high recent in-migration carry a hidden discount.
Second, if you can offer a meaningful PILOT (tax-abatement) deal, it dominates almost every other intervention. Its coefficient (-1.54) is larger in magnitude than pre-filing engagement, hyperscaler precedent, and industrial zoning combined (-0.41 + 0.05 - 0.88 = -1.24).
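The "larger than ... combined" comparison works because logistic-regression coefficients add on the log-odds scale. This minimal sketch uses the new-model βs from the table. It assumes each β is the log-odds shift toward rejection when the feature is present and that effects add with no interactions. With no intercept, the result is an odds multiplier relative to an otherwise identical site, not a probability.

```typescript
// New-model coefficients copied from the table (log-odds of rejection).
// Assumption: effects are additive with no interactions, and no intercept is modeled here.
const BETA = {
  preFilingEngagement: -0.41,
  hyperscalerPrecedent: 0.05,
  industrialZoning: -0.88,
  pilotOffered: -1.54,
};

type Feature = keyof typeof BETA;

// In a logistic model without interactions, log-odds effects add.
function combinedLogOdds(features: Feature[]): number {
  return features.reduce((sum, f) => sum + BETA[f], 0);
}

// Convert a log-odds shift into an odds multiplier (below 1 means lower odds of rejection).
function oddsVsBaseline(features: Feature[]): number {
  return Math.exp(combinedLogOdds(features));
}
```

Here `combinedLogOdds(["preFilingEngagement", "hyperscalerPrecedent", "industrialZoning"])` is -1.24, smaller in magnitude than the PILOT's -1.54. In odds terms, that is roughly a 3.5× versus a 4.7× reduction in the odds of rejection.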
Third, the new model is more honest about uncertainty. The old version reported 96% predictive accuracy; that figure was an overfitting artifact of training on too few rows (N=225). The new version reports ~81%: still useful, but practitioners should treat resistance scoring as a structured prior, not a verdict.
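One common source of inflated accuracy figures like the old 96% is scoring a model on rows it was trained on. The remedy is to score only held-out rows. This sketch assumes stratified five-fold cross-validation, since the text mentions five train/test splits but not how they were drawn. Rejected and approved decisions are dealt round-robin into folds, so each fold keeps roughly the overall rejection rate. Each fold is then scored by a model fit only on the other four.

```typescript
// Assumption: stratified round-robin folds. A real run would shuffle each class
// with a seeded RNG first; this sketch keeps input order for determinism.
function stratifiedFolds(rejected: boolean[], k: number): number[] {
  const fold: number[] = rejected.map(() => -1);
  let nextRejected = 0;
  let nextApproved = 0;
  rejected.forEach((isRejected, i) => {
    if (isRejected) {
      fold[i] = nextRejected % k;
      nextRejected++;
    } else {
      fold[i] = nextApproved % k;
      nextApproved++;
    }
  });
  return fold;
}

// Train/test indices for one fold: fit on train, score only on test.
function splitFold(fold: number[], testFold: number): { train: number[]; test: number[] } {
  const train: number[] = [];
  const test: number[] = [];
  fold.forEach((f, i) => (f === testFold ? test : train).push(i));
  return { train, test };
}
```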
How the analysis was done
The data pipeline runs in three stages. First, an AI model finds candidate news articles for each county via web search, then reads each article and extracts the data center decision details. Every URL is verified to still load before being added. Second, county demographic data is pulled from the US Census Bureau's free API — 124 features covering income, education, age, race, industry employment, and more — and joined to each decision by county FIPS code. Third, a statistical model (lasso regression) is fit to find which combination of features best predicts approval vs. rejection.
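Stage two, the Census join, reduces to parsing the API's table-shaped JSON and keying it by FIPS. This minimal sketch assumes a request like https://api.census.gov/data/2022/acs/acs5?get=NAME,B19013_001E&for=county:*&in=state:* (B19013_001E is ACS median household income; the 2022 vintage and the single variable are illustrative, not the article's actual request). The API returns an array of string arrays. The first row is the header, and zero-padded codes appear in the `state` and `county` columns. The `Decision` shape is hypothetical.

```typescript
// Assumption: rows come from the ACS 5-year endpoint as [header, ...data] string arrays.
type CensusFeatures = Record<string, number>;

// Turn the Census API's string table into a FIPS-keyed map of numeric features.
function parseCensus(rows: string[][]): Map<string, CensusFeatures> {
  const header = rows[0];
  const stateIdx = header.indexOf("state");
  const countyIdx = header.indexOf("county");
  const byFips = new Map<string, CensusFeatures>();
  for (const row of rows.slice(1)) {
    // The API returns zero-padded codes, e.g. "51" + "059" -> "51059".
    const fips = row[stateIdx] + row[countyIdx];
    const features: CensusFeatures = {};
    header.forEach((name, i) => {
      if (i === stateIdx || i === countyIdx || name === "NAME") return;
      const raw = row[i];
      const value = Number(raw);
      if (raw !== "" && !isNaN(value)) features[name] = value; // skip empty or non-numeric cells
    });
    byFips.set(fips, features);
  }
  return byFips;
}

// Hypothetical decision shape; only the FIPS join key matters here.
interface Decision {
  fips: string;
  rejected: boolean;
}

// Inner join: decisions whose county has no Census row are dropped.
function joinByFips(decisions: Decision[], census: Map<string, CensusFeatures>) {
  const joined: Array<Decision & { census: CensusFeatures }> = [];
  for (const d of decisions) {
    const features = census.get(d.fips);
    if (features) joined.push({ fips: d.fips, rejected: d.rejected, census: features });
  }
  return joined;
}
```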
The full corpus and source code are open: the data lives in lib/resistance/outcomes-*.ts and the analysis scripts live in scripts/data-center-resistance/. Every outcome row carries a primary-source URL anyone can verify. The whole pipeline is re-runnable: when new data center decisions happen, we run another wave of ingestion, refit the model, and update the public chart.
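Re-running ingestion means merging each new wave into the existing corpus without double-counting a decision that several articles covered. Here is one way to do that, sketched under stated assumptions. The `OutcomeRow` shape and the key (FIPS plus normalized project name) are hypothetical, not the schema of lib/resistance/outcomes-*.ts. When both waves contain the same decision, the newer row wins, on the assumption that later coverage reports the final outcome.

```typescript
// Hypothetical row shape; the real outcomes-*.ts schema may differ.
interface OutcomeRow {
  fips: string;
  project: string;
  outcome: "approved" | "denied" | "withdrawn";
  sourceUrl: string;
}

// Hypothetical dedup key: county plus a whitespace/case-normalized project name.
function keyOf(row: OutcomeRow): string {
  return row.fips + "|" + row.project.trim().toLowerCase().replace(/\s+/g, " ");
}

// Merge a new wave into the corpus. Map keeps first-insertion order, and a
// repeated key replaces the stored row, so the newer wave wins (assumption).
function mergeWave(existing: OutcomeRow[], wave: OutcomeRow[]): OutcomeRow[] {
  const byKey = new Map<string, OutcomeRow>();
  for (const row of existing) byKey.set(keyOf(row), row);
  for (const row of wave) byKey.set(keyOf(row), row);
  return Array.from(byKey.values());
}
```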
Glossary
- Entitlement
- The land-use approval a project needs from local government before construction can start — zoning, special-use permit, conditional use, site plan review, etc. The decision a county commission or planning board makes.
- PILOT
- Payment In Lieu Of Taxes — an agreement where a developer pays a fixed (usually lower) annual fee instead of full property tax, in exchange for the right to build. Widely used to attract data centers.
- Moratorium
- A temporary county-level ban on data-center applications, usually adopted by board ordinance for 6–18 months while the county studies regulations. Different from outright zoning denial.
- Lasso regression
- A statistical method that fits a model AND drops irrelevant predictors at the same time — it pushes the coefficients of weak predictors all the way to zero, leaving only the variables that actually carry signal.
- Census ACS
- American Community Survey 5-year — the US Census Bureau's free database of demographic, economic, and housing data, published at the county level. The source for the income, age, race, education, and industry-employment features we joined to each entitlement decision.
- FIPS
- A 5-digit code uniquely identifying every US county: a 2-digit state code followed by a 3-digit county code. 51059 = Fairfax County, VA (51 = Virginia, 059 = Fairfax). Used here as the join key between data-center decisions and Census features.
Primary sources + reproducibility