DataCentersExposed

Methodology

Everything on this site comes from public sources. The headline rule: if we can't link to a primary source, we don't publish the claim. Below is exactly which pipelines run today and which are planned — and the live coverage dashboard shows how complete each one actually is, gaps included.

The facility spine

The spine of the database is OpenStreetMap (Overpass): every node and way tagged as a data center in the United States. [Live] Because an OSM tag is a single open source, those rows are labeled mapped, not operating; a site is only called operating once a second source corroborates it (an EPA registry id, a permit, a violation, an abatement, or a named primary source). We cluster building-level nodes that share an operator and sit within ~500 m of one another into a single campus, so a 60-building hyperscale site counts once. [Live] Frontier campuses and their measured capacity come from Epoch AI's Frontier Data Centers dataset (CC BY 4.0). [Live]
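The campus clustering described above can be sketched as follows. This is a minimal illustration, not the site's actual pipeline: it assumes each building is an `(id, operator, lat, lon)` tuple, uses a haversine distance, and merges buildings by single-linkage (a chain of neighbours each within the radius ends up in one campus). The function names and the chaining rule are assumptions made for the example.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000
CLUSTER_RADIUS_M = 500  # the "~500 m" threshold from the text


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cluster_campuses(buildings, radius_m=CLUSTER_RADIUS_M):
    """Group (id, operator, lat, lon) tuples into campuses.

    Two buildings join the same campus when they share an operator and are
    within radius_m of each other, directly or through a chain of neighbours
    (single-linkage; an assumption, the source does not specify the rule).
    """
    parent = {b[0]: b[0] for b in buildings}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    by_operator = defaultdict(list)
    for b in buildings:
        by_operator[b[1]].append(b)

    # Only compare buildings that belong to the same operator.
    for group in by_operator.values():
        for i, (id_a, _, lat_a, lon_a) in enumerate(group):
            for id_b, _, lat_b, lon_b in group[i + 1:]:
                if haversine_m(lat_a, lon_a, lat_b, lon_b) <= radius_m:
                    parent[find(id_a)] = find(id_b)

    campuses = defaultdict(list)
    for b in buildings:
        campuses[find(b[0])].append(b[0])
    return sorted(sorted(ids) for ids in campuses.values())
```

The pairwise comparison is quadratic per operator, which is fine for a sketch; a production version would more plausibly use a spatial index.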

LLC unmasking

Operators file local permits and tax-break registrations under throwaway LLCs and project codenames. We decode these from SEC EDGAR full-text filings and OpenCorporates chains, plus regulator disclosure lists that name both the owner LLC and the operator (e.g. the Texas Comptroller's data-center registry). [Live] Aliases are stored separately so you can search by codename and find the real parent. Extraction false positives are periodically purged; the count of decoded codenames on /coverage reflects only verified decodes.

Tax abatements & subsidies

Subsidy figures come from official disclosures: Virginia's Department of Taxation RD40 biennial report (the statewide data-center sales/use-tax exemption: $1.94B forgone in FY25), the Texas Comptroller and Wisconsin DOR certified-facility lists, and state ACFR GASB-77 notes. [Live] Where a state withholds the dollar amount by law (Texas, Wisconsin), we record the program and flag the amount as withheld rather than inventing a number. Per-deal data from Good Jobs First's Subsidy Tracker is not yet ingested at scale (no public API). [Planned]

Permits, violations & water

Permits and violations come from EPA ECHO: Clean Water Act / NPDES via cwa_rest_services and Clean Air Act via air_rest_services, filtered to NAICS 518210 and legacy SIC 7374/7379. [Live] An air or water source with an EPA high-priority-violator flag or recent non-compliance becomes a violation row tied to the facility, with a legal-status marker (alleged / litigated / confirmed / settled); we never mark a case settled without a documented penalty. Distance-based "within 1 mile" enforcement counts are computed with PostGIS, not editorial judgment. Water-use context comes from USGS principal-aquifer assignment. [Live]
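One way to make the "never settled without a documented penalty" rule hard to break is to enforce it when the row is constructed. The sketch below is illustrative only: the `ViolationRow` class and its field names are hypothetical, not the site's schema; only the four status values come from the text above.

```python
from dataclasses import dataclass
from typing import Optional

# The four legal-status markers named in the text.
LEGAL_STATUSES = ("alleged", "litigated", "confirmed", "settled")


@dataclass(frozen=True)
class ViolationRow:
    """Hypothetical violation record tied to a facility."""
    facility_id: str
    program: str                      # e.g. "CWA" or "CAA"
    legal_status: str
    penalty_usd: Optional[float] = None
    penalty_source_url: Optional[str] = None

    def __post_init__(self):
        if self.legal_status not in LEGAL_STATUSES:
            raise ValueError(f"unknown legal status: {self.legal_status!r}")
        # A settled case must carry both an amount and a link to the document.
        if self.legal_status == "settled" and (
            self.penalty_usd is None or not self.penalty_source_url
        ):
            raise ValueError("'settled' requires a documented penalty")
```

With this check, a row such as `ViolationRow("f1", "CWA", "settled")` raises `ValueError`, while the same row marked `"alleged"` is accepted.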

Water use — measured, then modeled

Water is layered by basis, best-available first. Measured figures come from state withdrawal reporting (Virginia DEQ, Arizona ADWR and peers), company disclosures, and public-records requests; permitted figures are the licensed ceiling from a withdrawal permit. [Live] Most facilities have neither: municipally-supplied sites never appear in state withdrawal data because their use rolls up inside the city utility's total. For those, rather than show a blank, we display a clearly-labeled low-confidence model with two numbers: direct onsite cooling (annual energy × a climate-banded water-use-effectiveness of 1.2–2.5 L/kWh, anchored to the ~1.8 L/kWh industry average in the Lawrence Berkeley / US DOE 2024 data-center energy report) and indirect water consumed generating the facility's electricity (annual energy × a balancing-area grid water-consumption factor derived from NREL operational water-use factors and eGRID-subregion grid averages). [Live] The only input is capacity (MW); a facility with no disclosed MW gets no estimate, because we never invent a number. Every estimate links here, shows its inputs, and is replaced the moment a measured figure lands. Estimates are kept out of all site-wide water totals.
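The two-number model can be sketched in a few lines. This is an illustration under stated assumptions, not the site's code: annual energy is taken as capacity × 8,760 hours × a `load_factor` (a hypothetical parameter, defaulting to 1.0, since the text does not say how MW is converted to annual energy), and the climate band and grid factor are passed in by the caller rather than looked up.

```python
HOURS_PER_YEAR = 8760
WUE_MIN, WUE_MAX = 1.2, 2.5  # L/kWh band quoted in the text


def estimate_water_liters(capacity_mw, wue_l_per_kwh, grid_l_per_kwh, load_factor=1.0):
    """Return a low-confidence modeled estimate, or None when MW is undisclosed.

    load_factor is an assumption of this sketch; the source only states that
    capacity (MW) is the sole facility input.
    """
    if capacity_mw is None:
        return None  # no disclosed MW, so no estimate
    if not WUE_MIN <= wue_l_per_kwh <= WUE_MAX:
        raise ValueError("WUE outside the 1.2-2.5 L/kWh band")
    annual_kwh = capacity_mw * 1000 * HOURS_PER_YEAR * load_factor
    return {
        "basis": "modeled",
        "confidence": "low",
        "annual_kwh": annual_kwh,
        "direct_liters": annual_kwh * wue_l_per_kwh,     # onsite cooling
        "indirect_liters": annual_kwh * grid_l_per_kwh,  # water used generating the power
    }
```

For example, a 10 MW facility at the ~1.8 L/kWh anchor and a made-up grid factor of 2.0 L/kWh would be called as `estimate_water_liters(10, 1.8, 2.0)`. Returning `None` for a missing capacity mirrors the rule that no estimate is shown without a disclosed MW.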

Capacity (MW)

Power capacity prefers a measured nameplate figure (power_mw) and falls back to ISO interconnection-queue bands (interconnection_mw), shown with the facility's source and confidence. [Live] Broader ISO-queue coverage (all balancing authorities via gridstatus) and air-permit generator nameplates are being widened. [Planned]
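The preference order reads naturally as a small fallback function. In this sketch the facility is a plain dict; `power_mw` and `interconnection_mw` are the field names from the text, while the returned keys and the `resolve_capacity` name are hypothetical.

```python
def resolve_capacity(facility):
    """Pick the best-available capacity figure for a facility dict.

    Prefers the measured nameplate value and falls back to the
    interconnection-queue figure; returns None when neither is present.
    """
    if facility.get("power_mw") is not None:
        return {"mw": facility["power_mw"], "field": "power_mw",
                "basis": "measured nameplate"}
    if facility.get("interconnection_mw") is not None:
        return {"mw": facility["interconnection_mw"], "field": "interconnection_mw",
                "basis": "interconnection-queue band"}
    return None
```

Checking `is not None` rather than truthiness keeps a legitimately reported value of 0 from silently falling through to the queue figure.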

News & legislation

Data-center news is monitored via GDELT with scored sentiment; state legislation is tracked via LegiScan. [Live]

Public meeting transcripts

Data centers are approved in county boards, planning commissions, and city councils. For every tracked meeting we fetch the real on-the-record text and find the moments a facility is actually discussed. [Live] Three sources, all drawn from the jurisdiction's own public record: Granicus closed-caption transcripts published with US meeting video (e.g. Loudoun County, "Data Center Alley"); the Legistar official record (agenda items, the action taken, and recorded roll-call votes) for the ~70% of large US jurisdictions it powers; and Public-i caption transcripts for UK & Ireland councils (the same endpoint Bellingcat's open-source CouncilSearcher uses). Verbatim captions are auto-generated live speech-to-text: accurate to the spoken word but carrying recognition errors ("loudon" for "Loudoun"). We reproduce them uncorrected and label them as such; we never paraphrase or clean the record. Each transcript links back to its source caption file and the meeting video at the timestamp the facility was discussed. We also ingest the CC0 LocalView corpus (≈140k US local-government meeting videos on YouTube, 2006–2023): the meetings that discuss data centers surface as embeddable video cued to the moment, attributed to the county and never guessed onto a specific site unless the meeting names it. [Live]

Personalized bill impact

The impact calculator estimates the data-center share of a household's rising grid-capacity costs, for addresses in PJM territory. [Live] It is explicitly an estimate, and it shows its work: the full derivation is below and on the capacity tracker.

Capacity-cost math

Three steps, every input public. (1) Price: PJM's Base Residual Auction clearing price for the delivery year currently being billed (2026/27: $329.17/MW-day; the 2024/25 baseline was $28.92), from PJM's published auction reports. (2) Household share: capacity costs are allocated by contribution to system peak; a typical home contributes roughly 1.0–1.8 kW at peak, and we use 1.4 kW as the midpoint, always showing the range. Price × 365 ÷ 1,000 gives $/kW-year; × peak contribution ÷ 12 gives the monthly figure. (3) Data-center share: we apply the attribution range published by Monitoring Analytics, PJM's independent market monitor (63% of the 2025/26 increase, 40% of 2027/28 total costs) as a 40–63% band. This is the monitor's attribution, clearly labeled, not our own causal claim; how and when capacity costs reach retail bills varies by utility, rate class, and state regulation. Outside PJM we show no dollar figure rather than an unsourced one. In June 2026 we removed an earlier per-utility figure table whose citations could not be verified; the removal is logged in the corrections log.
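The arithmetic in steps (1) to (3) can be worked through directly. The sketch below uses only the figures quoted above; the function names are hypothetical, and applying the 40–63% band to the full monthly figure is an assumption of this example, since the text does not say whether the band is applied to the total or to the increase over the baseline.

```python
def monthly_capacity_cost(price_per_mw_day, peak_kw):
    """Monthly household capacity cost in dollars.

    price x 365 / 1,000 converts $/MW-day to $/kW-year;
    multiplying by peak kW and dividing by 12 gives $/month.
    """
    per_kw_year = price_per_mw_day * 365 / 1000
    return per_kw_year * peak_kw / 12


def data_center_band(monthly_cost, low=0.40, high=0.63):
    """Apply the market monitor's 40-63% attribution band (assumed here to
    apply to the whole monthly figure)."""
    return monthly_cost * low, monthly_cost * high


current = monthly_capacity_cost(329.17, 1.4)   # 2026/27 price, midpoint home
baseline = monthly_capacity_cost(28.92, 1.4)   # 2024/25 baseline price
low, high = data_center_band(current)

print(f"current:  ${current:.2f}/month")
print(f"baseline: ${baseline:.2f}/month")
print(f"data-center band: ${low:.2f} to ${high:.2f}/month")
```

At the 1.4 kW midpoint this works out to about $14.02 per month at the 2026/27 price against about $1.23 at the 2024/25 baseline, and the 1.0–1.8 kW range spans roughly $10.01 to $18.02 per month at the current price.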

Planned pipelines

Document corpus + OCR, lobbying disclosures, satellite change-detection, a daily briefing, and a public API + bulk data dumps are designed into the schema but not yet live. [Planned] When a pipeline ships, this page and the coverage dashboard update together, and the capability flips from Soon to Yes on the comparison table.

Confidence levels

Every facility, abatement, and link carries an explicit high / medium / low confidence value. "High" means a primary public source independently asserts it. "Medium" means strong indirect evidence (e.g. an LLC name matched in a state filing). "Low" means triangulated but not confirmed. Low-confidence records are visible but visually marked.

Corrections

Mistakes get fixed and logged. The corrections log is public. Email us with documentation and we'll respond within 72 hours.