Data Center Cooling Explained: Air vs Liquid vs Immersion

Editorial note: This is an independent technical guide. No affiliate links. No sponsored content. Written by a cloud infrastructure engineer with direct experience across all three cooling methodologies.

Data center cooling is having an identity crisis. For forty years, the industry ran on a simple premise: move cold air through hot servers, exhaust the heat, repeat. It worked well enough when a fully loaded server rack drew 5–10 kilowatts. It stops working — physically, economically, and environmentally — when a single rack of AI accelerators draws 50, 80, or 120 kilowatts, which is where the industry is headed right now.

The shift to AI training and inference infrastructure isn’t just a workload change. It’s a thermal engineering problem that is forcing a fundamental rethink of how data centers are built. I spent nine years managing infrastructure at cloud providers that were navigating exactly this transition, and the engineering trade-offs involved are less obvious than the marketing from cooling vendors would have you believe.

Air Cooling: How It Actually Works and Where It Fails

Conventional air cooling is elegant in its simplicity. Cold air — typically maintained at 18–27°C inlet temperature — enters server racks from the front, absorbs heat from components, and exits as hot exhaust at the rear. The data center’s CRAC (Computer Room Air Conditioning) units then remove that heat from the room and reject it outside via chillers or cooling towers.

The physics work well up to a point. A standard 42U rack with 1U and 2U servers typically draws 10–20 kilowatts. ASHRAE guidelines for air-cooled data centers are designed around this assumption. The cold aisle / hot aisle containment architecture that most modern data centers use — alternating aisles of server fronts and server rears — is optimized to keep supply air cold and exhaust air hot and separated, maximizing the efficiency of this heat exchange.

The problem with AI infrastructure is density. An NVIDIA H100 SXM5 module has a 700W TDP. A DGX H100 system with eight H100s draws up to 10.2 kilowatts from a single 8U chassis, and a full rack of DGX systems can draw 40–60 kilowatts. At that density, the volume of cold air required to remove the heat becomes physically impractical: the airflow velocities involved create unacceptable acoustic noise and pressure differentials that interfere with server operation. Air simply cannot move fast enough to carry that much heat away.
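
To put the airflow problem in numbers, here is a rough back-of-the-envelope sketch. The 15°C inlet-to-exhaust temperature rise and the rack power figures are illustrative assumptions, not vendor specifications.

```python
# Rough airflow needed to remove a rack's heat load with air alone.
# Q = rho * cp * V_dot * dT  ->  V_dot = Q / (rho * cp * dT)
RHO_AIR = 1.2       # kg/m^3, air density at ~20 C (approximate)
CP_AIR = 1005.0     # J/(kg*K), specific heat of air (approximate)
DELTA_T = 15.0      # K, assumed inlet-to-exhaust temperature rise

def airflow_m3_per_s(rack_kw: float, delta_t: float = DELTA_T) -> float:
    """Volumetric airflow (m^3/s) needed to carry rack_kw of heat at delta_t rise."""
    return (rack_kw * 1000.0) / (RHO_AIR * CP_AIR * delta_t)

for rack_kw in (10, 20, 60, 120):    # illustrative rack power draws
    v = airflow_m3_per_s(rack_kw)
    cfm = v * 2118.88                # 1 m^3/s is roughly 2,118.88 CFM
    print(f"{rack_kw:>4} kW rack -> {v:5.1f} m^3/s (~{cfm:,.0f} CFM)")
```

A 60 kW rack already needs on the order of 7,000 CFM of supply air at that temperature rise, which is why the fan noise and pressure problems show up well before the physics formally breaks down.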

⚡ Key Insight:
Water can carry roughly 3,500 times more heat than air for the same volumetric flow rate and temperature rise. This isn't a marginal difference; it's the fundamental reason liquid cooling is not optional for high-density AI compute infrastructure.
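
A quick sanity check on that ratio, using approximate textbook values for the density and specific heat of water and air:

```python
# Volumetric heat capacity (J per m^3 per K) = density * specific heat.
water = 998.0 * 4186.0    # kg/m^3 * J/(kg*K), water at ~20 C
air   = 1.2   * 1005.0    # kg/m^3 * J/(kg*K), air at ~20 C

print(f"water: {water:,.0f} J/(m^3*K)")
print(f"air:   {air:,.0f} J/(m^3*K)")
print(f"ratio: ~{water / air:,.0f}x")   # works out to roughly 3,500x
```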

Direct Liquid Cooling: The Pragmatic Middle Ground

Direct Liquid Cooling (DLC) — sometimes called cold plate cooling or direct-to-chip cooling — brings liquid cooling directly to the highest heat-generating components while leaving the rest of the server air-cooled. Cold plates are mounted directly on CPUs, GPUs, and memory modules. Liquid circulates through these plates, absorbing heat, and returns to a heat exchanger that rejects the heat to facility cooling water.

The appeal of DLC is its compatibility with existing infrastructure. A data center designed for air cooling can be partially retrofitted for DLC without rebuilding the facility. The liquid loop for each server connects to a facility-level Cooling Distribution Unit (CDU) via quick-disconnect fittings, and the rest of the room’s air cooling infrastructure remains in place to handle the heat from components not covered by cold plates — storage, networking hardware, power supplies.
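
For a sense of scale, here is a minimal sketch of the flow a cold-plate loop needs, assuming water as the coolant and a 10°C loop temperature rise. The heat loads are illustrative, and real DLC loops often use treated water or glycol mixes with slightly different properties.

```python
# Water flow needed for a cold-plate loop to carry a given heat load.
# m_dot = Q / (cp * dT); convert mass flow to litres per minute.
CP_WATER = 4186.0    # J/(kg*K), specific heat of water (approximate)
RHO_WATER = 998.0    # kg/m^3, density of water (approximate)

def coolant_lpm(heat_kw: float, delta_t: float = 10.0) -> float:
    """Litres per minute of water needed to absorb heat_kw at delta_t rise."""
    mass_flow = (heat_kw * 1000.0) / (CP_WATER * delta_t)   # kg/s
    return mass_flow / RHO_WATER * 1000.0 * 60.0            # L/min

# Illustrative loads: one 700 W GPU cold plate, a 10 kW server, an 80 kW rack.
for load_kw in (0.7, 10.0, 80.0):
    print(f"{load_kw:5.1f} kW -> {coolant_lpm(load_kw):6.1f} L/min")
```

An 80 kW rack needs roughly 100–120 L/min of water under these assumptions, a trickle compared with the thousands of CFM of air the same load would demand.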

| Cooling method | Typical max density | Typical PUE |
| --- | --- | --- |
| Air cooling | 15–25 kW per rack | 1.4–1.6 |
| Direct liquid cooling (DLC) | 40–100 kW per rack | 1.15–1.3 |
| Immersion cooling | 100–250 kW per tank | 1.02–1.1 |

DLC’s main limitation is coverage. Cold plates handle 60–80% of a server’s heat load — the components they’re directly mounted on. The remaining 20–40% still requires air cooling, which means you can’t fully eliminate air handling infrastructure. For racks drawing up to about 60–80 kilowatts, this hybrid approach is practical and cost-effective. Above that threshold, the residual air cooling requirements become the constraint again.
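
As a rough illustration of why coverage matters, the sketch below assumes 70% of the rack's heat is captured by cold plates (a value inside the 60–80% range above); the rack powers are illustrative.

```python
# Split a rack's heat load between cold plates and residual air cooling.
def split_heat(rack_kw: float, coldplate_fraction: float = 0.70):
    """Return (liquid-cooled kW, air-cooled kW) for a given rack load."""
    liquid = rack_kw * coldplate_fraction
    return liquid, rack_kw - liquid

for rack_kw in (40, 80, 120):           # illustrative rack densities
    liquid, air = split_heat(rack_kw)
    print(f"{rack_kw:>4} kW rack: {liquid:5.1f} kW to liquid, "
          f"{air:5.1f} kW still needs air cooling")
# At 120 kW, the ~36 kW of residual air-side heat is itself beyond what
# a conventional air-cooled rack position comfortably handles.
```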

NVIDIA’s current AI server platforms — including the HGX H100 and HGX H200 configurations — support DLC through liquid-cooled GPU baseboards. Meta, Microsoft, and Google have all deployed significant DLC infrastructure for AI training clusters. It’s the dominant cooling approach for hyperscale AI compute in 2026, not because it’s the most thermally efficient option, but because it’s compatible with existing facility infrastructure and server form factors.

Rear Door Heat Exchangers: The Bridge Technology

Worth mentioning as a middle step between pure air cooling and full DLC: rear door heat exchangers (RDHx) mount on the back of standard server racks and use chilled water flowing through a panel of cooling coils to capture hot exhaust air before it reaches the room. The server itself remains entirely air-cooled — the RDHx intercepts the hot exhaust and transfers its heat to facility water.

RDHx units handle rack densities up to about 30–40 kilowatts reliably, which covers a lot of existing enterprise infrastructure. They retrofit onto standard racks without modifying servers, making them attractive for brownfield deployments. For AI workloads at today’s densities, they’re insufficient — but for general-purpose compute racks being run alongside AI infrastructure, they’re a cost-effective way to improve cooling efficiency without a full DLC deployment.

Immersion Cooling: The Thermal Engineering Ceiling

Single-phase immersion cooling submerges servers entirely in a dielectric fluid — a non-conductive liquid that doesn’t harm electronics. The fluid absorbs heat directly from every component surface simultaneously, circulates to an external heat exchanger, and returns cooled. Two-phase immersion uses a fluid with a lower boiling point — the fluid boils at component surfaces, rises as vapor, condenses on cooled coils above the tank, and drips back down.

The thermal performance numbers for immersion cooling are in a different category from any air or DLC solution. Single-phase immersion can handle rack densities exceeding 100 kilowatts per tank. Two-phase systems push past 200 kilowatts. PUE figures of 1.02–1.05 are achievable, meaning that for every watt of IT power, only 0.02–0.05 additional watts go to cooling and other facility overhead. The best air-cooled facilities struggle to reach a PUE of 1.3.

The practical barriers to immersion cooling adoption are significant, which explains why it remains a small fraction of deployed capacity despite its thermal advantages. The capital cost of immersion tanks, dielectric fluid, and the facility modifications required is substantially higher than DLC per kilowatt of capacity. Servicing immersed hardware requires extracting servers from fluid, which adds operational complexity and time. The dielectric fluids used — engineered hydrocarbons or synthetic esters — have their own handling, disposal, and material compatibility requirements. And the server ecosystem around immersion cooling, while growing, is less mature than the ecosystem around air-cooled and DLC hardware.

What AI Workloads Are Changing About These Trade-offs

The economics of data center cooling are shifting in ways that make liquid cooling increasingly compelling even beyond pure thermal necessity. AI training clusters run at sustained high utilization — 80–95% GPU utilization continuously for training runs that last days or weeks. This is fundamentally different from general-purpose compute workloads that average 20–40% utilization. The thermal steady state for AI infrastructure is maximum load, essentially all the time.

At sustained maximum utilization, the operational cost difference between a PUE of 1.5 (air cooling) and 1.1 (DLC) becomes significant at scale. For a 10MW AI compute facility — which is modest by hyperscale standards — the difference in annual energy cost for cooling alone is in the millions of dollars. The capital cost premium of DLC over air cooling pays back in two to four years at current energy prices in most markets.
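
A back-of-the-envelope version of that comparison, assuming an illustrative industrial electricity rate of $0.08/kWh (actual tariffs vary widely by market):

```python
# Annual energy cost of facility overhead at different PUEs.
IT_LOAD_MW = 10.0          # facility IT load
PRICE_PER_KWH = 0.08       # USD per kWh, illustrative assumption
HOURS_PER_YEAR = 8760

def annual_overhead_cost(pue: float) -> float:
    """Yearly cost (USD) of the non-IT overhead implied by a given PUE."""
    overhead_kw = IT_LOAD_MW * 1000.0 * (pue - 1.0)
    return overhead_kw * HOURS_PER_YEAR * PRICE_PER_KWH

air = annual_overhead_cost(1.5)
dlc = annual_overhead_cost(1.1)
print(f"air (PUE 1.5): ${air:,.0f}/yr")
print(f"DLC (PUE 1.1): ${dlc:,.0f}/yr")
print(f"difference:    ${air - dlc:,.0f}/yr")   # roughly $2.8M/yr at these inputs
```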

The next generation of AI accelerators is making this more acute. NVIDIA's Blackwell B200 has a 1,000W TDP per GPU. The GB200 NVL72 system, which packs 36 Grace Blackwell superchips (72 Blackwell GPUs and 36 Grace CPUs) into a single rack, draws approximately 120 kilowatts. NVIDIA explicitly requires liquid cooling for GB200 NVL configurations; air cooling is not an option at this power density. The industry isn't choosing liquid cooling because it's preferable. It's choosing it because the physics of air cooling make it impossible at the densities AI infrastructure demands.

Heat Reuse: The Underappreciated Advantage of Liquid Cooling

One aspect of liquid cooling that gets less attention than it deserves is heat reuse. Air-cooled data centers reject heat at temperatures of 35–45°C — warm enough to be uncomfortable but not warm enough to be useful for most heating applications. Liquid-cooled systems, particularly immersion, can reject heat at 60–80°C — temperatures that are directly usable for district heating, industrial processes, or agricultural applications.

Several European data centers are already selling waste heat to municipal district heating networks. A liquid-cooled AI compute facility can theoretically achieve near-zero net energy consumption for the cooling infrastructure if the waste heat is fully utilized. This is an emerging economic model that doesn’t exist for air-cooled facilities and is one of the strongest long-term arguments for immersion cooling beyond pure thermal performance.

What This Means for Enterprise and Edge Deployments

For enterprise data centers running mixed workloads — AI inference alongside general-purpose compute, databases, and networking infrastructure — the practical approach in 2026 is a tiered cooling strategy. High-density AI racks get DLC. General-purpose racks get air cooling, potentially with RDHx units for the denser configurations. The facility’s chilled water infrastructure serves both.
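
One way to express that tiering as a rule of thumb: the cut-off values below come from the density ranges discussed in this guide, not from any formal standard, and real deployments will adjust them to local constraints.

```python
# Rough cooling-tier rule of thumb based on the density ranges in this guide.
def cooling_tier(rack_kw: float) -> str:
    if rack_kw <= 25:
        return "air cooling (hot/cold aisle containment)"
    if rack_kw <= 40:
        return "air cooling + rear door heat exchanger (RDHx)"
    if rack_kw <= 100:
        return "direct liquid cooling (DLC) with residual air handling"
    return "immersion cooling (or DLC only with careful engineering)"

for rack_kw in (12, 30, 80, 130):    # illustrative rack densities
    print(f"{rack_kw:>4} kW rack -> {cooling_tier(rack_kw)}")
```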

Edge deployments — AI inference servers in telco facilities, retail locations, or manufacturing environments — face different constraints. These locations rarely have facility chilled water infrastructure, and the space and power envelopes are tight. Self-contained DLC systems designed for edge deployment are an emerging product category, with integrated CDUs that reject heat via air-to-water heat exchangers without requiring facility cooling water.

For a deeper look at the servers and infrastructure being deployed in AI data center buildouts, see our Data Centers coverage, updated regularly with new deployments and infrastructure analysis.

Frequently Asked Questions

Is immersion cooling safe for standard server hardware?

Single-phase immersion cooling using engineered dielectric fluids is safe for most server components — CPUs, GPUs, memory, and storage. The main compatibility concerns are mechanical hard drives (the fluid can interfere with the air bearing in spinning disks), certain types of thermal interface materials, and some cable jacket materials. Most modern servers designed for immersion use all-flash storage and are built with immersion-compatible materials. Standard servers not designed for immersion can be deployed in single-phase immersion with preparation, but doing so typically voids manufacturer warranties.

What is PUE and why does it matter?

PUE (Power Usage Effectiveness) is the ratio of total facility power to IT equipment power. A PUE of 1.0 would mean all power goes directly to computing with zero overhead — physically impossible. A PUE of 2.0 means for every watt of compute, another watt is consumed by cooling, lighting, and power distribution overhead. Lower is better. The global average data center PUE is approximately 1.58. Hyperscale facilities average around 1.2. Immersion-cooled AI facilities are achieving 1.03–1.08.
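
In code form, the definition is just a ratio; the figures below are illustrative.

```python
# PUE = total facility power / IT equipment power.
def pue(total_facility_kw: float, it_kw: float) -> float:
    return total_facility_kw / it_kw

# Illustrative example: 1,000 kW of IT load plus 450 kW of cooling,
# lighting, and power-distribution overhead.
print(f"PUE = {pue(1450.0, 1000.0):.2f}")   # -> 1.45
```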

Can I implement liquid cooling in an existing air-cooled data center?

Yes, with varying degrees of facility modification. Rear door heat exchangers require minimal modification — they connect to facility chilled water and mount on existing racks. DLC requires running liquid distribution infrastructure (CDUs, manifolds, quick-disconnect piping) alongside existing racks, which is a significant but achievable retrofit. Full immersion cooling requires substantial facility modification — tanks are large, heavy, and require structural floor loading considerations, plus dedicated fluid management infrastructure.

Why don’t all data centers just use immersion cooling?

Capital cost, operational complexity, and ecosystem maturity are the main barriers. Immersion infrastructure costs significantly more per kilowatt of capacity than air cooling or DLC for upfront deployment. The operational procedures for maintaining immersed hardware are more complex than rack-based maintenance. And the server vendor ecosystem — while growing — still has gaps in immersion-compatible products. As AI workload density continues increasing and dielectric fluid costs decline, these barriers are eroding. Expect immersion to become mainstream for new AI compute facilities within five years.

What cooling does NVIDIA recommend for its latest GPU systems?

NVIDIA’s HGX H100 and HGX H200 systems support both air cooling and DLC configurations, with DLC strongly recommended for high-density rack deployments. The GB200 NVL72 system — NVIDIA’s current flagship AI training platform — requires liquid cooling and is not available in an air-cooled configuration. This represents a formal acknowledgment from NVIDIA that air cooling is no longer viable for top-tier AI infrastructure density.

WRITTEN BY

Priya Nair

Cloud & Server Editor

9 years in cloud infrastructure managing large-scale AI training pipelines. Has worked directly with air, DLC, and immersion-cooled infrastructure across hyperscale and enterprise environments.

Specialties: AMD EPYC & Intel Xeon · Multi-GPU server architecture · Cloud vs. on-premise · AI training infrastructure

