A Novel 3D-Printed Copper Tree Could Cut AI Cooling by 32%

A team of researchers led by University of Illinois Urbana-Champaign professor Nenad Miljkovic, working with engineers at San Diego startup Fabric8Labs, ran a topology optimization algorithm for 1,000 iterations to produce the most efficient cold plate they could 3D-print in copper.

The result, reported May 7 in Cell Reports Physical Science and previously described by Gizmodo, resembles a tree more than a heat sink. It provides 32% greater cooling performance than conventional cold plates used in high-density AI servers and reduces pressure resistance on the liquid cooling loop by 68%.

"After 1,000 iterations, it ends up with this really beautiful tree-like structure, which is optimized for heat flow," Miljkovic, who directs UIUC's Air Conditioning and Refrigeration Center, told Gizmodo. "We can make these optimized three-dimensional structures that you couldn't make using classical manufacturing."

The approach relies on electrochemical additive manufacturing, or ECAM, a high-resolution technique for 3D-printing metals that Fabric8Labs has been developing for industrial applications. Copper is deposited layer by layer with controlled electrochemical reactions to produce the branched geometries predicted by the algorithm but not achievable by Computer Numerical Control, or CNC, machining.

The cooling plates are designed to drop into existing hardware. "They can replace existing cold plates in advanced servers with little to no modification if we print them correctly," Miljkovic told me. "The plates are pure copper, as are existing solutions."

Ian Winfield, VP of Product and Applications at Fabric8Labs, framed the gains in terms data center operators care about. "ECAM cold plates offer substantial performance gains over traditional microchannel cold plates," he told me. "Chip junction temperature reductions in the range of 7°C/kW have been achieved by changing from a skived fin microchannel to a 3D design enabled by ECAM." Winfield added that hydraulic performance can also outperform a skived fin cold plate by leveraging optimized flow paths, and that ECAM allows customization of the cold plate design "to meet system-level constraints and design goals."
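To put Winfield's figure in per-chip terms, the short sketch below converts a reduction quoted in °C per kilowatt into an estimated junction temperature drop. It assumes the improvement scales linearly with chip power, which the article does not state, and the chip power levels are hypothetical values chosen only for illustration.

```python
def junction_temp_drop_c(chip_power_w: float, improvement_c_per_kw: float = 7.0) -> float:
    """Estimated junction temperature reduction in degrees C.

    Assumes the quoted 7 °C/kW figure scales linearly with chip power
    (an assumption made for illustration, not a claim from the article).
    """
    return improvement_c_per_kw * chip_power_w / 1000.0


# Hypothetical chip power levels, for illustration only.
for power_w in (500.0, 1000.0, 1500.0):
    drop_c = junction_temp_drop_c(power_w)
    print(f"{power_w:.0f} W chip: roughly {drop_c:.1f} °C lower junction temperature")
```

Under that linear assumption, the benefit grows with chip power, which is why the figure matters more for dense AI accelerators than for lower-power parts.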

Why the 68% Number Matters More

That is the state of the science. What happens when these geometries are deployed in an actual rack?

Cooling a dense rack takes electricity not only to reject heat but also to pump coolant through the manifolds. Conventional cold plates have machined channels that impede flow. The 3D-printed geometry, by contrast, offers far less resistance. That allows smaller pumps, smaller-diameter tubing, and substantially lower losses at every level of the cooling loop.
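The link between flow resistance and pump size can be sketched with the standard relation that hydraulic power equals pressure drop times volumetric flow rate. The example below is a rough illustration, not the team's calculation: it reads the 68% figure as a reduction in pressure drop across the cold plate at an unchanged flow rate, and the baseline pressure drop, flow rate, and pump efficiency are hypothetical placeholders.

```python
def pump_power_w(pressure_drop_pa: float, flow_m3_s: float, pump_efficiency: float = 0.5) -> float:
    """Electrical power needed to push coolant through a given pressure drop.

    Hydraulic power is pressure drop times volumetric flow; dividing by the
    pump efficiency gives the electrical input.
    """
    return pressure_drop_pa * flow_m3_s / pump_efficiency


# Hypothetical baseline values, for illustration only.
baseline_dp_pa = 50_000.0        # assumed pressure drop across a conventional cold plate
flow_m3_s = 1.0 / 60_000.0       # assumed coolant flow of 1 litre per minute
reduction = 0.68                 # the 68% figure reported in the article

baseline_w = pump_power_w(baseline_dp_pa, flow_m3_s)
optimized_w = pump_power_w(baseline_dp_pa * (1.0 - reduction), flow_m3_s)
saving_fraction = 1.0 - optimized_w / baseline_w

print(f"Baseline pump power per plate:  {baseline_w:.2f} W")
print(f"Optimized pump power per plate: {optimized_w:.2f} W")
print(f"Pumping power saved:            {saving_fraction:.0%}")
```

At a fixed flow rate, pumping power falls in direct proportion to pressure drop, so the saving matches the reduction regardless of the baseline chosen. In a real loop the cold plate is only one source of pressure drop, so the system-wide saving would be smaller.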

At deployment scale, the team estimates that a 1-gigawatt data center would need only about 11 megawatts for cooling. That compares with roughly 500 megawatts for an equivalent compute load cooled by conventional air systems, close to half of the facility's total power.
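The arithmetic behind that comparison is simple enough to check directly. The sketch below treats the 1 gigawatt as the facility's total power, which is one reading of the figures quoted above and not something the article spells out.

```python
FACILITY_MW = 1000.0  # 1 gigawatt, read here as total facility power (an assumption)

# Cooling power estimates quoted in the article, in megawatts.
cooling_mw = {
    "optimized liquid cold plates": 11.0,
    "conventional air systems": 500.0,
}

# Share of facility power that goes to cooling under each approach.
shares = {name: mw / FACILITY_MW for name, mw in cooling_mw.items()}
ratio = cooling_mw["conventional air systems"] / cooling_mw["optimized liquid cold plates"]

for name, share in shares.items():
    print(f"{name}: {share:.1%} of facility power")
print(f"Air cooling uses about {ratio:.0f}x the cooling power")
```

On these numbers, cooling falls from about half of facility power to about 1%, a difference of roughly 45 times.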

The Death of Over-Provisioning

This isn't merely an improvement in overall efficiency. It changes the equation for how cooling systems are sized and how much capacity operators must buy from the electric grid. For most of the first two decades of cloud computing, cooling systems operated at a single condition: full capacity.

The rationale was that the cost of any thermal event (CPU throttling, memory errors, a melted backplane) far exceeded the cost of simply running chillers at full capacity. So over-provisioning was the safe default.

That calculation has now largely been reversed. AI workloads consume substantially more power per rack than any previous generation of equipment. Liquid cooling is no longer an optional extra offered by vendors. It is required to physically remove that much heat. And the moment liquid is in the loop, the cost of running the pump itself becomes significant.

The system developed by Miljkovic's group sits at the point where those pressures converge. Fully optimized cold plates fabricated by ECAM deliver substantially greater cooling with significantly less pressure drop than prior approaches that depend on over-provisioning. Implemented at scale, the approach turns the cost of cooling into a tunable optimization problem.

What This Means for SME Hosts

This matters most to smaller data center operators. Hyperscalers can absorb a new cooling configuration as a line item on a billion-dollar project. Small and medium-sized operators, by contrast, see every kilowatt of cooling overhead come directly out of their profit margins.

Over the past two years, power costs have risen in the major corridors of the data center belt, and available grid capacity has tightened enough to spark discussions among local governments about moratoriums on new construction.

ECAM Cold Plate Manufactured by Fabric8Labs

Wayne Diamond, CEO of Hosted.com, told HostingDiscussion in a recent piece on green hosting that the conversation has flipped: "green hosting is entering the decision-making process rather than coming as an afterthought." Customers now ask the question before they sign.

That shift is the second mandate sitting on top of the first. Energy costs are higher. Carbon disclosure requirements are more stringent. And energy efficiency is increasingly being written into vendor selection criteria.

A 32% gain in cooling performance on a rack sold to a customer isn't only an opening to talk about environmental issues. It is also a basis for quoting a substantially lower cost per kilowatt of cooling.

Winfield said the manufacturing economics back that up: "ECAM cost is competitive compared to other manufacturing methods."

What's Still in the Lab

ECAM manufacturing is already in commercial production at Fabric8Labs, but the optimized cold plate geometry described in the paper has yet to be validated on real chips at scale. Miljkovic's group has said the next challenge is to demonstrate the cold plates on real chips and to work with cloud providers on trials in actual hyperscale servers. Until those results are available, the 32% figure is a controlled test benchmark, not a measure of deployed performance.

Winfield said Fabric8Labs is preparing for that ramp: "Our production site is online and we are adding capacity to support the surging demand."

But the timeline matters less than the direction of the trajectory. The cost curve for additively manufactured cold plates is heading in one direction, while the average power consumption per rack for new AI silicon is heading in the other.

Where the two curves cross is the point at which ECAM-cooled racks become the standard option for anyone trying to provide AI compute at low cost.

This will matter most to operators who have spent the past five years buying ever more chillers, pumps, and backup diesel fuel. For them, the crossover can't arrive soon enough.