Why do hot spots form?
Hot spots are an increasing problem in microprocessor design for two reasons. First, transistor density has continued to increase over the past decade even as clock speeds have flatlined, which means more and more transistors are packed into a smaller space, and each transistor has less and less area through which to dissipate heat. Second, CPU voltage largely stopped scaling. The chart below captures this:
This chart shows CPU voltages graphed against feature sizes in micrometers (0.13 = 130nm, 0.032 = 32nm). 130nm CPUs shipped 15 years ago with operating voltages between ~1.5V and 1.75V. Jump back roughly 15 years again, to 1985, and Intel’s 80386 required 5 volts. In 1994, the 486DX2 used an operating voltage of 3.3V. AMD’s K6, in 1999, had an operating voltage of 2.1V. Had this scaling continued, modern microprocessors would require well below 0.5V today.
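To put a rough number on that last claim: under classic Dennard scaling, supply voltage shrinks roughly in proportion to feature size. A quick back-of-the-envelope sketch, projecting forward from the article's 130nm/~1.5V data point (the target nodes are assumptions chosen for illustration):

```python
# Dennard-style scaling: supply voltage shrinks in proportion to
# feature size. Project forward from the 130nm / ~1.5V data point.
# The target nodes below are assumed for illustration only.
v_130nm, node_130nm = 1.5, 130  # volts, nanometers

for node_nm in (65, 32, 14):
    v = v_130nm * node_nm / node_130nm
    print(f"{node_nm}nm -> {v:.2f} V")
```

At 14nm, this naive extrapolation lands around 0.16V, which is the "well below 0.5V" the chart implies. Real chips never got there, for the reason the next paragraph explains.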
Unfortunately, there’s a minimum voltage required to switch a transistor on in the first place. Voltage scaling, like frequency scaling, stalled out, settling at roughly 1V. If you’re an overclocker, you’re probably aware that modern CPUs respond poorly to significant voltage increases; modern chips tolerate a smaller voltage range (in absolute terms) than older CPUs did.
Hot spots are a problem because voltage stopped scaling, but density didn’t. This is part of why Intel and AMD have poured so much effort into improving power gating and reducing idle power consumption. The more silicon you can turn off at any given moment, the greater total power you can divert into those areas of the chip that you want to operate. The rise of so-called “dark” silicon is directly tied to these problems, but the technique isn’t foolproof.
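The logic behind power gating and "dark" silicon follows from the standard dynamic-power relation, P ≈ activity × C × V² × f: turning silicon off cuts power linearly with activity, while voltage bites quadratically. A minimal sketch (the capacitance and frequency numbers are placeholders, not figures for any real chip):

```python
def dynamic_power(c_eff, volts, freq_hz, activity=1.0):
    """Dynamic switching power: P = activity * C_eff * V^2 * f."""
    return activity * c_eff * volts ** 2 * freq_hz

base = dynamic_power(1e-9, 1.0, 3e9)                 # placeholder chip values
gated = dynamic_power(1e-9, 1.0, 3e9, activity=0.5)  # half the chip dark
boosted = dynamic_power(1e-9, 1.2, 3e9)              # +20% supply voltage

print(f"power-gating half the chip: {gated / base:.2f}x")   # linear in activity
print(f"raising voltage by 20%:     {boosted / base:.2f}x")  # quadratic in V
```

A 20% voltage bump costs 44% more dynamic power, which is why gating idle silicon is the cheaper lever.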
Two other issues also complicate the situation. First, microprocessors don’t tend to shed much heat laterally (across the die), though there are ways to improve this by designing a chip so that hot areas are placed next to cool ones. Second, by the time heat reaches the heatsink and fan, it has already been conducted through the silicon, across the thermal interface material between the die and its lid, through the lid, and then through the thermal interface material between the lid and the heatsink. Each of these layers adds thermal resistance, raising the die temperature required to push a given amount of heat out to the heatsink. This is why some high-end overclockers de-lid their processors to improve performance; removing the lid can improve overclocking headroom by several hundred megahertz.
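That heat path (die, interface material, lid, interface material, heatsink) behaves like thermal resistances in series, summing the way resistors do in a circuit. A sketch of the arithmetic, with every resistance value an assumption for illustration rather than a measurement of any real CPU:

```python
# Each layer in the stack adds thermal resistance (degrees C per watt).
# All values below are illustrative assumptions, not measurements.
stack = {
    "die":                 0.05,
    "TIM1 (die -> lid)":   0.20,
    "lid (IHS)":           0.05,
    "TIM2 (lid -> sink)":  0.10,
    "heatsink to air":     0.25,
}
power_w = 100.0   # heat the chip must push out
ambient_c = 25.0  # case air temperature

t_full = ambient_c + power_w * sum(stack.values())
print(f"junction temp, full stack: {t_full:.1f} C")

# De-lidding replaces TIM1 + lid + TIM2 with one thin layer of
# high-performance TIM (again, an assumed value).
r_delidded = stack["die"] + 0.08 + stack["heatsink to air"]
t_delidded = ambient_c + power_w * r_delidded
print(f"junction temp, de-lidded:  {t_delidded:.1f} C")
```

With these made-up but plausible numbers, cutting the stack drops the die temperature by over 25 degrees at the same power, which is the thermal headroom de-lidders convert into clock speed.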
DARPA’s ICECool initiative
DARPA is working with Lockheed Martin to develop microfluidic cooling methods that pump microscopic amounts of water through channels inside the CPU itself, cooling the chip directly.
The project began four years ago and is now starting to bear real fruit. In Phase I, Lockheed demonstrated that it could effectively cool a “thermal demonstration die dissipating 1 kW/cm² die-level heat flux with multiple local 30 kW/cm² hot spots.” Its microfluidic solution cooled this test case effectively, despite the die-level figure being 4-5x the heat flux of most current processors. Lockheed continues:
“In Phase II of the program, the team has moved on to cooling high power RF amplifiers to validate the electrical performance improvements enabled by improved thermal management. Utilizing its ICECool technology, the team has been able to demonstrate greater than six times increase in RF output power from a given amplifier, while still running cooler than its conventionally cooled counterpart.”
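For rough perspective on those Phase I numbers, here is a quick heat-flux comparison; the CPU power and die-area figures are assumptions chosen to be consistent with the article's 4-5x claim, not the specs of any particular chip:

```python
# Heat-flux comparison. The demo-die figures come from Lockheed's
# Phase I numbers; the CPU figures are assumptions for illustration.
demo_flux = 1000.0      # W/cm^2, die-level flux on the demonstration die
hotspot_flux = 30000.0  # W/cm^2, local hot spots on that die

cpu_power_w = 140.0     # assumed high-end desktop CPU power
cpu_die_cm2 = 0.7       # assumed die area in cm^2
cpu_flux = cpu_power_w / cpu_die_cm2

print(f"assumed CPU heat flux:        {cpu_flux:.0f} W/cm^2")
print(f"demo die vs. CPU:             {demo_flux / cpu_flux:.1f}x")
print(f"hot spots vs. demo die level: {hotspot_flux / demo_flux:.0f}x")
```

Note that the local hot spots ran a further 30x above the already-extreme die-level flux, which is exactly the lateral non-uniformity problem described earlier.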
Right now, Lockheed is working to integrate its technology with Qorvo, a company that builds radios and RF equipment using gallium nitride (GaN). GaN transistors typically operate at high frequencies and very high temperatures, far higher than those of conventional silicon processors.
Nonetheless, there’s reason to believe that microfluidic cooling can be adapted for microprocessors at some point in the future. The challenge would be validating and deploying it. Intel and AMD would have to do the work themselves, and chip layouts would have to change significantly to incorporate an on-die cooling solution of this sort. There would also be extensive costs related to prototyping, validating, and designing compatible hardware across the entire PC ecosystem. Integrating cooling directly into the microprocessor would also have an impact on the third-party cooler industry, and the entire question of “upgradeable” CPUs. Finally, while this method would allow for some significant short-term clock speed improvement, it wouldn’t provide a permanent long-term solution — not so long as increasing CPU voltages has such a dramatic effect on power consumption.
In short: It’s complicated and expensive. That doesn’t mean, however, that we won’t see it adopted at some point. Right now, one of the biggest challenges in modern computing is that we can’t turn the entire processor on and run it at full power for any length of time. Microfluidics could dramatically improve that situation, and IBM has demonstrated some impressive gains in this field as well. I suspect we’ll see microfluidics evaluated more seriously if other scaling solutions and technologies fall short.