While digging into Linux power management over the past few weeks, I stumbled upon an interesting phenomenon, on multi-socket machines, the core frequency of one socket affects the power consumption of other sockets, even when controlling for their frequency, utilization and workload!

While some power interaction between sockets is expected due to heat transfer and thermal effects, what makes this discovery particularly interesting is the magnitude of power dependence between sockets and the surprising mechanism behind it. This blog post documents my investigation and explains how Intel’s power management policy creates this hidden power relationship.

Experimental Setup

To study this phenomenon, I designed a minimal setup as shown in Figure 1. Using c220g5 (Intel Xeon Silver 4114) machines on CloudLab with 2 sockets, I created the following configuration:

  • Socket 1 (Active socket): Running multiple busy loops (programs with infinite for loops) to control utilization. Each busy loop fully utilizes one core. The frequency of all cores on Socket 1 was fixed at 0.8 GHz, the lowest possible frequency on the machine.
  • Socket 0 (Passive socket): Running an energy measurement script pinned to a single core, regularly polling the energy counter using the powercap-info command.

By measuring energy consumption over short intervals, I could calculate power consumption by dividing energy by time.

Figure 1: Experimental Setup using c220g5 nodes on CloudLab

The Mysterious Socket Power Dependence

For my experiment, I varied the frequency of Socket 0 while measuring power consumed by both sockets. Since Socket 1’s core frequency was fixed throughout the experiment, I expected that changing Socket 0’s frequency would have little to no impact on Socket 1’s power consumption. However, my results showed something completely different!

Figure 2 shows the power consumed by both sockets when Socket 1 is at 40% utilization as I vary Socket 0’s frequency. When I increased Socket 0’s frequency from 0.8 GHz to 2.0 GHz, Socket 1’s power consumption remained stable as expected. However, when I increased Socket 0 to 2.2 GHz, Socket 1’s power consumption suddenly jumped (brown line)!

This pattern appeared consistently across different utilization levels and even when swapping which socket was active. Two aspects of this result were particularly surprising:

  • The power dependency was triggered only when Socket 0 ran at 2.2 GHz (the highest frequency of the machine)
  • The difference in Socket 1’s power consumption between 2.0 GHz and 2.2 GHz settings was approximately 5W — a significant amount when Socket 1’s power consumption is around 30W.
Figure 2: Power consumption of both sockets when varying Socket 0's frequency while Socket 1 runs at 40% utilization

The Culprit: Uncore frequency

To identify the cause of this socket power dependence, I collected various microarchitectural metrics. One metric stood out with a perfect correlation to the observed power behavior: uncore frequency.

The “uncore” refers to components of the CPU that are not part of the cores themselves—including memory controllers, last-level cache, and interconnects between cores. These components have their own clock frequencies that can be adjusted separately from core frequencies.

Figure 3 shows the uncore frequency when varying Socket 0’s core frequency. The uncore frequency remained at its minimum value (1.2 GHz) when Socket 0’s core frequency was between 0.8 GHz and 2.0 GHz. However, when the core frequency reached 2.2 GHz, the uncore frequency suddenly jumped to its maximum value of 2.0 GHz.

Figure 3: Uncore frequency behavior when varying Socket 0's core frequency

To verify that uncore frequency was indeed responsible for the power dependency, I conducted another experiment where I fixed the uncore frequency at 1.2 GHz while varying Socket 0’s core frequency. Figure 4 shows the results—Socket 1’s power consumption no longer depended on Socket 0’s core frequency when the uncore frequency was fixed.

Figure 4: Power consumption of both sockets when varying Socket 0's frequency with uncore frequency fixed at 1.2 GHz

Why does this happen?

This behavior is explained by Intel’s Uncore Frequency Scaling policy. According to research by Guo et al.:

Uncore frequency is dynamically adjusted only when all the active cores are running at a frequency lower than the base frequency. When at least one core is running at a higher frequency, the uncore consistently stays at the maximum frequency”

For my test machine, the base frequency is 2.2 GHz. Therefore, when I set Socket 0’s core frequency to 2.2 GHz, the uncore frequency automatically increased to its maximum value, causing increased power consumption on the adjacent socket as well.

What’s the rationale behind this design?

While this explains the mechanism behind the socket power dependency, it raises the question of why Intel designed their uncore frequency scaling this way.

The best explanation comes from a paper by Bircher et al., which suggests that when a core runs at high frequency, it can be stalled by delays in cache snooping requests if the uncore frequency is too low. Running the uncore at maximum frequency when any core reaches maximum frequency helps prevent these bottlenecks.

However, this optimization only makes sense for memory-intensive applications. In my experiment with CPU-bound busy loops requiring minimal memory access, this behavior seems unnecessary yet still impacts system-wide power consumption.

If you have alternative explanations for this behavior or have observed similar phenomena, I’d love to hear from you!

Key Takeaways

This investigation has revealed a significant and often overlooked power dependency in multi-socket systems, where the core frequency of one socket can unexpectedly impact the power consumption of adjacent sockets due to scaling of uncore frequency.

This finding has important implications for power management in multi-socket systems:

  • Power Modeling: Accurate power models must consider the cross-socket effects of uncore frequency scaling to provide reliable power estimates for diverse workloads.
  • Workload Placement: Careful consideration should be given to the placement of high-frequency workloads. Running applications that heavily utilize the maximum core frequency on one socket could inadvertently increase the power draw of other sockets, potentially impacting overall system efficiency and thermal management.