Leyan Li | Strong Scaling Note

Context

Why strong scaling can be misleading at first glance

When I first look at a strong scaling result, the most obvious question is whether runtime goes down as I increase the number of processes. That is a useful first check, but it is not enough. A faster runtime can still hide poor efficiency if the added parallel resources are not being used well.

In MPI workloads, each increase in process count usually reduces the amount of local computation per rank, while communication and synchronisation do not shrink in the same way. That imbalance means the computation-to-communication ratio gradually worsens as the job scales.
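The effect can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not a measurement: per-rank compute time is taken as proportional to n/p, and communication is modelled as a single tree-style collective costing about log2(p) latency steps. The function names and the constants t_work and latency are hypothetical.

```python
import math


def compute_time(n, p, t_work=1e-9):
    """Per-rank compute time, assuming work divides evenly across p ranks."""
    return n * t_work / p


def comm_time(p, latency=1e-5):
    """Assumed communication cost: about log2(p) latency steps, zero for one rank."""
    return latency * math.log2(p) if p > 1 else 0.0


def comp_to_comm_ratio(n, p):
    """Computation-to-communication ratio; infinite when nothing is communicated."""
    c = comm_time(p)
    return math.inf if c == 0.0 else compute_time(n, p) / c


if __name__ == "__main__":
    n = 10_000_000  # fixed problem size, as in a strong scaling run
    for p in (1, 2, 4, 8, 16, 32, 64):
        print(f"p={p:3d}  ratio={comp_to_comm_ratio(n, p):.1f}")
```

Under these assumptions the ratio falls at every doubling, because the numerator halves while the denominator grows. A real code would need a communication term that matches its own pattern (halo exchange, all-to-all, and so on).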

What I Look For

Three signals that matter more than raw runtime

First, I check parallel efficiency (speedup divided by the number of processes), not only speedup. If runtime keeps improving but efficiency falls sharply, the program is probably approaching a scaling limit.
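A minimal sketch of this check, taking efficiency as speedup divided by the increase in process count relative to the smallest run. The function name scaling_table is hypothetical, and the runtimes in the example are made up for illustration, not measured.

```python
def scaling_table(runtimes):
    """Return (procs, runtime, speedup, efficiency) rows.

    runtimes maps process count -> wall-clock seconds. The smallest process
    count is used as the baseline, so efficiency is relative to that run.
    """
    base_p = min(runtimes)
    base_t = runtimes[base_p]
    rows = []
    for p in sorted(runtimes):
        t = runtimes[p]
        speedup = base_t / t
        efficiency = speedup / (p / base_p)
        rows.append((p, t, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Made-up runtimes for illustration only, not measurements.
    example = {1: 100.0, 2: 52.0, 4: 28.0, 8: 17.0, 16: 12.5}
    for p, t, s, e in scaling_table(example):
        print(f"p={p:3d}  t={t:7.2f}s  speedup={s:5.2f}  efficiency={e:5.2f}")
```

Using the smallest available process count as the baseline keeps the calculation usable when a single-process run is too slow or does not fit in memory. The efficiencies are then relative rather than absolute.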

Second, I look for where the curve bends. A noticeable flattening usually suggests that communication overhead, idle time, or synchronisation has started to offset the benefit of adding more processes.
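One way to locate the bend numerically is to compare each step in process count against the ideal gain for that step. The sketch below is an assumption about how this could be done, not a standard method: the function name first_weak_step and the 0.7 threshold are hypothetical, and the runtimes are made up for illustration.

```python
def first_weak_step(runtimes, threshold=0.7):
    """Return the process count at which scaling first falls below threshold.

    For each consecutive pair of process counts, the observed runtime gain is
    divided by the ideal gain (the ratio of process counts). The larger count
    of the first pair whose incremental efficiency is below threshold is
    returned, or None if every step stays at or above it.
    """
    counts = sorted(runtimes)
    for lo, hi in zip(counts, counts[1:]):
        gain = runtimes[lo] / runtimes[hi]   # observed improvement for this step
        ideal = hi / lo                      # improvement under perfect scaling
        if gain / ideal < threshold:
            return hi
    return None


if __name__ == "__main__":
    # Made-up runtimes for illustration only, not measurements.
    example = {1: 100.0, 2: 52.0, 4: 28.0, 8: 17.0, 16: 12.5}
    print(first_weak_step(example))
```

Incremental efficiency isolates what the most recent increase in processes bought, so a bend shows up sooner than in cumulative efficiency, which averages over all earlier steps. The threshold is a judgment call and should be reported alongside the result.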

Third, I think about the workload itself. Some problems are simply too small to benefit from aggressive scaling, even if the implementation is sound. Good results depend not only on code quality, but also on choosing a problem size that matches the hardware and the decomposition strategy.
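The dependence on problem size can be made concrete with a toy runtime model. As a sketch under assumed costs (compute time proportional to n/p plus a log2(p) latency term, with hypothetical constants and function names), the largest power-of-two process count that still meets an efficiency target grows with n:

```python
import math


def model_runtime(n, p, t_work=1e-9, latency=1e-5):
    """Assumed runtime: evenly divided work plus about log2(p) latency steps."""
    comm = latency * math.log2(p) if p > 1 else 0.0
    return n * t_work / p + comm


def max_efficient_procs(n, min_eff=0.5, max_p=4096):
    """Largest power-of-two process count whose modelled efficiency >= min_eff."""
    t1 = model_runtime(n, 1)
    best = 1
    p = 2
    while p <= max_p:
        eff = t1 / (p * model_runtime(n, p))
        if eff < min_eff:
            break  # efficiency only decreases with p in this model
        best = p
        p *= 2
    return best


if __name__ == "__main__":
    for n in (10**6, 10**7, 10**8):
        print(f"n={n:>11,d}  max useful p={max_efficient_procs(n)}")
```

The absolute numbers mean nothing outside this model. The point is the trend: with the same code and the same assumed network, a small problem runs out of useful parallelism much earlier than a large one.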

Takeaway

Scaling results should explain behaviour, not just report numbers

The main lesson I take from strong scaling experiments is that they are most useful when they help explain where performance stops improving and why. Good analysis is not just a table of runtimes. It is an argument about the system behaviour underneath those numbers.

That is why I prefer to pair scaling plots with comments on communication cost, decomposition choices, and efficiency trends. The interesting part is not only that the program scaled to a certain process count, but what the curve reveals about the design of the workload itself.