(cont’d from previous comment)
As I mentioned at the beginning, the reports up to 2005 contained highly overoptimistic projections for on-chip frequency and supply voltage, which became dramatically more pessimistic in the 2007 edition. The reports clearly state, however, that these numbers are meant as targets and are not necessarily “on the road to sure implementation”, especially where they highlight that solutions were needed and not yet known. The numbers therefore cannot simply be read as an indictment of the ITRS’ predictive powers, but I remain puzzled by some of the pre-2007 projections and the comments that accompany them. Getting clarification on this from industry insiders was the next thing I had planned for this project before we paused it.
Specifically, tables 4c and 4d in the Overall Roadmap Technology Characteristics, found in a subsection of the Executive Summary titled Performance of Packaged Chips, contain on-chip frequency forecasts in MHz, which became dramatically more pessimistic in 2007 than they had been in the previous 3 editions. A footnote in the 2007 edition states:
after 2007, the PIDS model fundamental reduction rate of ~ −14.7% for the transistor delay results in an individual transistor frequency performance rate increase of ~17.2% per year growth. In the 2005 roadmap, the trend of the on-chip frequency was also increased at the same rate of the maximum transistor performance through 2022. Although the 17% transistor performance trend target is continued in the PIDS TWG outlook, the Design TWG has revised the long-range on-chip frequency trend to be only about 8% growth rate per year. This is to reflect recent on-chip frequency slowing trends and anticipated speed-power design tradeoffs to manage a maximum 200 watts/chip affordable power management tradeoff.
Later editions seem to have reduced the expected scaling factor even further (1.04 in the 2011 edition), but there were also changes made to the metric employed, so I am not sure how to interpret the numbers (though I would expect the scaling factor to be unaffected by those changes).
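As a rough sanity check on the footnote's arithmetic, here is a minimal sketch in Python. It assumes only that frequency is the reciprocal of delay and that the quoted rates compound annually; the function names and the 15-year horizon (2007 to 2022) are my own choices for illustration, not anything taken from the ITRS.

```python
def frequency_growth_from_delay_change(delay_change_per_year: float) -> float:
    """Frequency is 1/delay, so a fractional delay change d per year
    implies a frequency growth of 1/(1 + d) - 1 per year."""
    return 1.0 / (1.0 + delay_change_per_year) - 1.0


def compound(rate_per_year: float, years: int) -> float:
    """Total multiplier after compounding a yearly growth rate."""
    return (1.0 + rate_per_year) ** years


# PIDS figure quoted in the footnote: transistor delay falls ~14.7% per year.
transistor_growth = frequency_growth_from_delay_change(-0.147)
print(f"implied transistor frequency growth: {transistor_growth:.1%}")

# Cumulative effect of the rates mentioned above over an assumed 15 years.
# 0.04 corresponds to the 1.04 scaling factor of the 2011 edition.
for label, rate in [("PIDS ~17.2%", transistor_growth),
                    ("Design TWG ~8%", 0.08),
                    ("2011 edition ~4%", 0.04)]:
    print(f"{label}: x{compound(rate, 15):.1f} over 15 years")
```

The first line reproduces the footnote's ~17.2% figure from the −14.7% delay reduction. The loop shows how far apart the three growth rates end up once compounded, which is why the 2007 revision looks so drastic in the long-range columns of the tables.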
Relatedly, a paragraph in the System Drivers document titled Maximum on-chip (global) clock frequency states that the on-chip clock frequency would not continue scaling at a factor of 2 per generation for several reasons. The 2001 edition gives 3 reasons for this; the 2003 and 2005 editions give 4. Only in 2007 was the limitation from maximum allowable power dissipation added to this list, which strikes me as very puzzling. The paragraph, as it appears in the 2007 edition, is (emphasis added):
Maximum on-chip (global) clock frequency—(...) Through the 2000 ITRS, the MPU maximum on-chip clock frequency was modeled to increase by a factor of 2 per generation. Of this, approximately 1.4× was historically realized by device scaling (17%/year improvement in CV/I metric); the other 1.4× was obtained by reduction in number of logic stages in a pipeline stage (e.g., equivalent of 32 fanout-of-4 inverter (FO4 INV) delays at 180 nm, going to 24–26 FO4 INV delays at 130 nm). As noted in the 2001 ITRS, there are several reasons why this historical trend could not continue: 1) well-formed clock pulses cannot be generated with period below 6–8 FO4 INV delays; 2) there is increased overhead (diminishing returns) in pipelining (2–3 FO4 INV delays per flip-flop, 1–1.5 FO4 INV delays per pulse-mode latch); 3) thermal envelopes imposed by affordable packaging discourage very deep pipelining, and 4) architectural and circuit innovations increasingly defer the impact of worsening interconnect RCs (relative to devices) rather than contribute directly to frequency improvements. Recent editions of the ITRS flattened the MPU clock period at 12 FO4 INV delays at 90 nm (a plot of historical MPU clock period data is provided online at public.itrs.net), so that clock frequencies advanced only with device performance in the absence of novel circuit and architectural approaches. In 2007, we recognize the additional limitation from maximum allowable power dissipation. Modern MPU platforms have stabilized maximum power dissipation at approximately 120W due to package cost, reliability, and cooling cost issues. With a flat power requirement, the updated MPU clock frequency model starts with 4.7 GHz in 2007 and is projected to increase by a factor of at most 1.25× per technology generation, despite aggressive development and deployment of low-power design techniques.
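To compare the per-generation factors in this paragraph with the per-year rates in the ORTC footnote, one has to assume a generation length, which the quoted text does not state. The sketch below converts both factors to annual rates under two hypothetical cadences (2 and 3 years per generation); the function name and both cadences are my own assumptions for illustration.

```python
def annual_rate(per_generation_factor: float, years_per_generation: float) -> float:
    """Yearly growth rate equivalent to a given multiplier per generation."""
    return per_generation_factor ** (1.0 / years_per_generation) - 1.0


# 2.0x is the pre-2001 model, 1.25x the cap introduced in the 2007 edition.
for factor in (2.0, 1.25):
    for years in (2, 3):  # assumed generation lengths, not given in the quote
        rate = annual_rate(factor, years)
        print(f"{factor}x per {years}-year generation = {rate:.1%} per year")
```

Under the 3-year assumption, 1.25× per generation works out to a bit under 8% per year, which would be consistent with the "about 8%" in the Design TWG revision quoted earlier. Under a 2-year assumption it would be closer to 12%, so the match depends on the cadence the roadmap had in mind.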
Finally, tables 6a and 6b of the Overall Roadmap Technology Characteristics (found in a subsection of the Executive Summary titled Power Supply and Power Dissipation) contain projected values of the supply voltage, which also became dramatically more pessimistic in the 2007 edition.
I expressed my puzzlement at these points in an email I sent to a number of industry insiders, in which I asked:
Do the 3 revisions made to the roadmap in 2007 that I’ve pointed out reflect a failure of previous editions to predict the “breakdown in the serial speed version of Moore’s Law” and the relevant issues that would cause it? Or do they merely reflect the ambitiousness and aggressiveness of the targets that were set before admitting defeat became inevitable?
I have received some very kind replies to those emails, but most have focused on the technical reasons for the “breakdown” in Dennard scaling. The only comment I have received on this last question was from Robert Dennard, who sent me a particularly thoughtful email that came with 4 attachments (which mainly provided more technical detail on transistor design, however). At the end of his email, he wrote:
I cannot comment on wishful thinking vs hard facts. Predicting the future is difficult. Betting against Moore’s Law was often a losing game. Texas Instruments quit way too early.
Indeed, which bets it is most rational to make depends on expected payoff ratios as well as on probability estimates. This distinction between targets and mere predictions complicates the question quite a bit.
This was an interesting project; it would be great to pick it up again.
Thanks! But as you know, my overview that you link to is about the breakdown of Dennard scaling, which is related to but really quite distinct from Moore’s Law. I’m not sure how much this matters, but it struck me as misleading.