Note that, after several decades of past success, the serial speed formulation of Moore’s Law did in fact break down in 2004 for the reasons described (Fuller & Millett 2011).
That sounds like hindsight bias. Was this breakdown predicted in advance? Furthermore, was it predicted in advance and accepted as such by the community? If a few experts predicted this and a few others predicted it wouldn’t happen, I wouldn’t classify that as a success for the inside view, unless:
A) pretty much all experts who took the inside view predicted this
B) the experts had not been predicting the same thing in the past and been proven wrong
Personally I watched this happen, but I don’t think there was any consensus or expectation in the broader community of software developers that the serial speed formulation of Moore’s Law was going to fail until it did start to fail. Perhaps people who worked on the actual silicon hardware had different expectations?
I lived through this transition too, as a software developer. At least in my circles it was obvious that serial processing speeds would hit a wall, that the wall would be somewhere in the single-digit GHz range, and that from then on scaling would come from added concurrency.
The reason is very simple to explain: we were approaching physical limits at human scale. A typical chip today runs at ~2 GHz. The speed of light is a fundamental limiter in electronics, and in one cycle of a 2 GHz chip, light moves only 15 cm. Ideally that still leaves one order of magnitude of breathing room, but in reality there are unavoidable complicating factors: electricity doesn’t move across gates at the speed of light, circuits are not straight lines, etc. If you want a chip that measures on the order of 1-2 cm, then you are fundamentally limited to device operations in the single-digit GHz range.
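As a quick illustration of this arithmetic, here is a back-of-envelope sketch (my own, not from the comment above); note that signals in real interconnect propagate well below c, so the practical budget is even tighter:

```python
# How far light travels in one clock cycle, at various clock rates.
C_M_PER_S = 299_792_458  # speed of light in vacuum, m/s

for ghz in (1, 2, 5, 10):
    cycle_seconds = 1.0 / (ghz * 1e9)            # one clock period
    dist_cm = C_M_PER_S * cycle_seconds * 100.0  # distance per cycle, cm
    print(f"{ghz:>3} GHz: light covers {dist_cm:5.1f} cm per cycle")

# At 2 GHz: ~15 cm per cycle. Against a 1-2 cm die, that is roughly one
# order of magnitude of headroom, before accounting for gate delays and
# non-straight wiring, which is why single-digit GHz is the practical wall.
```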
Singularity technology like molecular nanotechnology promises faster computation through smaller devices. A molecular computer the size of a present-day computer chip would offer tremendous serial speedups for cores a micrometer or smaller, but device-wide operations would still be limited to ~2 GHz. Getting around that would require getting around Einstein.
The ITRS reports have long been the standard forecasts in the field, kind of like the IPCC reports but for semiconductors. The earliest one available online (2000) warned about (e.g.) serial speed slowdown due to problems with Dennard scaling (identified in 1974), though it’s a complicated process to think through what lessons should be drawn from the early ITRS reports. I’m having a remote researcher look through this for another project, and will try to remember to report back here when that analysis is done.
That project has been on hold for close to a month, the reason being that we wanted to focus on other things until we could get hold of the 1994, 1997 and 1999 editions of the ITRS roadmap report, which it should be possible to order through the ITRS website. However, we never heard back from whoever monitors the email address that you’re supposed to send the order form to, nor were we able to reach anyone else willing to sell us the documents...
I am happy to report on what I have found so far, though.
Main findings:
1) What I understand to be the main bottlenecks encountered in performance scaling (three different kinds of leakage current that become less manageable as transistors get smaller) were anticipated well in advance of actually becoming critical, and the time frames that were given for this were quite accurate.
2) The ITRS reports flagged these issues as being part of the “Red Brick Wall”, a name given to a collection of known challenges to future device scaling that had “no known manufacturable solutions”. It was well understood, then, that some aspects of device and performance scaling were in danger of hitting a wall somewhere in 2003-2005. While the ITRS reports and other sources warned that this might happen, I have seen no examples of anybody predicting that it would.
3) The 2001-2005 reports contain projected values for on-chip frequency and power supply voltage (V_{dd}) that were, with the benefit of hindsight, highly overoptimistic, and those same tables became dramatically more pessimistic in the 2007 edition. It must be noted, however, that the ITRS reports state that such projected values are meant as targets rather than as predictions. I am not sure to what extent this can be taken as a defence of these overly optimistic projections.
4) An explanation given in the 2007 edition for the pessimistic corrections made to the on-chip frequency forecasts gives the impression that earlier editions made a puzzling oversight. I may well be missing something here, as this point seems very surprising.
The last two points have left me fairly puzzled about how accurately the ITRS reports can be said to have anticipated the “2004 breakdown”. I had just started contacting industry insiders when Luke instructed me to pause the project. While the replies I received confirmed that my understanding of the technical issues was on the right track, none gave a clear answer to my requests to help me make sense of these puzzling aspects of the ITRS reports.
The reasons for the breakdown as I currently understand them
Three types of leakage current have become serious issues in transistor scaling. They are subthreshold leakage, gate oxide leakage and junction leakage. All of them create serious challenges to further reducing the size of transistors. They also render further frequency scaling at historical rates impracticable. One question I have not yet been able to answer is to what extent subthreshold leakage may play a more important role than the two other kinds as far as limits to performance scaling are concerned.
Here is an image of a MOSFET, a Metal-Oxide-Semiconductor Field-Effect Transistor, the kind of transistor used in microprocessors since 1970. (The image is from Cambridge University.)
The way it’s supposed to work is: When the transistor is off, no current flows. When it is on, current flows from the “Source” (the white region marked “n+” on the left) to the “Drain” (the white region marked “n+” on the right), along a thin layer underneath the “Oxide Layer” called the “Channel” or “Inversion Layer”. No other current is supposed to flow within the transistor.
The way the current is allowed to pass through is by applying a positive voltage to the gate electrode, which creates an electric field across the oxide layer. This field repels holes (positive charges) and attracts electrons in the region marked “p-type substrate”, forming an electron-conducting channel between the source and the drain. Ideally, the current through this channel starts flowing precisely when the voltage on the gate electrode reaches the threshold voltage V_{th}.
(Note that the kind of MOS transistor shown above is an “nMOS”; current integrated circuits combine nMOS with “pMOS” transistors, which are essentially the complement of nMOS in terms of positive and negative charges/p-type and n-type doping. This technology combining nMOS and pMOS transistors is known as “CMOS”, where the C stands for Complementary.)
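For reference, the usual first-order textbook model of the on-state (the long-channel “square law”; this is standard background, not something stated in the comments above) ties the drain current to how far the gate voltage exceeds the threshold:

```latex
% Ideal long-channel nMOS drain current in saturation (square-law model):
%   \mu_n : electron mobility      C_{ox} : gate oxide capacitance per area
%   W, L  : channel width and length
I_D \approx \frac{1}{2}\,\mu_n\, C_{ox}\, \frac{W}{L}\,\left(V_{GS} - V_{th}\right)^{2},
\qquad V_{GS} > V_{th}.
```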
In reality, current leaks. Subthreshold leakage flows from the source to the drain when the voltage applied to the gate electrode is lower than the threshold voltage, i.e. when the transistor is supposed to be off. Gate oxide leakage flows from the gate electrode into the body (through the oxide layer). Junction leakage flows from the source and from the drain into the body (through the source-body and drain-body junctions).
Gate oxide leakage and junction leakage are (mainly? entirely?) a matter of charges tunnelling through the ever thinner oxide layer or junction, respectively.
Subthreshold leakage does not depend on quantum effects and has been appreciable for much longer than the other two types of leakage, although it only started to become unmanageable around 2004. It can be thought of as an issue of electrons spilling over the energy barrier formed by the (lightly doped) body between the (highly doped) source and drain regions. The higher the temperature of the device, the broader the energy distribution of the electrons; so even if the energy barrier is higher than the average electron energy, the electrons in the upper tail of the distribution will spill over.
The height of this energy barrier is closely related to the threshold voltage, and so the amount of leakage current depends heavily on this voltage, increasing by about a factor of 10 each time the threshold voltage drops by another 100 mV.
Increasing the threshold voltage thus reduces leakage power, but it also makes the gates slower, because the number of electrons that can flow through the channel is roughly proportional to the difference between supply voltage and threshold voltage.
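A minimal numerical sketch of this tradeoff, using only first-order models: the factor-of-10-per-100 mV figure from above, and the common “alpha-power” delay approximation (the value of alpha here is an assumption of mine, not from the sources):

```python
# Subthreshold leakage vs. gate speed, first-order models only.
# Leakage: I_sub ~ 10^(-Vth / S), with S ~ 100 mV/decade as stated above
# (the ideal room-temperature limit is S = (kT/q)*ln(10) ~ 60 mV/decade;
# real devices are worse, hence ~100).
# Delay: t ~ Vdd / (Vdd - Vth)^alpha ("alpha-power law"), alpha ~ 1.3 assumed.

S_MV_PER_DECADE = 100.0
VDD = 1.0          # supply voltage, volts
ALPHA = 1.3        # assumed short-channel exponent
VTH_REF = 0.5      # reference threshold voltage, volts

for vth in (0.5, 0.4, 0.3, 0.2):
    leakage = 10 ** ((VTH_REF - vth) * 1000.0 / S_MV_PER_DECADE)
    delay = (VDD - VTH_REF) ** ALPHA / (VDD - vth) ** ALPHA
    print(f"Vth = {vth:.1f} V: leakage x{leakage:6.0f}, gate delay x{delay:.2f}")

# Dropping Vth from 0.5 V to 0.2 V makes gates less than 2x faster
# while multiplying subthreshold leakage by ~1000.
```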
Historically, threshold voltages were so high that it was possible to scale the supply voltage, the threshold voltage, and the channel length together without subthreshold leakage becoming an issue. This concurrent scaling of supply voltage and linear dimensions is an essential aspect of Dennard scaling, which historically made it possible to increase the number and speed of transistors exponentially without increasing overall energy consumption. But eventually the leakage started affecting overall chip power, so the threshold voltage, and hence the supply voltage, could no longer be scaled down as before. Since the power it takes to switch a gate is the product of the switched capacitance and the square of the supply voltage, and the overall power dissipation is the product of this with the clock frequency, something had to give once the supply voltage stopped scaling. This issue now severely limits the potential to further increase the clock frequency.
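To make the power arithmetic in the preceding paragraph explicit, here are the standard first-order relations (textbook CMOS material, written out by me rather than quoted from the sources):

```latex
% Dynamic (switching) power of a CMOS chip:
%   a : activity factor,  C : switched capacitance,
%   V_{dd} : supply voltage,  f : clock frequency
P_{\mathrm{dyn}} \approx a\, C\, V_{dd}^{2}\, f

% Classical Dennard scaling by a factor \kappa per generation:
%   dimensions, V_{dd}, V_{th} \to 1/\kappa
%   \Rightarrow C \to C/\kappa,\; f \to \kappa f,\;
%   \text{power per transistor} \to 1/\kappa^{2},\;
%   \text{density} \to \kappa^{2} \;\Rightarrow\; \text{chip power constant.}
```

Once V_{th}, and with it V_{dd}, stopped scaling, the V_{dd}^2 term froze, and the only way to hold chip power constant in the formula above was to stop raising f; that is the “something had to give”.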
Gate oxide leakage, from what I understand, also has a direct impact on transistor performance: the current from source to drain is related to the capacitance of the gate oxide, which is proportional to its area and inversely proportional to its thickness. Historically, the reduction in thickness compensated for the reduction in area as the device was scaled down in every generation, making it possible to maintain and even improve performance. As further reductions of the gate oxide thickness have become impossible due to excessive tunnelling, the industry has resorted to using high-k materials for the gate oxide, i.e. materials with a higher dielectric constant kappa, as an alternative way of increasing the gate oxide capacitance (k and kappa are often used interchangeably in this context). However, recent roadmaps state that this approach is already proving insufficient.
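The parallel-plate relation behind this paragraph (standard, and consistent with the description above):

```latex
% Gate oxide treated as a parallel-plate capacitor:
%   \kappa : relative dielectric constant of the insulator
%   \varepsilon_0 : vacuum permittivity
%   A : gate area,  t_{ox} : oxide thickness
C_{ox} = \frac{\kappa\, \varepsilon_{0}\, A}{t_{ox}}
```

Scaling shrinks A each generation; historically t_{ox} shrank in step to keep C_{ox} (and thus drive current) up, and once tunnelling put a floor under t_{ox}, raising kappa was the only remaining knob.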
Overall, I believe it is fair to say that all three kinds of leakage negatively affect performance simply by reducing the available usable power and by causing excessive heat dissipation, placing higher demands on packaging.
Two (gated) papers that describe these leakage issues in more detail (and there is a lot more detail to it) can be found here and here.
Did the industry predict these problems and their consequences?
People in the industry were well aware of these limitations, long before they actually became critical. However, whether solutions and workarounds would be found was a matter of much greater uncertainty.
Robert Dennard et al.’s seminal 1974 paper Design of Ion-Implanted MOSFETs with Very Small Physical Dimensions, which described the very favourable scaling properties of MOSFET transistors and gave rise to the term “Dennard scaling”, explicitly mentions the scaling limitations posed by subthreshold leakage:
One area in which the device characteristics fail to scale is in the subthreshold or weak inversion region of the turn-on characteristic. (…) In order to design devices for operation at room temperature and above, one must accept the fact that the subthreshold behavior does not scale as desired. This nonscaling property of the subthreshold characteristic is of particular concern to miniature dynamic memory circuits which require low source-to-drain leakage currents.
This influential 1995 paper by Davari, Dennard and Shahidi presents guidelines for transistor scaling for the years up to 2004. This paper contains a subsection titled “Performance/Power Tradeoff and Nonscalability of the Threshold Voltage”, which explains the problems described above in a lot more detail than I have. The paper also mentions tunnelling through the gate oxide layer, concluding on both issues that they would remain relatively unproblematic up until 2004.
Subthreshold leakage is textbook material. The main textbook I have consulted is Digital Integrated Circuits by Jan Rabaey, and I have compared some aspects of the 1995 and 2003 editions. Both contained this sentence in the “History of…” chapter:
Interestingly enough, power consumption concerns are rapidly becoming dominant in CMOS design as well, and this time there does not seem to be a new technology around the corner to alleviate the problem.
Regarding gate oxide leakage, this comparatively accessible 2007 article from IEEE Spectrum recounts how engineers at Intel and elsewhere developed transistors that use high-kappa dielectrics as a way of maintaining the shrinking gate oxide’s capacitance once its thickness could no longer be reduced without excessive tunnelling. According to this article, work on such solutions began in the mid-1990s, and Intel eventually launched new chips that made use of this technology in 2007. The main impression this article leaves me with is that the problem was very easy to foresee, but that finding out which solutions might work was a matter of extensive tinkering with highly unpredictable results.
The leakage issues are all mentioned in the 2001 ITRS roadmap, the earliest edition that is available online. One example from the Executive Summary:
For low power logic (mainly for portable applications), the main issue is low leakage current, which is absolutely necessary in order to extend battery life. Device performance is then maximized according to the low leakage current requirements. Gate leakage current must be controlled, as well as sub-threshold leakage and junction leakage, including band-to-band tunneling. Preliminary analysis indicates that, balancing the gate leakage control requirements against performance requirements, high kappa may be required for the gate dielectric by around the year 2005.
From reading the reports, it is hard to make out whether the implications of these issues were correctly understood, and I have had to draw on a lot of other literature to get a better sense of where the industry stood on this. Getting hold of earlier editions (the 1999 one in particular) and talking to industry insiders might shed a lot more light on the weight that was given to the different issues flagged as part of the “Red Brick Wall” mentioned above, i.e. as issues that had no known manufacturable solutions. (I did not receive an answer to my inquiry about this from the contact person at the ITRS website.) The Executive Summary of the 2001 edition states:
The 1999 ITRS warned that there was a wide range of solutions needed but unavailable to meet the technology requirements corresponding to 100 nm technology node. The 1999 ITRS edition also reported the presence of a potential “Red Brick Wall” or “100 nm Wall” (as indicated by the red cells in the technology requirements) that, by 2005, could block further scaling as predicted by Moore’s Law. However, technological progress continues to accelerate. In the process of compiling information for 2001 ITRS, it was clarified that this “Red Brick Wall” could be reached as early as 2003.
Two accessible articles from 2000 give a clearer impression of how this Red Brick Wall was perceived in the industry at the time. Both particularly emphasise gate oxide leakage.
As I mentioned at the beginning, the reports up to 2005 contained highly overoptimistic projections for on-chip frequency and supply voltage, which became dramatically more pessimistic in the 2007 edition. The reports clearly state, however, that these numbers are meant as targets and are not necessarily “on the road to sure implementation”, especially where it has been highlighted that solutions were needed and not yet known. They therefore cannot necessarily serve as a clear indictment of the ITRS’ predictive powers, but I remain puzzled by some of the projections and the comments made on them before 2007. Getting clarification on this from industry insiders was the next thing I had planned for this project before we paused it.
Specifically, tables 4c and 4d in the Overall Roadmap Technology Characteristics, found in a subsection of the Executive Summary titled Performance of Packaged Chips, contain on-chip frequency forecasts in MHz, which became dramatically more pessimistic in 2007 than they had been in the three previous editions. A footnote in the 2007 edition states:
after 2007, the PIDS model fundamental reduction rate of ~ −14.7% for the transistor delay results in an individual transistor frequency performance rate increase of ~17.2% per year growth. In the 2005 roadmap, the trend of the on-chip frequency was also increased at the same rate of the maximum transistor performance through 2022. Although the 17% transistor performance trend target is continued in the PIDS TWG outlook, the Design TWG has revised the long-range on-chip frequency trend to be only about 8% growth rate per year. This is to reflect recent on-chip frequency slowing trends and anticipated speed-power design tradeoffs to manage a maximum 200 watts/chip affordable power management tradeoff.
Later editions seem to have reduced the expected scaling factor even further (1.04 in the 2011 edition), but there were also changes made to the metric employed, so I am not sure how to interpret the numbers (though I would expect the scaling factor to be unaffected by those changes).
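To put the quoted growth figures side by side, a small arithmetic sketch (my own compounding of the rates quoted above; treating the 2x-per-generation trend as a 2-year cadence is my assumption):

```python
# Compounding the quoted annual frequency growth rates over a decade.
# First, the footnote's internal consistency: a 14.7% yearly reduction in
# transistor delay implies 1/(1 - 0.147) ~ 1.172, i.e. ~17.2% more frequency.
print(f"implied annual frequency factor: {1 / (1 - 0.147):.3f}")

rates = [
    ("2x per generation (~2 years)", 2 ** 0.5),  # ~41%/year, assumed cadence
    ("17%/year (pre-2007 target)", 1.172),
    ("8%/year (2007 revision)", 1.08),
    ("4%/year (2011 edition)", 1.04),
]
for label, annual in rates:
    print(f"{label:>30}: x{annual ** 10:5.1f} over 10 years")
```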
Relatedly, a paragraph in the System Drivers document titled Maximum on-chip (global) clock frequency states that the on-chip clock frequency would not continue scaling at a factor of 2 per generation, for several reasons. The 2001 edition gives 3 reasons for this; the 2003 and 2005 editions give 4. But only in 2007 was the limitation from maximum allowable power dissipation added to this list of reasons. This strikes me as very puzzling.
The paragraph, as it appears in the 2007 edition, is (emphasis added):
Maximum on-chip (global) clock frequency—(...) Through the 2000 ITRS, the MPU maximum on-chip clock frequency was modeled to increase by a factor of 2 per generation. Of this, approximately 1.4× was historically realized by device scaling (17%/year improvement in CV/I metric); the other 1.4× was obtained by reduction in number of logic stages in a pipeline stage (e.g., equivalent of 32 fanout-of-4 inverter (FO4 INV) delays at 180 nm, going to 24–26 FO4 INV delays at 130 nm). As noted in the 2001 ITRS, there are several reasons why this historical trend could not continue: 1) well-formed clock pulses cannot be generated with period below 6–8 FO4 INV delays; 2) there is increased overhead (diminishing returns) in pipelining (2–3 FO4 INV delays per flip-flop, 1–1.5 FO4 INV delays per pulse-mode latch); 3) thermal envelopes imposed by affordable packaging discourage very deep pipelining, and 4) architectural and circuit innovations increasingly defer the impact of worsening interconnect RCs (relative to devices) rather than contribute directly to frequency improvements. Recent editions of the ITRS flattened the MPU clock period at 12 FO4 INV delays at 90 nm (a plot of historical MPU clock period data is provided online at public.itrs.net), so that clock frequencies advanced only with device performance in the absence of novel circuit and architectural approaches. In 2007, we recognize the additional limitation from maximum allowable power dissipation. Modern MPU platforms have stabilized maximum power dissipation at approximately 120W due to package cost, reliability, and cooling cost issues. With a flat power requirement, the updated MPU clock frequency model starts with 4.7 GHz in 2007 and is projected to increase by a factor of at most 1.25× per technology generation, despite aggressive development and deployment of low-power design techniques.
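As a consistency check on the quoted model (my own arithmetic; everything here is derived from numbers in the quote itself): a clock period pinned at 12 FO4 inverter delays and a 4.7 GHz starting point together imply an FO4 delay of roughly 18 ps, and the 1.25x-per-generation cap then compounds as follows:

```python
# Clock model from the ITRS quote: clock period = N_FO4 * t_FO4.
N_FO4 = 12            # FO4 inverter delays per clock period (quoted)
F_2007 = 4.7e9        # Hz, the model's 2007 starting point (quoted)
CAP_PER_GEN = 1.25    # at most 1.25x per technology generation (quoted)

t_fo4_ps = 1.0 / F_2007 / N_FO4 * 1e12
print(f"implied FO4 delay in 2007: {t_fo4_ps:.1f} ps")  # ~17.7 ps

f = F_2007
for gen in range(1, 4):
    f *= CAP_PER_GEN
    print(f"after {gen} generation(s): {f / 1e9:.2f} GHz")
```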
Finally, the Overall Roadmap Technology Characteristics tables 6a and 6b (found in a subsection titled Power Supply and Power Dissipation in the Executive Summary) contain projected values of the supply voltage (V_{dd}), which also became dramatically more pessimistic in the 2007 edition.
I described my puzzlement at these points in an email I sent out to a number of industry insiders, asking:
Do the 3 revisions made to the roadmap in 2007 that I’ve pointed out reflect a failure of previous editions to predict the “breakdown in the serial speed version of Moore’s Law” and the relevant issues that would cause it? Or do they merely reflect the ambitiousness and aggressiveness of the targets that were set before admitting defeat became inevitable?
I have received some very kind replies to those emails, but most have focused on the technical reasons for the “breakdown” in Dennard scaling. The only comment I have received on this last question was from Robert Dennard, who sent me a particularly thoughtful email that came with 4 attachments (which mainly provided more technical detail on transistor design, however). At the end of his email, he wrote:
I cannot comment on wishful thinking vs hard facts. Predicting the future is difficult. Betting against Moore’s Law was often a losing game.
Texas Instruments quit way too early.
Indeed, which bets it is most rational to make depends on expected payoff ratios as well as on probability estimates. This distinction between targets and mere predictions complicates the question quite a bit.
This was an interesting project; it would be great to pick it up again.
Update: see also Erik DeBenedictis’ comments on this topic.
When IBM bottled out of going to 4 GHz CPUs in 2004, they had a roadmap extending to at least 5 GHz. Maybe beyond, I don’t remember, but there was certainly general loose talk in the trade rags about 10 or 20 GHz in the foreseeable future.