In terms of raw speed, Moore’s Law has broken down for at least six or eight years. Chips have continued to advance in terms of transistors per area and other metrics, but their clock speed now is roughly what it was in 2005; and while parallelisation is nice, it is much more difficult to take advantage of than plain speed advances. Take an algorithm written in 1984 and run it on the hardware of 2004, and you will get an enormous speedup with zero work; but to get a further speedup from the hardware of 2014, you have to think about how to parallelise, and that’s hard work even when it’s possible—and not every algorithm can be usefully parallelised.
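A minimal sketch of that distinction (my illustration, not from the comment): the first loop below has independent iterations and parallelises with a single OpenMP pragma, while the second has a loop-carried dependency, so extra cores do not help it as written and only a faster clock would.

```c
/* Illustrative only: independent iterations vs. a serial dependency.
 * Compile with e.g. gcc -fopenmp. */
#include <stddef.h>

void scale_all(double *x, size_t n, double c) {
    /* Each iteration is independent: trivially parallel. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++)
        x[i] *= c;
}

double running_value(const double *x, size_t n) {
    /* Each iteration needs the previous result: inherently serial
     * as written, so more cores do not speed it up directly. */
    double p = 1.0;
    for (size_t i = 0; i < n; i++)
        p = p * x[i] + 1.0;   /* depends on p from the previous step */
    return p;
}
```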
Hmm, you have a point. I still use my 2004 convertible Toshiba tablet with its 1.7 GHz Pentium M and 1.5 GB of RAM (which superficially matches today’s tablets’ specs), but I can’t use a year-2000 desktop for anything anymore.
Taking advantage of new hardware has always required changing programs to make better use of it. A Pentium 4 wasn’t just a faster Pentium Pro: it had a different architecture, new instructions, different latencies and throughputs for various instructions, and vector-processing extensions. To make full use of the P4’s power, people definitely had to modify their code, all the way down to the assembly level. In fact, early in the release cycle there were reports of many programs actually running slower on P4s than on P3s under certain conditions. Software developers and compiler designers had to force themselves to use the new and largely unfamiliar MMX/SSE instruction sets to get the most out of those new chips.
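To make the kind of rewrite concrete, here is a minimal sketch (my example, not code from the comment) of the same float addition written as a plain scalar loop and as an SSE version using the 4-wide single-precision vector registers those chips introduced; only the second form exploits the new hardware. It assumes an SSE-capable x86 compiler.

```c
#include <xmmintrin.h>   /* SSE intrinsics */

void add_scalar(float *dst, const float *a, const float *b, int n) {
    for (int i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

void add_sse(float *dst, const float *a, const float *b, int n) {
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);              /* load 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));   /* 4 adds at once */
    }
    for (; i < n; i++)                                /* scalar tail */
        dst[i] = a[i] + b[i];
}
```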
But all of this is just part of the broader trend of hardware and software evolving together. Univac programs wouldn’t be very good fits for the power of the 386, for instance. Our programming practices have evolved greatly over the last several decades. One example: x86 programmers had to learn how to write efficient programs using a relatively small number of registers and as few memory accesses as possible. This was something of a handicap, because programmers were used to memory access being roughly the same speed as register access (maybe 2-4 times slower), rather than 10 or 20 times slower (or more!), as it became on later x86 architectures. This forced the development of cache-aware algorithms and so on.
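A sketch of what “cache-aware” means in practice (my illustration, assuming row-major C arrays and a zero-initialized output): instead of streaming across whole rows and columns and missing cache constantly, the blocked version below works on small tiles that fit in cache, which cuts memory traffic without changing the arithmetic.

```c
#define BLOCK 32   /* tile size chosen so three tiles fit comfortably in L1 */

/* C += A * B for n x n row-major matrices; C must start zeroed. */
void matmul_blocked(int n, const double *A, const double *B, double *C) {
    for (int ii = 0; ii < n; ii += BLOCK)
        for (int kk = 0; kk < n; kk += BLOCK)
            for (int jj = 0; jj < n; jj += BLOCK)
                /* multiply one tile: indices stay within a cache-sized window */
                for (int i = ii; i < ii + BLOCK && i < n; i++)
                    for (int k = kk; k < kk + BLOCK && k < n; k++) {
                        double aik = A[i * n + k];
                        for (int j = jj; j < jj + BLOCK && j < n; j++)
                            C[i * n + j] += aik * B[k * n + j];
                    }
}
```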
And where fast performance is really needed (HPC and servers), software developers have always had to modify their code, usually for every single chip iteration. It is very uncommon, for instance, for code written for one supercomputer to run well on a new supercomputer without updating the code and the compilers.
Anyway, it’s not just misleading to say Moore’s law has broken down for the past 6-8 years; it’s flat-out wrong. Moore’s law is about the number of transistors that can fit on a ‘single chip’, and it is still going strong:
http://en.wikipedia.org/wiki/File:Transistor_Count_and_Moore%27s_Law_-_2011.svg
When you shrink transistors, the distance between them gets smaller, so you can run circuits at higher speed. This is still true today. The problem is this: in the past, when you shrank transistors, you could also lower their operating voltage without sacrificing error rate. That is no longer the case, and it appears to be a limitation of silicon CMOS technology rather than of photolithography. So today we have chips that are in principle capable of operating at 10 GHz or more, but they would dissipate impractical levels of power while doing so, and they are therefore clocked far below what they could theoretically reach.

The cure for this problem is to try to do the same amount of work with fewer transistors, even if it means somewhat slower speeds. The payoff of using 2x fewer transistors for a task more than outweighs the disadvantage of having it run 2x slower (a 2x reduction in chip frequency/voltage results in more than a 2x reduction in power usage). This is in many ways the opposite of the trend of the late 1990s and early 2000s. Thus we now have hybrid architectures that use a huge number of very simple, low-transistor-count cores (like Nvidia’s CUDA cores or Intel’s MIC architecture) running at modest clock speeds but with very high parallelism. These architectures have made computing MUCH faster. The price of this speed increase is that mainstream computer hardware has become parallel, so non-HPC programmers now have to deal with issues that were traditionally reserved for HPC programmers. Hence the tension and anxiety we now see in mainstream programming.
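A back-of-the-envelope version of that power argument, using the standard dynamic-power model P ∝ C·V²·f and the rough assumption that the required voltage scales with the target frequency (the numbers and that scaling assumption are my illustration, not measurements from the comment): one core at half speed draws about one eighth of the power, so two half-speed cores deliver the same throughput at roughly a quarter of the power.

```c
#include <stdio.h>

/* Relative dynamic power, with capacitance held fixed: P proportional to V^2 * f. */
static double rel_power(double rel_freq, double rel_volt) {
    return rel_volt * rel_volt * rel_freq;
}

int main(void) {
    double one_fast = rel_power(1.0, 1.0);   /* one core at full speed          */
    double one_slow = rel_power(0.5, 0.5);   /* one core at half speed/voltage  */
    double two_slow = 2.0 * one_slow;        /* two half-speed cores, same work */
    printf("one fast core : %.3f\n", one_fast);   /* 1.000 */
    printf("one half-speed: %.3f\n", one_slow);   /* 0.125 */
    printf("two half-speed: %.3f\n", two_slow);   /* 0.250 */
    return 0;
}
```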
Essentially, as computing has improved, the average CPU in your laptop has come to look more and more like a supercomputer of 20 years ago. As a result, it has inherited the same programming difficulties.
In terms of raw speed, Moore’s Law has broken down for at least six or eight years. Chips have continued to advance in terms of transistors per area and other metrics, but their clock speed now is roughly what it was in 2005
Moore’s Law is precisely about transistors per area, not about clock speed. So it hasn’t broken down.
Moore’s original formulation referred to transistors per area per dollar, yes. However, the same exponential growth has been seen in, for example, memory per dollar, storage per dollar, CPU cycles per second per dollar, and several others; and the phrase “Moore’s Law” has come to encompass these other doublings as well.
If it’s about all of these things, it doesn’t seem very useful to say it has broken down when it has stopped working in only one of those areas and continues in the others.
The benefits of parallelization are highly dependent on the task, but quite a lot of tasks are very amenable to it. It’s difficult to rewrite a system from the ground up to take advantage of parallelization, but if systems are designed with it in mind from the beginning, they can simply be scaled up as a larger number of processors becomes economically feasible. For quite a few algorithms, setting up parallelization is easy: creating a new bitcoin block, for instance, is already a highly parallel task (see the sketch below). As for society-changing applications, there’s a wide variety of tasks that are very susceptible to parallelization. Certainly, human-level intelligence does not appear to require huge serial power; human neurons fire at a rate of at most a few hundred hertz. Self-driving cars, wearable computers, drones, database integration … I don’t see a need for super-fast processors for any of these.
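Here is a toy version (my sketch, not the commenter’s code) of that “already highly parallel” point: a proof-of-work-style nonce search in which every candidate can be checked independently, so one pragma spreads the work across however many cores are available. The toy_hash function is a placeholder mixer standing in for a real hash such as SHA-256; compile with e.g. gcc -fopenmp.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t toy_hash(uint64_t x) {            /* placeholder, NOT SHA-256 */
    x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL;
    return x ^ (x >> 33);
}

int main(void) {
    const uint64_t target = 1ULL << 44;           /* "difficulty": hash must fall below this */
    long long best = LLONG_MAX;                   /* lowest nonce that meets the target */

    /* Each nonce is checked independently, so the loop parallelises directly. */
    #pragma omp parallel for reduction(min:best)
    for (long long nonce = 0; nonce < 100000000LL; nonce++)
        if (toy_hash((uint64_t)nonce) < target && nonce < best)
            best = nonce;

    if (best == LLONG_MAX) printf("no nonce found in range\n");
    else                   printf("winning nonce: %lld\n", best);
    return 0;
}
```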