Because the noise usually grows as the signal does. Consider Moore’s law for transistors per chip. Back when that number was about 10^4, the standard deviation was also small—say 10^3. Now that density is 10^8, no chips are going to be within a thousand transiators of each other, the standard deviation is much bigger (~10^7).
This means that if you’re trying to fit the curve, being off by 10^5 is a small mistake when preducting current transistor #, but a huge mistake when predicting past transistor #. It’s not rare or implausible now to find a chip with 10^5 more transistors, but back in the ’70s that difference is a huge error, impossible under an accurate model of reality.
A basic fitting function, like least squares, doesn’t take this into account. It will trade off transistors now vs. transistors in the past as if the mistakes were of exactly equal importance. To do better you have to use something like a chi squared method, where you explicitly weight the points differently based on their variance. Or fit on a log scale using the simple method, which effectively assumes that the noise is proportional to the signal.
Because the noise usually grows as the signal does. Consider Moore’s law for transistors per chip. Back when that number was about 10^4, the standard deviation was also small—say 10^3. Now that density is 10^8, no chips are going to be within a thousand transiators of each other, the standard deviation is much bigger (~10^7).
This means that if you’re trying to fit the curve, being off by 10^5 is a small mistake when preducting current transistor #, but a huge mistake when predicting past transistor #. It’s not rare or implausible now to find a chip with 10^5 more transistors, but back in the ’70s that difference is a huge error, impossible under an accurate model of reality.
A basic fitting function, like least squares, doesn’t take this into account. It will trade off transistors now vs. transistors in the past as if the mistakes were of exactly equal importance. To do better you have to use something like a chi squared method, where you explicitly weight the points differently based on their variance. Or fit on a log scale using the simple method, which effectively assumes that the noise is proportional to the signal.
That makes perfect sense. Thanks.