Saying that “compiler technology” has only made floating point programs 8 times faster is somewhat too much of an apples-to-apples comparison. Sure, if you take the exact same Fortran program and recompile it decades later you may only see an 8x speedup (I’d have guessed 2x or 4x, myself, depending on how much the hardware benefits from vectorized instructions). But if you instead take a more modern program designed to solve the same higher-level problem, you are more likely to see a three order of magnitude speedup. Graph from the SCaLeS Report, Vol. 2, D. Keyes et. al. eds, 2004; it specifically refers to magnetohydrodynamics simulations but it’s pretty typical of a wide class of computational mathematics problems.
An AGI will presumably be able to optimize not only its own source code compilation, but also its own algorithm choices. That process will also eventually hit diminishing returns, but who knows how many orders of magnitude it could get before things plateau? The first AGI is likely to be using a lot of relatively new and suboptimal algorithms almost by definition of “first”.
I was handwaving a bit there, huh? “Some relatively new algorithm(s)” would have been true by definition; everything else needs a bit more justification:
“A lot of relatively new”: whatever makes the difference between problem-specific AI and general unknown-problem-handling AGI is going to be new. The harder these subproblems are (and I’d say they’re likely to be hard), the more difficult new algorithms are going to be required.
“suboptimal”: just by induction, what percentage of the time do we immediately hit on an optimal algorithm to solve a complicated problem, and do we expect the problems in AGI to be harder or easier than most of this reference class? Even superficially simple problems with exact solutions like sorting have a hundred algorithms whose optimality varies depending on the exact application and hardware. Hard problems with approximate solutions like uncertainty quantification are even worse. The people I know doing state-of-the-art work with Bayesian inverse problems are still mostly using (accelerated variants of) Monte Carlo, despite general agreement with that old quote about how Monte Carlo is the way you solve problems when you don’t yet know the right way to solve them.
Saying that “compiler technology” has only made floating point programs 8 times faster is somewhat too much of an apples-to-apples comparison. Sure, if you take the exact same Fortran program and recompile it decades later you may only see an 8x speedup (I’d have guessed 2x or 4x, myself, depending on how much the hardware benefits from vectorized instructions). But if you instead take a more modern program designed to solve the same higher-level problem, you are more likely to see a three order of magnitude speedup. Graph from the SCaLeS Report, Vol. 2, D. Keyes et. al. eds, 2004; it specifically refers to magnetohydrodynamics simulations but it’s pretty typical of a wide class of computational mathematics problems.
An AGI will presumably be able to optimize not only its own source code compilation, but also its own algorithm choices. That process will also eventually hit diminishing returns, but who knows how many orders of magnitude it could get before things plateau? The first AGI is likely to be using a lot of relatively new and suboptimal algorithms almost by definition of “first”.
Why?
Because optimality isn’t actually required, and humans are bad at perfection.
Yes, but non-perfect doesn’t imply that there is much room for improvement
I was handwaving a bit there, huh? “Some relatively new algorithm(s)” would have been true by definition; everything else needs a bit more justification:
“A lot of relatively new”: whatever makes the difference between problem-specific AI and general unknown-problem-handling AGI is going to be new. The harder these subproblems are (and I’d say they’re likely to be hard), the more difficult new algorithms are going to be required.
“suboptimal”: just by induction, what percentage of the time do we immediately hit on an optimal algorithm to solve a complicated problem, and do we expect the problems in AGI to be harder or easier than most of this reference class? Even superficially simple problems with exact solutions like sorting have a hundred algorithms whose optimality varies depending on the exact application and hardware. Hard problems with approximate solutions like uncertainty quantification are even worse. The people I know doing state-of-the-art work with Bayesian inverse problems are still mostly using (accelerated variants of) Monte Carlo, despite general agreement with that old quote about how Monte Carlo is the way you solve problems when you don’t yet know the right way to solve them.