I’m surprised to see this bullet being bitten. I can easily think of trivial examples against the claim, where we know the minimum complexity of simple things versus their naïve implementations, but I’m not sure what arguments there are for it. It sounds pretty wild to me honestly, I have no intuition algorithmic complexity works anything like that.
I don’t know what you mean by an “example against the claim.” I certainly agree that there is often other evidence that will improve your bet. Perhaps this is a disagreement about the term “prima facie”?
Learning that there is a very slow algorithm for a problem is often a very important indicator that a problem is solvable, and savings like 1040 to 1030 seem routine (and often have very strong similarities between the algorithms). And very often the running time of one algorithm is indeed a useful indicator for the running time of a very different approach. It’s possible we are thinking about different domains here, I’m mostly thinking of traditional algorithms (like graph problems, approximation algorithms, CSPs, etc.) scaled to input sizes where the computation cost is in this regime. Though it seems like the same is also true for ML (though I have much less data there and moreover all the examples are radically less crisp).
The chance that a 1016 parameter model unlocks AGI given a 1011 parameter model doesn’t is much larger than the chance that a 1031 parameter model unlocks AGI given a 1026 parameter model doesn’t.
This seems wrong but maybe for reasons unrelated to the matter at hand. (In general an unknown number is much more likely to lie between 1011 and 1016 than between 1026 and 1031, just as an unknown number is more likely to lie between 11 and 16 than between 26 and 31.)
I’m also unclear whether you consider this a general rule of thumb for probabilities in general, or something specific to algorithms. Would you for instance say that if there was a weak proof that we could travel interstellar with Y times better fuel energy density, then there’s a 50% chance that there’s a method derived from that method for interstellar travel with just √Y times better energy density?
I think it’s a good rule of thumb for estimating numbers in general. If you know a number is between A and B (and nothing else), where A and B are on the order of 1020, then a log-uniform distribution between A and B is a reasonable prima facie guess.
This holds whether the number is “The best you can do on the task using method X” or “The best you can do on the task using any method we can discover in 100 years” or “The best we could do on this task with a week and some duct tape” or “The mass of a random object in the universe.”
I don’t know what you mean by an “example against the claim.” I certainly agree that there is often other evidence that will improve your bet. Perhaps this is a disagreement about the term “prima facie”?
Learning that there is a very slow algorithm for a problem is often a very important indicator that a problem is solvable, and savings like 1040 to 1030 seem routine (and often have very strong similarities between the algorithms). And very often the running time of one algorithm is indeed a useful indicator for the running time of a very different approach. It’s possible we are thinking about different domains here, I’m mostly thinking of traditional algorithms (like graph problems, approximation algorithms, CSPs, etc.) scaled to input sizes where the computation cost is in this regime. Though it seems like the same is also true for ML (though I have much less data there and moreover all the examples are radically less crisp).
This seems wrong but maybe for reasons unrelated to the matter at hand. (In general an unknown number is much more likely to lie between 1011 and 1016 than between 1026 and 1031, just as an unknown number is more likely to lie between 11 and 16 than between 26 and 31.)
I think it’s a good rule of thumb for estimating numbers in general. If you know a number is between A and B (and nothing else), where A and B are on the order of 1020, then a log-uniform distribution between A and B is a reasonable prima facie guess.
This holds whether the number is “The best you can do on the task using method X” or “The best you can do on the task using any method we can discover in 100 years” or “The best we could do on this task with a week and some duct tape” or “The mass of a random object in the universe.”