It takes us about 17 orders of magnitude away from the HBHL anchor, in fact. Which is not very far, when you think about it. Divide 100 percentage points of probability mass evenly across those 17 orders of magnitude, and you get almost 6% per OOM, which means something like 4x as much probability mass on the HBHL anchor as Ajeya puts on it in her report!
I don’t understand what you’re doing here. Why 17 orders of magnitude, and why would I split 100% across each order?
I don’t follow this argument. It sounds like double-counting to me.
Read ‘and therefore’, not ‘and in addition’. The point is that the more you spend your compute on search, the less directly your search can exploit computationally expensive models.
Put another way, if you have HBHL compute but spend nine orders of magnitude on search, then the per-model compute is much less than HBHL, so the reasons to argue for HBHL don’t apply to it. Equivalently, if your per-model compute estimate is HBHL, then the HBHL metric is only relevant for timelines if search is fairly limited.
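To make the bookkeeping concrete, here is a minimal sketch of the trade-off, working in orders of magnitude relative to HBHL. The function names and the assumption that every candidate gets an equal share of the budget are mine, purely for illustration.

```python
def per_model_oom(total_oom_vs_hbhl: float, search_oom: float) -> float:
    """Compute available per candidate, in OOMs relative to HBHL.

    Assumes (hypothetically) that the total budget is split evenly across
    10**search_oom candidates, so division becomes subtraction in log space.
    """
    if search_oom < 0:
        raise ValueError("search_oom must be non-negative")
    return total_oom_vs_hbhl - search_oom


def num_candidates(search_oom: int) -> int:
    """Number of candidates a search spanning search_oom OOMs can evaluate."""
    return 10 ** search_oom


# Exactly HBHL compute, nine OOMs spent on search:
# each candidate sits nine OOMs below HBHL.
print(per_model_oom(0, 9))

# Three OOMs above HBHL, all of the surplus spent on search:
# each candidate is HBHL-sized, but there are only 10**3 of them.
print(per_model_oom(3, 3), num_candidates(3))
```

The subtraction is the whole point: every OOM spent on search is an OOM not available to each individual model.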
I’m not sure I get the distinction between enabler and substitute, or why it is relevant here. The point is that we can use compute to search for the missing special sauce. Maybe humans are still in the loop; sure.
Motors are an enabler in the context of flight research because they let you build and test designs, learn what issues to solve, build better physical models, and verify good ideas.
Motors are a substitute in the context of flight research because a better motor means more, easier, and less optimal solutions become viable.
Ajeya estimates (and I agree with her) how much compute it would take to recapitulate evolution, i.e. simulate the entire history of life on earth evolving for a billion years etc. The number she gets is 10^41 FLOP give or take a few OOMs. That’s 17 OOMs away from where we are now. So if you take 10^41 as an upper bound, and divide up the probability evenly across the OOMs… Of course it probably shouldn’t be a hard upper bound, so instead of dividing up 100 percentage points you should divide up 95 or 90 or whatever your credence is that TAI could be achieved for 10^41 or less compute. But that wouldn’t change the result much, which is that a naive, flat-across-orders-of-magnitude-up-until-the-upper-bound-is-reached distribution would assign substantially higher probability to Shorty’s position than Ajeya does.
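The arithmetic behind the flat distribution can be sketched as follows. The upper bound of 10^41 FLOP and the 17-OOM gap are from the comment above; the starting point of 10^24 is simply 41 minus 17, and the function name is a label chosen for this sketch.

```python
def flat_mass_per_oom(upper_oom: float, current_oom: float, credence: float = 1.0) -> float:
    """Probability mass per order of magnitude under a flat-in-log-space prior.

    credence is the total probability that TAI is achievable at or below
    the upper bound; it is spread evenly over the OOMs in between.
    """
    span = upper_oom - current_oom
    if span <= 0:
        raise ValueError("upper_oom must exceed current_oom")
    return credence / span


UPPER_OOM = 41    # evolution anchor, from the comment above
CURRENT_OOM = 24  # assumption: 41 minus the stated 17 OOMs

for credence in (1.0, 0.95, 0.90):
    mass = flat_mass_per_oom(UPPER_OOM, CURRENT_OOM, credence)
    print(f"credence {credence:.2f}: {mass * 100:.1f}% per OOM")
```

Dropping the credence from 100% to 90% moves the per-OOM figure by well under a percentage point, which is why the choice barely affects the conclusion.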
I’m still not following the argument. I agree that you won’t be able to use your HBHL compute to do search over HBHL-sized brains+childhoods, because if you only have HBHL compute, you can only do one HBHL-sized brain+childhood. But that doesn’t undermine my point, which is that as you get more compute, you can use it to do search. So e.g. when you have 3 OOMs more compute than the HBHL milestone, you can do automated search over 1000 HBHL-sized brains+childhoods. (Also, I suppose even when you only have HBHL compute, you could do search over architectures and childhoods that are a little bit smaller and hope that the lessons generalize.)
I think part of what might be going on here is that since Shorty’s position isn’t “TAI will happen as soon as we hit HBHL” but rather “TAI will happen shortly after we hit HBHL,” there’s room for an OOM or three of extra compute beyond HBHL to be used. (Compute costs decrease fairly quickly, and investment can increase much faster, and probably will when TAI is nigh.) I agree that we can’t use compute to search for special sauce if we only have exactly HBHL compute (setting aside the parenthetical in the previous paragraph, which suggests that we can).
Well, I understand now where you get the 17, but I don’t understand why you want to spread it uniformly across the orders of magnitude. Shouldn’t you put all the probability mass for the brute-force evolution approach on some gaussian around where we’d expect that to land, and only have probability elsewhere to account for competing hypotheses? Like, I think it’s fair to say the probability of a ground-up evolutionary approach only using 10-100 agents is way closer to zero than to 4%.
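As a rough illustration of how different the two shapes are, the sketch below compares the mass a gaussian in log-compute puts on a single OOM with the flat figure of 1/17. The mean of 41 comes from the evolution estimate quoted earlier; the standard deviation of 3 OOMs is an assumption standing in for "give or take a few OOMs", and the specific bins are arbitrary examples rather than a mapping from agent counts to compute.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def gaussian_bin_mass(lo: float, hi: float, mean: float, sd: float) -> float:
    """Probability that a normal variable falls in the interval [lo, hi]."""
    return normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd)


MEAN_OOM = 41.0      # evolution anchor, from the thread
SD_OOM = 3.0         # assumption: "a few OOMs" of spread
FLAT_MASS = 1 / 17   # flat prior over the 17 OOMs

# One-OOM bin far below the mean versus one centered on the mean.
far_bin = gaussian_bin_mass(25.0, 26.0, MEAN_OOM, SD_OOM)
near_bin = gaussian_bin_mass(40.5, 41.5, MEAN_OOM, SD_OOM)

print(f"flat: {FLAT_MASS:.4f}  gaussian far: {far_bin:.2e}  gaussian near: {near_bin:.4f}")
```

Under these assumptions the gaussian gives the far bin a vanishingly small share and the near bin more than the flat prior does, which is the shape of the objection; it says nothing about how much weight the competing hypotheses should get.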
I’m still not following the argument. [...] So e.g. when you have 3 OOMs more compute than the HBHL milestone
I think you’re mixing up my paragraphs. I was referring here to cases where you’re trying to substitute searching over programs for the AI special sauce.
If you’re in the position where searching 1000 HBHL hypotheses finds TAI, then the implicit assumption is that model scaling has already substituted for the majority of AI special sauce, and the remaining search is just an enabler for figuring out the few remaining details. That or that there wasn’t much special sauce in the first place.
To maybe make my framing a bit more transparent, consider the example of a company trying to build useful, self-replicating nanoscale robots using an atomically precise 3D printer under the conditions where 1) nobody there has a good idea of how to go about doing this, and 2) you have 1000 tries.
I agree that for the brute-force evolution approach, we should have a gaussian around where we’d expect that to land. My “Let’s just do evenly across all the OOMs between now and evolution” is only a reasonable first-pass approach to what our all-things-considered distribution should be like, including evolution but also various other strategies. (Even better would be having a taxonomy of the various strategies and a gaussian for each; this is sorta what Ajeya does. The problem is that insofar as you don’t trust your taxonomy to be exhaustive, the resulting distribution is untrustworthy as well.) I think it’s reasonable to extend the probability mass down to where we are now, because we are currently at the HBHL milestone pretty much, which seems like a pretty relevant milestone to say the least.
If you’re in the position where searching 1000 HBHL hypotheses finds TAI, then the implicit assumption is that model scaling has already substituted for the majority of AI special sauce, and the remaining search is just an enabler for figuring out the few remaining details. That or that there wasn’t much special sauce in the first place.
This seems right to me.
To maybe make my framing a bit more transparent, consider the example of a company trying to build useful, self-replicating nanoscale robots using an atomically precise 3D printer under the conditions where 1) nobody there has a good idea of how to go about doing this, and 2) you have 1000 tries.
I like this analogy. I think our intuitions about how hard it would be might differ though. Also, our intuitions about the extent to which nobody has a good idea of how to make TAI might differ too.
Also, our intuitions about the extent to which nobody has a good idea of how to make TAI might differ too.
To be clear I’m not saying nobody has a good idea of how to make TAI. I expect pretty short timelines, because I expect the remaining fundamental challenges aren’t very big.
What I don’t expect is that the remaining fundamental challenges go away through small-N search over large architectures, if the special sauce does turn out to be significant.
Sorry I didn’t see this until now!