In the case of fusion, it certainly seems that control is a key variable, at least in retrospect—since we’ve had temperature and pressure equal to the sun for a while.
To get this out of the way, I expect that fusion progress is in fact predominantly determined by temperature and pressure (and similar factors that go into the Q factor), and expect that issues with control won’t seem very relevant to long-run timelines in retrospect. It’s true that we’ve had temperature and pressure equal to the sun for a while, but it’s also true that low-yield fusion is pretty easy. The missing piece there cannot simply be control, since even a perfectly controlled ounce of a replica sun is not going to produce much energy. Rather, we just have a higher bar to cross before we get yield.
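For concreteness, here are the standard quantities I’m gesturing at (textbook definitions plus the commonly quoted ballpark threshold, nothing specific to this discussion): the gain factor Q is roughly fusion power out over heating power in, and the usual figure of merit combining density, temperature, and confinement time is the Lawson triple product.

```latex
% Standard definitions; the threshold is the commonly quoted ballpark for D-T ignition.
Q \;\approx\; \frac{P_{\text{fusion}}}{P_{\text{heating}}},
\qquad
n \, T \, \tau_E \;\gtrsim\; 3 \times 10^{21}\ \mathrm{keV \cdot s \cdot m^{-3}}
\quad \text{(D-T ignition, roughly)}
```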
They need to argue that (a) X is probably necessary for TAI, and (b) X probably won’t arrive shortly after the other variables are achieved. I think most of the arguments I am calling bogus cannot be rephrased in this way to achieve (a) and (b), or if they can, I haven’t seen it done yet.
While I’ve seen arguments about the complexity of neuron wiring and function, the argument has rarely been ‘and therefore we need a more exact diagram to capture the human thought processes so we can replicate it’ so much as ‘and therefore intelligence is likely to rely on a lot of specialized machinery and hardcoded knowledge.’
This argument refutes that in its naïve direct form, because, as you say, nature would add complexity irrespective of necessity, even for marginal gains. But if you allow for fusion to say, well, the simple model isn’t working out, so let’s add [miscellaneous complexity term], as long as it’s not directly in analogy to nature, then why can’t AI Longs say, well, GPT-3 clearly isn’t capturing certain facets of cognition, and scaling doesn’t immediately seem to be fixing that, so let’s add [miscellaneous complexity term] too? Hence, ‘and therefore intelligence is likely to rely on a lot of specialized machinery and hardcoded knowledge.’
I don’t think we necessarily disagree on much with respect to grounded arguments about AI, but I think if one of the key arguments (‘Part 1: Extra brute force can make the problem a lot easier’) is that certain driving forces are fungible and can trade off against complexity, then it seems like cases where that doesn’t hold (e.g. your model of fusion) would be evidence against the argument’s generality. Because we don’t really know how intelligence works, it seems that either you need to have a lot of belief in this class of argument (which is the case for me), or you need to be very careful applying it to this domain.
I expect that fusion progress is in fact predominantly determined by temperature and pressure (and similar factors that go into the Q factor), and expect that issues with control won’t seem very relevant to long-run timelines in retrospect. It’s true that we’ve had temperature and pressure equal to the sun for a while, but it’s also true that low-yield fusion is pretty easy. The missing piece there cannot simply be control, since even a perfectly controlled ounce of a replica sun is not going to produce much energy. Rather, we just have a higher bar to cross before we get yield. In fusion, you can use temperature and pressure to trade off against control issues. This is most clearly illustrated in hydrogen bombs. In fact, there is little in-principle reason you couldn’t use hydrogen bombs to heat water to power a turbine, even if it’s not the most politically or economically sensible design.
OK, then in that case I feel like the case of fusion is totally not a counterexample-precedent to Shorty’s methodology, because the Sun is just not at all analogous to what we are trying to do with fusion power generation. I’m surprised and intrigued to hear that control isn’t a big deal. I assume you know more about fusion than me so I’m deferring to you.
While I’ve seen arguments about the complexity of neuron wiring and function, the argument has rarely been ‘and therefore we need a more exact diagram to capture the human thought processes so we can replicate it’ so much as ‘and therefore intelligence is likely to rely on a lot of specialized machinery and hardcoded knowledge.’
This argument refutes that in its naïve direct form, because, as you say, nature would add complexity irrespective of necessity, even for marginal gains.
Then we agree, at least on the main point of this paper, which was indeed just to refute this sort of argument, which I heard surprisingly often. Just because the brain is complex, mysterious, etc. doesn’t mean ‘therefore intelligence is likely to rely on a lot of specialized machinery and hardcoded knowledge.’
But if you allow for fusion to say, well, the simple model isn’t working out, so let’s add [miscellaneous complexity term], as long as it’s not directly in analogy to nature, then why can’t AI Longs say, well, GPT-3 clearly isn’t capturing certain facets of cognition, and scaling doesn’t immediately seem to be fixing that, so let’s add [miscellaneous complexity term] too? Hence, ‘and therefore intelligence is likely to rely on a lot of specialized machinery and hardcoded knowledge.’
I called that complexity term “Special sauce.” I have not in this post argued that the amount of special sauce needed is small; I left open the possibility that it might be large. The precedent of birds and planes is evidence that necessary special sauce can be small even in situations where one might think it is large, but like I said, it’s just one case, so we shouldn’t update too strongly based on it. Maybe we can find other cases in which necessary special sauce does seem to be big. Maybe fusion is such a case, though as described above, it’s unclear—it seems like you are saying that we just haven’t reached enough temperature and pressure yet to get viable fusion? In which case fusion isn’t an example of lots of special sauce being needed after all.
I don’t think we necessarily disagree on much with respect to grounded arguments about AI, but I think if one of the key arguments (‘Part 1: Extra brute force can make the problem a lot easier’) is that certain driving forces are fungible and can trade off against complexity, then it seems like cases where that doesn’t hold (e.g. your model of fusion) would be evidence against the argument’s generality. Because we don’t really know how intelligence works, it seems that either you need to have a lot of belief in this class of argument (which is the case for me), or you need to be very careful applying it to this domain.
I’m not sure I followed this paragraph. Are you saying that you think that, in general, there are key variables for any particular design problem which make the problem easier as they are scaled up? But that I shouldn’t think that, given what I erroneously thought about fusion?
I am by no means an expert on fusion power; I’ve just been loosely following the field since the recent wave of fusion startups, a significant fraction of which seem to have come about precisely because HTS magnets significantly shifted the field strength you can achieve at practical sizes. Control and instabilities are absolutely a real practical concern, as are a bunch of other things like neutron damage; my expectation is only that they are second-order difficulties in the long run, much like wing shape was a second-order difficulty for flight. My framing is largely shaped by this MIT talk (here’s another, here’s their startup).
I called that complexity term “Special sauce.” I have not in this post argued that the amount of special sauce needed is small; I left open the possibility that it might be large.
I’m probably just wanting the article to be something it’s not then!
I’ll try to clarify my point about key variables. The real-world debate between short and long AI timelines pretty much boils down to whether the techniques we currently have capture enough of cognition that short-term prospects (both scaling and research) end up covering enough of the important pieces for TAI.
It’s pretty obvious that GPT-3 doesn’t do some things we’d expect a generally intelligent agent to do, and it also seems to me (and seems to be a commonality among skeptics) that we don’t have enough of a grounded understanding of intelligence to expect to fill in these pieces from first principles, at least in the short term. Which means the question boils down to ‘can we buy these capabilities with other things we do have, particularly the increasing scale of computation, and by iterating on ideas?’
Flight is a clear case where, as you’ve said, you can trade off the one variable (power-to-weight) to make up for inefficiencies and deficiencies in the other aspects. I expect fusion is another. A case where this doesn’t clearly hold is building useful, self-replicating nanoscale robots to manufacture things, in analogy to cells and microorganisms. Lithography and biotech have given us good tools for building small objects with defined patterns, but there seems to be a lot of fundamental complexity to the task that can’t easily be solved by this. Even if we could fabricate a cubic millimeter of matter with every atom precisely positioned, it’s not clear how much of the gap this would close. There is an issue here with trading off scale and manufacturing to substitute for complexity and the things we don’t understand.
‘Part 1: Extra brute force can make the problem a lot easier’ says that you can do this sort of trade for AI, and it justifies this in part by drawing analogy to flight. But it’s hard to see what intrinsically motivates this comparison specifically, because trading off a motor’s power-to-weight ratio for physical upness is very different to trading off a computer’s FLOP rate for abstract thinkingness. I assumed you did this because you believed (as I do) that this sort of argument is general. Hence, a general argument should apply generally, so unless there’s something special about fusion, it should apply there too. If you don’t believe it’s a general sort of argument, then why the comparison to flight, rather than to useful, self-replicating nanoscale robots?
If instead you’re just drawing comparison to flight to say it’s potentially possible that compute is fungible with complexity, rather than it being likely, then it just seems like not a very impactful argument.
Thanks again for the detailed reply; I feel like I’m coming to understand you (and fusion!) much better.
You may indeed be hoping the OP is something it’s not.
That said, I think I have more to say in agreement with your strong position:
There is an issue here with trading off scale and manufacturing to substitute for complexity and the things we don’t understand.
‘Part 1: Extra brute force can make the problem a lot easier’ says that you can do this sort of trade for AI, and it justifies this in part by drawing analogy to flight. But it’s hard to see what intrinsically motivates this comparison specifically, because trading off a motor’s power-to-weight ratio for physical upness is very different to trading off a computer’s FLOP rate for abstract thinkingness. I assumed you did this because you believed (as I do) that this sort of argument is general. Hence, a general argument should apply generally, so unless there’s something special about fusion, it should apply there too. If you don’t believe it’s a general sort of argument, then why the comparison to flight, rather than to useful, self-replicating nanoscale robots?
If instead you’re just drawing comparison to flight to say it’s potentially possible that compute is fungible with complexity, rather than it being likely, then it just seems like not a very impactful argument.
1. I don’t know enough about nanotech to say whether it’s a counterexample to Shorty’s position. Currently I suspect it isn’t. This is a separate issue from the issue you raise, which is whether it’s a counterexample to the position “In general, you can substitute brute force in some variables for special sauce.” Call this position the strong view.
2. I’m not sure whether I hold the strong view. I certainly didn’t try to argue for it in the OP (though I did present a small amount of evidence for it, I suppose).
3. I do hold the strong-view-applied-to-AI. That is, I do think we can make the problem of building TAI easier by using more compute. (As you say, compute is fungible with complexity.) I gave two reasons for this in the OP: we can scale up the key variables, and we can use compute to automate the search for special sauce. I think both of these reasons are solid on their own; I don’t need to appeal to historical case studies to justify them.
4. I am happy to expand on both arguments if you like. I think the “can use compute to automate search for special sauce” is pretty self-explanatory. The “can scale up the key variables” thing is based on deep learning theory as I understand it, which is that bigger neural nets work by containing more and better lottery tickets (and you need longer to train to isolate and promote those tickets from the sludge of competitor subnetworks?). And neural networks are universal function approximators. So whatever skill it is that humans do and that you are trying to get an AI to do, with a big enough neural net trained on enough data, you’ll succeed. And “big enough” means probably about the size of the human brain. This is just the sketch of a skeleton of an argument of course, but I could go on...
Thanks, I think I pretty much understand your framing now.
I think the only thing I really disagree with is that “‘can use compute to automate search for special sauce’ is pretty self-explanatory.” I think this heavily depends on what sort of variable you expect the special sauce to be. E.g. for useful, self-replicating nanoscale robots, my hypothetical atomic manufacturing technology would enable rapid automated iteration, but it’s unclear how you could use that to automatically search for a solution in practice. It’s an enabler for research, more so than a substitute. Personally I’m not sure how I’d justify that claim for AI without importing a whole bunch of background knowledge of the generality of optimization procedures!
IIUC this is mostly outside the scope of what your article was about, and we don’t disagree on the meat of the matter, so I’m happy to leave this here.
I think I agree that it’s not clear compute can be used to search for special sauce in general, but in the case of AI it seems pretty clear to me: AIs themselves run on computers, and the capabilities we are interested in (some of them, at least) can be detected on AIs in simulation (no need for e.g. robotic bodies), and so we can do trial-and-error on our AI designs in proportion to how much compute we have. More compute, more trial-and-error. (Except it’s more efficient than mere trial-and-error: we have access to all sorts of learning, meta-learning, and architecture search algorithms, not to mention human insight.) If you had enough compute, you could just simulate the entire history of life evolving on an earth-sized planet for a billion years, in a very detailed and realistic physics environment!
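To make “more compute, more trial-and-error” concrete, here’s a minimal sketch of what I have in mind (everything here is hypothetical and illustrative: random_design and train_and_evaluate are stand-ins for whatever design space and training-plus-benchmark pipeline you like, and the scores are just noise):

```python
import random

def random_design():
    """Sample a hypothetical AI design (architecture plus training recipe)."""
    return {
        "depth": random.choice([12, 24, 48, 96]),
        "width": random.choice([1024, 4096, 16384]),
        "curriculum": random.choice(["plain", "staged", "self-play"]),
    }

def train_and_evaluate(design):
    """Stand-in for training a model with this design and scoring it in simulation.
    Returns a score in [0, 1]; here it is random noise, purely for illustration."""
    return random.random()

def search_for_special_sauce(total_flop, flop_per_trial):
    """Spend a compute budget on trial-and-error over designs.
    More total compute (or cheaper trials) means more designs get tested."""
    n_trials = int(total_flop // flop_per_trial)
    best_design, best_score = None, float("-inf")
    for _ in range(n_trials):
        design = random_design()
        score = train_and_evaluate(design)
        if score > best_score:
            best_design, best_score = design, score
    return best_design, best_score, n_trials

# A budget 1000x the per-trial cost buys 1000 tries:
best, score, tries = search_for_special_sauce(total_flop=1e27, flop_per_trial=1e24)
print(f"tested {tries} designs; best score {score:.3f}")
```

In practice the search would be guided (learning, meta-learning, architecture search, human insight) rather than blind random sampling, but the budget arithmetic is the same.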
Eventually the conclusion holds trivially, sure, but that takes us very far from the HBHL anchor. Most evolutionary algorithms we do today are very constrained in what programs they can generate, and are run over small models for a small number of iteration steps. A more general search would be exponentially slower, and even more disconnected from current ML. If you expect that sort of research to be pulling a lot of weight, you probably shouldn’t expect the result to look like large connectionist models trained on lots of data, and you lose most of the argument for anchoring to HBHL.
A more standard framing is that ‘we can do trial-and-error on our AI designs’, but there we’re again in a regime where scale is an enabler for research, more so than a substitute for it. Architecture search will still fine-tune and validate these ideas, but is less likely to drive them directly in a significant way.
Eventually the conclusion holds trivially, sure, but that takes us very far from the HBHL anchor.
It takes us about 17 orders of magnitude away from the HBHL anchor, in fact. Which is not very far, when you think about it. Divide 100 percentage points of probability mass evenly across those 17 orders of magnitude, and you get almost 6% per OOM, which means something like 4x as much probability mass on the HBHL anchor as Ajeya puts on it in her report!
If you expect that sort of research to be pulling a lot of weight, you probably shouldn’t expect the result to look like large connectionist models trained on lots of data, and you lose most of the argument for anchoring to HBHL.
I don’t follow this argument. It sounds like double-counting to me, like: “If you put some of your probability mass away from HBHL, that means you are less confident that AI will be made in the HBHL-like way, which means you should have even less of your probability mass on HBHL.”
A more standard framing is that ‘we can do trial-and-error on our AI designs’, but there we’re again in a regime where scale is an enabler for research, more so than a substitute for it. Architecture search will still fine-tune and validate these ideas, but is less likely to drive them directly in a significant way.
I’m not sure I get the distinction between enabler and substitute, or why it is relevant here. The point is that we can use compute to search for the missing special sauce. Maybe humans are still in the loop; sure.
It takes us about 17 orders of magnitude away from the HBHL anchor, in fact. Which is not very far, when you think about it. Divide 100 percentage points of probability mass evenly across those 17 orders of magnitude, and you get almost 6% per OOM, which means something like 4x as much probability mass on the HBHL anchor as Ajeya puts on it in her report!
I don’t understand what you’re doing here. Why 17 orders of magnitude, and why would I split 100% evenly across those orders?
I don’t follow this argument. It sounds like double-counting to me
Read ‘and therefore’, not ‘and in addition’. The point is that the more of your compute you spend on search, the less compute your search can spend on each individual, computationally expensive model.
Put another way, if you have HBHL compute but spend nine orders of magnitude of it on search, then the per-model compute is much less than HBHL, so the reasons to argue for HBHL don’t apply to it. Equivalently, if your per-model compute estimate is HBHL, then the HBHL metric is only relevant for timelines if search is fairly limited.
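To spell out the accounting (a toy calculation; the specific numbers are made up for illustration, with HBHL taken as roughly 10^24 FLOP to match the 17-OOM gap to 10^41 discussed below):

```python
import math

HBHL = 1e24  # rough human-brain-human-lifetime FLOP figure, consistent with the 17-OOM gap to 1e41

def per_model_compute(total_flop, n_trials):
    """Per-trial compute when a fixed budget is split across search trials."""
    return total_flop / n_trials

# Spend an HBHL-sized budget, but devote nine OOMs of it to search:
print(per_model_compute(HBHL, n_trials=10**9))  # 1e15 FLOP per model, far below HBHL

# Conversely, if each model individually needs HBHL-scale compute,
# then 1000 trials requires three extra OOMs on top of HBHL:
print(math.log10(HBHL * 1000))  # 27.0, i.e. HBHL + 3 OOMs
```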
I’m not sure I get the distinction between enabler and substitute, or why it is relevant here. The point is that we can use compute to search for the missing special sauce. Maybe humans are still in the loop; sure.
Motors are an enabler in the context of flight research because they let you build and test designs, learn what issues to solve, build better physical models, and verify good ideas.
Motors are a substitute in the context of flight research because a better motor means more, easier, and less optimal solutions become viable.
Ajeya estimates (and I agree with her) how much compute it would take to recapitulate evolution, i.e. simulate the entire history of life on earth evolving for a billion years etc. The number she gets is 10^41 FLOP give or take a few OOMs. That’s 17 OOMs away from where we are now. So if you take 10^41 as an upper bound, and divide up the probability evenly across the OOMs… Of course it probably shouldn’t be a hard upper bound, so instead of dividing up 100 percentage points you should divide up 95 or 90 or whatever your credence is that TAI could be achieved for 10^41 or less compute. But that wouldn’t change the result much, which is that a naive, flat-across-orders-of-magnitude-up-until-the-upper-bound-is-reached distribution would assign substantially higher probability to Shorty’s position than Ajeya does.
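Here’s the arithmetic spelled out (just reproducing the numbers above; the 10^24 figure for “where we are now” is implied by the 17-OOM gap rather than stated exactly):

```python
import math

evolution_flop = 1e41  # recapitulate-evolution estimate, give or take a few OOMs
current_flop = 1e24    # roughly where we are now, per the 17-OOM gap above

n_ooms = round(math.log10(evolution_flop) - math.log10(current_flop))  # 17

for credence_below_bound in (1.00, 0.90):
    per_oom = credence_below_bound / n_ooms
    print(f"{credence_below_bound:.0%} spread over {n_ooms} OOMs -> {per_oom:.1%} per OOM")
# 100% -> ~5.9% per OOM; 90% -> ~5.3% per OOM
```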
I’m still not following the argument. I agree that you won’t be able to use your HBHL compute to do search over HBHL-sized brains+childhoods, because if you only have HBHL compute, you can only do one HBHL-sized brain+childhood. But that doesn’t undermine my point, which is that as you get more compute, you can use it to do search. So e.g. when you have 3 OOMs more compute than the HBHL milestone, you can do automated search over 1000 HBHL-sized brains+childhoods. (Also I suppose even when you only have HBHL compute you could do search over architectures and childhoods that are a little bit smaller and hope that the lessons generalize.)
I think part of what might be going on here is that since Shorty’s position isn’t “TAI will happen as soon as we hit HBHL” but rather “TAI will happen shortly after we hit HBHL”, there’s room for an OOM or three of extra compute beyond the HBHL to be used. (Compute costs decrease fairly quickly, and investment can increase much faster, and probably will when TAI is nigh.) I agree that we can’t use compute to search for special sauce if we only have exactly HBHL compute (setting aside the parenthetical in the previous paragraph, which suggests that we can).
Well I understand now where you get the 17, but I don’t understand why you want to spread it uniformly across the orders of magnitude. Shouldn’t you put all the probability mass for the brute-force evolution approach on some gaussian around where we’d expect that to land, and only have probability elsewhere to account for competing hypotheses? Like I think it’s fair to say the probability of a ground-up evolutionary approach only using 10-100 agents is way closer to zero than to 4%.
I’m still not following the argument. [...] So e.g. when you have 3 OOMs more compute than the HBHL milestone
I think you’re mixing up my paragraphs. I was referring here to cases where you’re trying to substitute searching over programs for the AI special sauce.
If you’re in the position where searching 1000 HBHL hypotheses finds TAI, then the implicit assumption is that model scaling has already substituted for the majority of AI special sauce, and the remaining search is just an enabler for figuring out the few remaining details. That or that there wasn’t much special sauce in the first place.
To maybe make my framing a bit more transparent, consider the example of a company trying to build useful, self-replicating nanoscale robots using an atomically precise 3D printer under the conditions where 1) nobody there has a good idea of how to go about doing this, and 2) you have 1000 tries.
Sorry I didn’t see this until now! I agree that for the brute-force evolution approach, we should have a gaussian around where we’d expect that to land. My “Let’s just go evenly across all the OOMs between now and evolution” is only meant as a reasonable first-pass approach to what our all-things-considered distribution should be like, including evolution but also various other strategies. (Even better would be having a taxonomy of the various strategies and a gaussian for each; this is sorta what Ajeya does. The problem is that insofar as you don’t trust your taxonomy to be exhaustive, the resulting distribution is untrustworthy as well.) I think it’s reasonable to extend the probability mass down to where we are now, because we are currently at the HBHL milestone pretty much, which seems like a pretty relevant milestone to say the least.
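To illustrate the two first-pass shapes we’re comparing (all the weights, means, and widths below are made up purely for illustration; this is a sketch of the disagreement, not my actual distribution or yours):

```python
import math

OOMS = list(range(24, 42))  # one bin per OOM of FLOP, from ~1e24 (roughly now) to the ~1e41 evolution estimate

def flat_prior(ooms):
    """First-pass prior: equal mass on every OOM up to the upper bound."""
    return {o: 1.0 / len(ooms) for o in ooms}

def gaussian_mixture(ooms, strategies):
    """Per-strategy gaussians in log10(FLOP), weighted and renormalized.
    `strategies` maps name -> (weight, mean_oom, std_ooms); all values illustrative."""
    mass = {o: 0.0 for o in ooms}
    for _, (weight, mu, sigma) in strategies.items():
        for o in ooms:
            mass[o] += weight * math.exp(-0.5 * ((o - mu) / sigma) ** 2)
    total = sum(mass.values())
    return {o: m / total for o, m in mass.items()}

# A hypothetical (and certainly non-exhaustive) taxonomy:
strategies = {
    "scaled-up current methods": (0.5, 27, 2.0),
    "brute-force evolution":     (0.3, 41, 1.5),
    "something else entirely":   (0.2, 33, 4.0),
}

flat = flat_prior(OOMS)
mix = gaussian_mixture(OOMS, strategies)
near_hbhl = range(24, 28)  # mass within ~3 OOMs of the HBHL milestone
print(f"flat: {sum(flat[o] for o in near_hbhl):.0%}, mixture: {sum(mix[o] for o in near_hbhl):.0%}")
```

The point of the flat version is exactly that it doesn’t require trusting any particular taxonomy to be exhaustive.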
If you’re in the position where searching 1000 HBHL hypotheses finds TAI, then the implicit assumption is that model scaling has already substituted for the majority of AI special sauce, and the remaining search is just an enabler for figuring out the few remaining details. That or that there wasn’t much special sauce in the first place.
This seems right to me.
To maybe make my framing a bit more transparent, consider the example of a company trying to build useful, self-replicating nanoscale robots using an atomically precise 3D printer under the conditions where 1) nobody there has a good idea of how to go about doing this, and 2) you have 1000 tries.
I like this analogy. I think our intuitions about how hard it would be might differ though. Also, our intuitions about the extent to which nobody has a good idea of how to make TAI might differ too.
Also, our intuitions about the extent to which nobody has a good idea of how to make TAI might differ too.
To be clear I’m not saying nobody has a good idea of how to make TAI. I expect pretty short timelines, because I expect the remaining fundamental challenges aren’t very big.
What I don’t expect is that the remaining fundamental challenges go away through small-N search over large architectures, if the special sauce does turn out to be significant.