You’re saying a lot of things that seem very confused and irrelevant to me, and I’m trying to get to the bottom of where you’re coming from.
Here’s a key question, I think: In this comment, you drew a diagram with a dashed line labeled “capabilities ceiling”. What do you think determines the capabilities ceiling? E.g. what hypothetical real-world interventions would make that dashed line move left or right, or get steeper or shallower?
In other words, I hope you’ll agree that you can’t simultaneously believe that every possible intervention that makes AGI happen sooner will push us towards slow takeoff. That would be internally-inconsistent, right? As a thought experiment, suppose all tech and economic and intellectual progress of the next century magically happened overnight tonight. Then we would have extremely powerful AGI tomorrow, right? And that is a very fast takeoff, i.e. one day.
Conversely, I hope you’ll agree that you can’t simultaneously believe that every possible intervention that makes AGI happen later will push us towards fast takeoff. Again, that would be internally-inconsistent, right? As a thought experiment, suppose every human on Earth simultaneously hibernated for 20 years starting right now. And then the entire human race wakes up in 2044, and we pick right back up where we were. That wouldn’t make a bit of difference to takeoff speed—the takeoff would be exactly as fast or slow whether the hibernation happens or not. Right? (Well, that’s assuming that takeoff hasn’t already started, I guess, but if it has, then the hibernation would technically make takeoff slower, not faster, right?)
If you agree with those two thought experiments, then you need to have in mind some bottleneck sitting between us and dangerous AGI, i.e. the “capabilities ceiling” dashed line. If there is such a bottleneck, then we can and hopefully would accelerate everything except that bottleneck, and we won’t get all the way to dangerous AGI until that bottleneck goes away, which (we hope) will take a long time, presumably because of the particular nature of that bottleneck. Most people in the prosaic AGI camp (including the slightly-younger Sam Altman I guess) think that the finite number of chips in the world is either the entirety of this bottleneck, or at least a major part of it, and that therefore trying to alleviate this bottleneck ASAP is the last thing you want to do in order to get maximally slow takeoff. If you disagree with that, then you presumably are expecting a different bottleneck besides chips, and I’d like to know what it is, and how you know.
Idk. Maybe I got carried away with the whole “everything overhang” idea.
While I do think fast vs slow takeoff is an important variable that determines how safe a singularity is, it’s far from the only thing that matters.
If you were looking at our world today and asking “what obvious inefficiencies will an AGI exploit?”, there is probably a lot of lower-hanging fruit (nuclear power, genetic engineering, zoning) that you would point to before getting to “we’re not building chip fabs as fast as physically possible”.
My actual views are probably closest to d/acc, which is the view that there is a wide variety of directions we can choose when researching new technology, and we ought to focus on the ones that make the world safer.
I do think that creating new obvious inefficiencies is a bad idea. For example, if we were to sustain a cap of 10**26 FLOPs on training runs for a decade or longer, that would make it really easy for a rogue actor/AI to suddenly build a much more powerful AI than anyone else in the world has.
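To make that “overhang from a sustained cap” worry concrete, here is a rough back-of-envelope sketch. The 10**26 figure is from the comment above; the 2.5-year doubling time and the dollar figure are my own illustrative assumptions, not anything from the thread:

```python
# Rough illustration (assumed numbers): how a fixed training-compute cap
# interacts with falling hardware costs.
# Assumptions: compute price-performance doubles every ~2.5 years, and a
# 1e26-FLOP training run costs ~$1e9 of compute when the cap starts.

cap_flop = 1e26              # the hypothetical legal cap from the comment
doubling_years = 2.5         # assumed price-performance doubling time
initial_cost_of_cap = 1e9    # assumed $ cost of 1e26 FLOP of compute today

for year in range(0, 11, 2):
    cheapening = 2 ** (year / doubling_years)
    cost_of_cap = initial_cost_of_cap / cheapening
    print(f"year {year:2d}: hitting the {cap_flop:.0e}-FLOP cap costs ~${cost_of_cap:,.0f}")

# After a decade, matching the capped frontier is ~16x cheaper, so many more
# actors (including careless or rogue ones) can afford to blow past it the
# moment the cap lapses or is ignored.
```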
As to the specific case of Sam/$7T, I think it’s largely aspirational, and to the extent that it happens, it was going to happen anyway. I guess if I were given a specific counterfactual, like: TSMC is going to build 100 new fabs in the next 10 years, is it better that they be built in the USA or Taiwan? I would prefer they be built in the USA. If, on the other hand, the counterfactual was: the USA is going to invest $7T in AI in the next 10 years, would you prefer it be all spent on semiconductor fabs or half on semiconductor fabs and half on researching controllable AI algorithms? I would prefer the latter.
Basically, my views are “don’t be an idiot”, but it’s possible to be an idiot both by arbitrarily banning things and by focusing on a single line of research to the exclusion of all others.
I think you’re confused. A rocket launch is not the same thing as filling the rocket’s fuel tanks with high explosives, even when you give the rocket bigger engines. Launching a roller coaster harder is not the same as dropping someone to the ground. An avalanche isn’t the same thing as a high-speed train moving the ice.
The difference between these cases is that the sudden, all-at-once jolt, or impulse, is fatal, while most acceleration events up to a limit are not.
So in the AI case, situations where there are a lot of idle chips powerful enough to host ASI, piled up everywhere and plugged into infrastructure, are like the explosives/fall/avalanche. You may note the system has a lot of unused potential energy that is suddenly released.
Whereas if humans start running their fabs faster, say 10-1000 times faster, and every newly made IC goes into an AI cluster that gives immediate feedback to the developers, then to an extent this is a continuous process. You might be confident that it is simply not possible to create ICs fast enough to cause a fast takeoff, assuming they all immediately go into AI and humans immediately get genuine feedback on failures and safety.
And I think this is the case, assuming humans isolate each cluster and refrain from giving their early production AIs some mechanism to communicate with each other in an unstructured way, and/or a way to remember past a few hours.
Failures are fine; an AI screwing up and killing people is just fine (so long as the risks are lower than with humans doing the task). What’s not fine is a coordinated failure.
The converse view might be that you can’t try to change things faster than some rate limit set by human governments. The legal system isn’t ready for AGI, the economy and medical system aren’t, no precautions whatsoever are legally required, and the economy will naturally put all the power into the hands of a few tech companies if this goes forward. Democracies simply can’t adjust fast enough because most voters won’t understand any of the issues and keep voting for incumbents, and dictatorships have essentially the same problem.
Whether Eliezer or Paul is right about “sudden all at once jolt” is an interesting question but I don’t understand how that question is related to Sam Altman’s chip ambitions. I don’t understand why Logan keeps bringing that up, and now I guess you’re doing it too. Is the idea that “sudden all at once jolt” is less likely in a world with more chips and chip fabs, and more likely in a world with fewer chips and chip fabs? If so, why? I would expect that if the extra chips make any difference at all, it would be to push things in the opposite direction.
In other words, if “situations where there are a lot of chips piled up everywhere plugged into infrastructure” is the bad thing that we’re trying to avoid, then a good way to help avoid that is to NOT manufacture tons and tons of extra chips, right?
We’re changing 2 variables, not 1:
(1) We know how to make useful AI, and these are AI accelerator chips meant for specific network architectures.
(2) We built a lot of them.
Pretend base world:
(1) OAI doesn’t exist, or folds like most startups. DeepMind goes from a 40% staff cut to 100% and joins the graveyard.
(2) Moore’s law continues, and various kinds of general-purpose accelerators keep getting printed.
So in the “pretend base world”, 2 things are true:
(1) AI is possible, just human investors were too stupid to pay for it
(2) every 2-3 years, the cost of compute halves
Suppose the base world continues for 20 more years, with human investors preferring to invest in various pyramid schemes (real estate, crypto...) instead of AI. Then after 20 years, compute is 256 times cheaper. This “$7 trillion” investment is now $27 billion, pocket change. Also, various inefficient specialized neural networks (“narrow AI”) are in use in places, with lots of support hardware plugged into real infrastructure.
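Spelling out the arithmetic behind “256 times cheaper” and “$27 billion” (a minimal sketch; the 2.5-year halving time is just the midpoint of the “2-3 years” above):

```python
# The "pretend base world" arithmetic, spelled out.
# Assumption: compute cost halves every 2.5 years (midpoint of "2-3 years").

halving_years = 2.5
years = 20
cheapening = 2 ** (years / halving_years)     # 2**8 = 256x cheaper

investment = 7e12                             # the "$7 trillion" figure
equivalent = investment / cheapening          # same compute at year-20 prices

print(f"cost reduction after {years} years: {cheapening:.0f}x")
print(f"$7T of compute at today's prices costs ~${equivalent / 1e9:.0f}B in year {years}")
# -> 256x, and 7e12 / 256 is roughly $27B: "pocket change".
```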
That world has a compute overhang, and since compute is so cheap, someone will eventually try to train a now-small neural network with 20 trillion weights on some random stuff they downloaded, and you know the rest.
What’s different in the real world: each accelerator built in practice will be specialized, probably for a large transformer network at fp16 or lower precision. Each one printed is then racked into a cluster with a finite-bandwidth backplane intended to support the scale of network that is in commercial use.
Key differences:
(1) The expensive hardware has to be specialized, instead of general-purpose.
(2) That specialization means it is likely impossible for any of the newly made accelerators to support a different/much larger architecture. This means the new hardware is very unlikely to be able to run an ASI, unless that ASI happens to function with a transformer-like architecture of similar scale on fp16, or can somehow function as a distributed system with low bandwidth between clusters (see the sketch after this list).
(3) Human developers get immediate feedback as they scale up their networks. If there are actual safety concerns of the type that MIRI et al. have speculated exist, they may be found before the hardware can support ASI. This is what makes it a continuous process.
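The sketch referenced in point (2): a rough illustration of why “low bandwidth between clusters” bites. All the numbers below are illustrative assumptions on my part (the 20-trillion-weight figure is borrowed from the overhang scenario above), not figures from the discussion:

```python
# Rough, illustrative numbers (assumptions, not the commenter's): why running
# one big model as "a distributed system with low bandwidth between clusters"
# is hard.

params = 20e12                      # hypothetical 20-trillion-weight model
bytes_per_grad = 2                  # fp16 gradients
sync_bytes = params * bytes_per_grad

intra_cluster_bw = 400e9            # assumed ~400 GB/s class in-cluster backplane
inter_cluster_bw = 1.25e9           # assumed ~10 Gbit/s link between clusters

for name, bw in [("over the in-cluster backplane", intra_cluster_bw),
                 ("between clusters", inter_cluster_bw)]:
    seconds = sync_bytes / bw
    print(f"one naive gradient sync {name}: ~{seconds:,.0f} s (~{seconds/3600:.1f} h)")

# The in-cluster sync takes a couple of minutes; the cross-cluster one takes
# ~9 hours per training step. Hardware and interconnect sized for one
# architecture don't trivially become a substrate for a much larger one.
```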
Note the epistemics: I work on an accelerator platform. I can say with confidence that accelerator design does limit what networks it is viable to accelerate; TOPS are not fungible.
Conclusion: if you think AI during your lifetime is bad, you’re probably going to see this as a bad thing. Whether it is actually a bad thing is complicated.
I’m confused about your “pretend base world”. This isn’t a discussion about whether it’s good or bad that OAI exists. It’s a discussion about “Sam Altman’s chip ambitions”. So we should compare the world where OAI seems to be doing quite well and Sam Altman has no chip ambitions at all, to the world where OAI seems to be doing quite well and Sam Altman does have chip ambitions. Right?
I agree that if we’re worried about FOOM-from-a-paradigm-shifting-algorithmic-breakthrough (which as it turns out I am indeed worried about), then we would prefer to be in a world where there is a low absolute number of chips that are flexible enough to run a wide variety of algorithms, rather than a world where there is a large number of such chips. But I disagree that this would be the effect of Sam Altman’s chip ambitions; rather, I think Sam Altman’s chip ambitions would clearly move things in the opposite, bad direction, on that metric. Don’t you think?
By analogy, suppose I say “(1) It’s very important to minimize the number of red cars in existence. (2) Hey, there’s a massively hot upcoming specialized market for blue cars, so let’s build 100 massive car factories all around the world.” You would agree that (2) is moving things in the wrong direction for accomplishing (1), right?
This seems obvious to me, but if not, I’ll spell out a couple reasons:
For one thing, who’s to say that the new car factories won’t sell into the red-car market too? Back to the case at hand: we should strongly presume that whatever fabs get built by this Sam Altman initiative will not make exclusively ultra-specialized AI chips, but rather whatever kinds of chips are most profitable to make, which might include some less-specialized chips. After all, whoever invests in a fab will, once it’s built, try to maximize revenue to make back the insane amount of money they put in, right? And fabs are flexible enough to make more than one kind of chip, especially over the long term.
For another thing, even if the new car factories don’t directly produce red cars, they will still lower the price of red cars, compared to the factories not existing, because the old car factories will produce extra marginal red cars when they would otherwise be producing blue cars. Back to the case at hand: the non-Sam-Altman fabs will choose to pump out more non-ultra-specialized chips if Sam-Altman fabs are flooding the specialized-chips market. Also, in the longer term, fab suppliers will be able to lower costs across the industry (from both economies-of-scale and having more money for R&D towards process improvements) if they have more fabs to sell to, and this would make it economical for non-Sam-Altman fabs to produce and sell more non-ultra-specialized chips.