This is only true if we assume that it makes little to no difference which company takes the lead in AI, or which types of AI are preferable. I think this is wrong: there are fairly massive differences between OpenAI or Anthropic winning the race to AGI and DeepMind winning it.
So I guess first you condition over alignment being solved when we win the race. Why do you think OpenAI/Anthropic are very different from DeepMind?
Noting that I don’t think alignment being “solved” is a binary. As discussed in the post, I think there are a number of measures that could improve our odds of getting early human-level-ish AIs to be aligned “enough,” even assuming no positive surprises on alignment science. This would imply that if lab A is more attentive to alignment and more inclined to invest heavily in even basic measures for aligning its systems than lab B, it could matter which lab develops very capable AI systems first.
I don’t exactly condition on alignment being solved. I instead point to a very important difference between OpenAI/Anthropic’s AI and DeepMind’s AI. The biggest difference is that OpenAI/Anthropic’s AI has a lot less incentive to develop instrumental goals, because it has way fewer steps between input and output and it incentivizes constraining goals. DeepMind, by contrast, uses RL, which essentially requires instrumental goals/instrumental convergence to do anything.
This is an important observation by porby, which I’d lossily compress to “Instrumental goals/Instrumental convergence is at best a debatable assumption for LLMs and Non-RL AI, and may not be there at all for LLMs/Non-RL AI.”
And this matters, because the assumption of instrumental convergence/powerseeking underlies basically all of the pessimistic analyses of AI, and arguably accounts for a supermajority of why AI is seen as fundamentally dangerous: instrumental convergence/powerseeking is essentially why AI safety is so difficult to achieve. LLMs/Non-RL AI probably bypass all of the AI safety concerns that aren’t related to misuse or ethics, and this has massive implications. So massive that I covered them in their own post here:
https://www.lesswrong.com/posts/8SpbjkJREzp2H4dBB/a-potentially-high-impact-differential-technological
One big implication is obvious: OpenAI and Anthropic are much safer companies to win the AI race, relative to DeepMind, because the instrumental convergence/powerseeking issue probably doesn’t exist for their AIs.
It also makes the initial alignment problem drastically easier, as it’s a non-adversarial problem that doesn’t need security mindset to make the LLM/Non-RL AI Alignment researcher plan work, as described here:
https://openai.com/blog/our-approach-to-alignment-research
And this makes the whole problem easier, as we don’t need to worry much about the first AI researcher’s alignment, resulting in a stable foundation for their recursive/meta alignment plan.
The fact that instrumental convergence/powerseeking/instrumental goals are at best debatable and probably false is probably the biggest reason why I claim that the different companies differ fundamentally in the probability of extinction. p(DOOM) drops radically conditional on OpenAI or Anthropic winning the race, because their AIs have a very desirable safety property: the lack of an incentive to have instrumental goals/instrumental convergence/powerseeking by default.
I agree that there is a difference between strong AI that has goals and one that is not an agent. This is the point I made here https://www.lesswrong.com/posts/wDL6wiqg3c6WFisHq/gpt-as-an-intelligence-forklift
But this has less to do with the particular lab (e.g., DeepMind trained Chinchilla) and more with the underlying technology. If the path to stronger models goes through scaling up LLMs then it does seem that they will be 99.9% non-agentic (measured in FLOPs https://www.lesswrong.com/posts/f8joCrfQemEc3aCk8/the-local-unit-of-intelligence-is-flops )
You’re right, it is the technology that makes the difference, but my point is that specific companies focus more on specific technology paths to safe AGI. OpenAI/Anthropic’s approach tends not to have instrumental convergence/powerseeking, whereas DeepMind focuses on RL, which essentially requires instrumental convergence. To be clear, I actually don’t think OpenAI/Anthropic’s path can get to AGI, but their alignment plans probably do work. And given that instrumental convergence/powerseeking is basically the reason why AI is more dangerous than standard technology, that is a very big difference between the companies rushing to AGI.
Thanks for the posts on non-agentic AGI.
My other point is that the non-existence of instrumental convergence/powerseeking even at really high scales, if true, has very, very large implications for how dangerous AI is. Consequently, basically everything has to change with respect to AI safety, given that instrumental convergence is a foundational assumption behind why AI is considered so dangerous at all.