It’s great to see that Sam cares about AI safety, is willing to engage with the topic, and has clear, testable beliefs about it. Some paragraphs from the interview that I found interesting and relevant to AI safety:
“One of the things we really believe is that the most responsible way to put this out in society is very gradually and to get people, institutions, policy makers, get them familiar with it, thinking about the implications, feeling the technology, and getting a sense for what it can do and can’t do very early. Rather than drop a super powerful AGI in the world all at once.”
“The world I think we’re heading to and the safest world, the one I most hope for, is the short timeline slow takeoff.”
“I think there will be many systems in the world that have different settings of the values that they enforce. And really what I think—and this will take longer—is that you as a user should be able to write up a few pages of here’s what I want here are my values here’s how I want the AI to behave and it reads it and thinks about it and acts exactly how you want. Because it should be your AI and it should be there to serve you and do the things you believe in.”
“multiple AGIs in the world I think is better than one.”
“I think the best case is so unbelievably good that it’s hard to—it’s like hard for me to even imagine.”
“And the bad case—and I think this is important to say—is like lights out for all of us. I’m more worried about an accidental misuse case in the short term where someone gets a super powerful—it’s not like the AI wakes up and decides to be evil. I think all of the traditional AI safety thinkers reveal a lot more about themselves than they mean to when they talk about what they think the AGI is going to be like. But I can see the accidental misuse case clearly and that’s super bad. So I think it’s like impossible to overstate the importance of AI safety and alignment work. I would like to see much much more happening.”
“But I think it’s more subtle than most people think. You hear a lot of people talk about AI capabilities and AI alignment as in orthogonal vectors. You’re bad if you’re a capabilities researcher and you’re good if you’re an alignment researcher. It actually sounds very reasonable, but they’re almost the same thing. Deep learning is just gonna solve all of these problems and so far that’s what the progress has been. And progress on capabilities is also what has let us make the systems safer and vice versa surprisingly. So I think none of the sort of sound-bite easy answers work”
“I think the AGI safety stuff is really different, personally. And worthy of study as its own category. Because the stakes are so high and the irreversible situations are so easy to imagine we do need to somehow treat that differently and figure out a new set of safety processes and standards.”
Here is my summary of Sam Altman’s beliefs about AI and AI safety as a list of bullet points:
Sub-AGI models should be released soon and gradually increased in capability so that society can adapt and the models can be tested.
Many people believe that AI capabilities and AI safety are orthogonal vectors, but they are actually highly correlated, and recent advances confirm this: progress on capabilities advances safety and vice versa.
To align AI, it should be possible to write down our values and ask AGIs to read these instructions and behave according to them. Using this approach, we could have AIs tailored to each individual.
There should be multiple AGIs in the world with a diversity of different settings and values.
The upside of AI is extremely positive and potentially utopian. The worst-case scenarios are extremely negative and include scenarios involving human extinction.
Sam is more worried about accidents than the AI itself acting maliciously.
I agree that AI models should gradually increase in capabilities so that we can study their properties and think about how to make them safe.
Sam seems to believe that the orthogonality thesis is false in practice. For reference, here is the definition of the orthogonality thesis:
“Intelligence and final goals are orthogonal axes along which possible agents can freely vary. In other words, more or less any level of intelligence could in principle be combined with more or less any final goal.”
The classic example is the superintelligent paperclip AI that only cares about paperclips. I think the idea is that capabilities and alignment are independent, so scaling capabilities will not, by itself, produce more alignment. However, in practice, AI researchers are trying to scale both capabilities and alignment.
I think the orthogonality thesis is true but it doesn’t seem very useful. I believe any combination of intelligence and goals is possible but we also want to know which combinations are most likely. In other words, we want to know the strength of the correlation between capabilities and alignment which may depend on the architecture used.
I think recent events have actually shown that capabilities and alignment are not necessarily correlated. For example, GPT-3 was powerful but not particularly aligned, and OpenAI had to develop additional methods such as RLHF (reinforcement learning from human feedback) to make it more aligned.
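To make that contrast concrete, here is a minimal toy sketch of the reward-model step that RLHF adds on top of ordinary pretraining. This is my own illustration in PyTorch, not OpenAI’s pipeline: the RewardModel class, the embedding inputs, and the hidden size are placeholders. The point is only that alignment here comes from an extra training signal (human preference comparisons) rather than from scaling capabilities.

```python
# Minimal sketch of the reward-model step used in RLHF (a toy illustration,
# not OpenAI's actual code). Given pairs of model outputs where humans
# preferred one over the other, a reward model is trained so that the
# preferred output scores higher.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        # Stand-in for a pretrained transformer; here just a linear head
        # over precomputed response embeddings.
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(response_embedding).squeeze(-1)

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry style objective: maximize the probability that the
    # human-preferred response receives the higher reward.
    return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage with random embeddings standing in for real model activations.
model = RewardModel()
chosen = torch.randn(4, 768)    # embeddings of preferred responses
rejected = torch.randn(4, 768)  # embeddings of rejected responses
loss = preference_loss(model(chosen), model(rejected))
loss.backward()
```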
Sam seems to believe that the value loading problem will be easy and that we can simply ask the AI for what we want. I think it will become easier to create AIs that can understand our values. But whether there is any correlation between understanding and caring seems like a different and more important question. Future AIs could be like an empathetic person who understands and cares about what we want or they could use this understanding to manipulate and defeat humanity.
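What Sam describes goes well beyond today’s systems, but a rough present-day approximation of “write up a few pages of your values and the AI reads them” is putting a values document into the system prompt. The sketch below is my own illustration of that weaker version using the openai Python client; the model name and the values text are placeholders, and nothing in it tests whether the model actually cares about the values rather than merely following instructions.

```python
# A minimal sketch of the "values document as system prompt" approximation.
# Assumes the official `openai` Python client; the model name and the values
# text are placeholders chosen for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

values_document = """\
I value honesty over flattery.
When you are uncertain, say so explicitly.
Never help me deceive other people.
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": values_document},
        {"role": "user", "content": "Draft a reply to my landlord about the late rent."},
    ],
)
print(response.choices[0].message.content)
```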
I’m skeptical about the idea that multiple AGIs would be desirable. According to the book Superintelligence, race dynamics and competition would tend to be worse with more actors building AGI. Actions such as temporarily slowing down global AI development would be more difficult to coordinate with more teams building AGI.
I agree with the idea that the set of future AGI possibilities includes a wide range of possible futures including very positive and negative outcomes.
In the short term, I’m more worried about accidents and malicious actors but when AI exceeds human intelligence, it seems like the source of most of the danger will be the AI’s own values and decision-making.
Of these points, these are the beliefs I’m most uncertain about:
The strength of the correlation, if any, between capabilities and alignment in modern deep learning systems.
Whether we can program AI models to understand and care about (adopt) our values simply by prompting them with a description of our values.
Whether one or many AGIs would be most desirable.
Whether we should be more worried about AI accidents and misuse or the AI itself being a dangerous agent.