Thanks for this post; it’s probably my favorite Cold Takes post from the last few months. I appreciated the specific scenario, as well as the succinct points in the “we can do better” section. I felt like I could get a more concrete understanding of your worldview, how you think we should move forward, and the reasons why. I’m also glad that you’re thinking critically about standards and monitoring.
For a simple example, imagine an AI company in a dominant market position—months ahead of all of the competition, in some relevant sense (e.g., its AI systems are more capable, such that it would take the competition months to catch up). Such a company could put huge amounts of resources—including its money, top people and its advanced AI systems themselves (e.g., AI systems performing roles similar to top human scientists) - into AI safety research, hoping to find safety measures that can be published for everyone to use.
Let’s suppose this AI lab existed, and that for a while it had been prioritizing capabilities research in order to stay ahead of its competition. Do you expect it would know when it’s supposed to “hit the pause button” and reallocate those resources to AI safety research?
I think my biggest fear with pushing the “Successful, careful AI project” narrative is that (a) every AGI company will think it can be the successful, careful project, which just gives it more justification to keep doing capabilities research, and (b) it seems hard to know when a lab is supposed to “pause”. This was one of my major uncertainties about OpenAI’s alignment plan, and it seems consistent with your concerns about racing through a minefield.
What do you think about this “when to pause” problem? Are you expecting labs to implement evals that tell them when to pause, or that they’ll kind of “know it when they see it”, or something else?
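(For concreteness, here is the kind of thing I have in mind by “evals that tell them when to pause”: the lab pre-commits to danger thresholds before a training run, and pauses if any eval result crosses them. This is just a toy sketch; the metric names and numbers below are entirely made up.)

```python
# Toy sketch of an eval-gated "pause" policy.
# All metric names and thresholds are hypothetical, for illustration only.

# Danger thresholds committed to in advance, so the decision isn't
# being made for the first time under competitive pressure.
PAUSE_THRESHOLDS = {
    "autonomous_replication_score": 0.2,  # hypothetical eval metric
    "cyberoffense_uplift_score": 0.3,     # hypothetical eval metric
    "deception_eval_score": 0.5,          # hypothetical eval metric
}

def should_pause(eval_results: dict) -> bool:
    """Return True if any eval result meets or exceeds its pre-committed threshold."""
    return any(
        eval_results.get(metric, 0.0) >= threshold
        for metric, threshold in PAUSE_THRESHOLDS.items()
    )

# Example: results from a (hypothetical) pre-deployment eval suite.
latest_results = {
    "autonomous_replication_score": 0.25,
    "cyberoffense_uplift_score": 0.10,
    "deception_eval_score": 0.40,
}

if should_pause(latest_results):
    print("Pause further capabilities scaling; reallocate to safety research.")
else:
    print("Continue, and re-run evals at the next capability milestone.")
```

Even with something like this, the hard parts seem to be choosing the thresholds ahead of time and trusting that a lab under competitive pressure would actually act on a tripped threshold.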
Thanks! I agree this is a concern. In theory, people who are constantly thinking about the risks should be able to make a reasonable decision about “when to pause”, but in practice I think there is a lot of important work to do today to make a “pause” more likely in the future, including work on AI safety standards and on the kinds of measures described at https://www.cold-takes.com/what-ai-companies-can-do-today-to-help-with-the-most-important-century/