I definitely agree that we need to normalize thinking about AI safety, and I think that’s been happening. In fact, I think of that as one of the major benefits of writing the Alignment newsletter, even though I started it with AI safety researchers in mind (who still remain the audience I write for, if not the audience I actually have).
I’m less convinced that we should have a process for dangerous AI research. What counts as dangerous? Certainly this makes sense for AI research that can be dangerous in the short term, such as research that has military or surveillance applications, but what would be dangerous from a long-term perspective? It shouldn’t just be research that differentially benefits general AI over long-term safety, since that’s almost all AI research. And even though on the current margin I would want research to differentially advance safety, it feels wrong to call other research dangerous, especially given its enormous potential for good.
I agree that calling 99.9% of AI research “dangerous” and AI Safety research “safe” is not a useful dichotomy. However, I consider AGI companies/labs and people focused on implementing self-improving AI/code synthesis extremely dangerous. The same goes for any breakthrough in general AI, or anything that greatly shortens the AGI timeline.
Do you mean that some AI research has positive expected utility (e.g. in medicine) and should not be called dangerous because the good it produces compensates for the increased AI risk?
Just to return for a moment to what I wrote, I don’t mean to be making an assessment here of what counts as “dangerous”, but instead to provide this service for things people themselves think are dangerous. Figuring out where to draw the line on which capabilities research is so dangerous it should not be published is something I have only very weak opinions on. For example, if you figured out how to make recursive self-improvement work in a way that doesn’t immediately result in wild divergence and could stably produce better results over many iterations, I’d say that’s dangerous; short of that, I’m not sure where you might draw the line.