I disagree that people working on the technical alignment problem generally believe that solving that technical problem is sufficient to get to Safe & Beneficial AGI.
I won’t put words in people’s mouths, but my goal isn’t to talk about words anyway. I think that large portions of the AI safety community act this way. This includes most people working on scalable alignment, interp, and deception.
If we don’t solve the technical alignment problem, then we’ll eventually wind up with a recipe for summoning more and more powerful demons with a callous lack of interest in whether humans live or die.
Yeah, I don’t really agree with the idea that getting better at alignment is necessary for safety. I think that it’s more likely than not that we’re already sufficiently good at it. The paragraph titled “If AI causes a catastrophe, what are the chances that it will be triggered by the choices of people who were exercising what would be considered to be ‘best safety practices’ at the time?” gives my thoughts on this.
I think that large portions of the AI safety community act this way. This includes most people working on scalable alignment, interp, and deception.
Hmm. Sounds like “AI safety community” is a pretty different group of people from your perspective than from mine. Like, I would say that if there’s some belief that is rejected by Eliezer Yudkowsky and by Paul Christiano and by Holden Karnofsky, and widely rejected by employees of OpenPhil and 80,000 hours and ARC and UK-AISI, and widely rejected by self-described rationalists and by self-described EAs and by the people at Anthropic and DeepMind (and maybe even OpenAI) who have “alignment” in their job title … then that belief is not typical of the “AI safety community”.
If you want to talk about actions not words, MIRI exited technical alignment and pivoted to AI governance, OpenPhil is probably funding AI governance and outreach as much as they’re funding technical alignment (hmm, actually, I don’t know the ratio, do you?), 80,000 hours is pushing people into AI governance and outreach as much as into technical alignment (again I don’t know the exact ratio, but my guess would be 50-50), Paul Christiano’s ARC spawned METR, ARIA is funding work on the FlexHEG thing, Zvi writes way more content on governance and societal and legal challenges than on technical alignment, etc.
If you define “AI safety community” as “people working on scalable alignment, interp, and deception”, and say that their “actions not words” are that they’re working on technical alignment as opposed to governance or outreach or whatever, then that’s circular / tautological, right?
I don’t really agree with the idea that getting better at alignment is necessary for safety. I think that it’s more likely than not that we’re already sufficiently good at it
If your opinion is that people shouldn’t work on technical alignment because technical alignment is already a solved problem, that’s at least a coherent position, even if I strongly disagree with it. (Well, I expect future AI to be different than current AI in a way that will make technical alignment much much harder. But let’s not get into that.)
But even in that case, I think you should have written two different posts:
one post would be entitled “good news: technical alignment is easy, egregious scheming is not a concern and never will be, and all the scalable oversight / interp / deception research is a waste of time” (or whatever your preferred wording is)
the other post would be entitled “bad news: we are not on track in regards to AI governance and institutions and competitive races-to-the-bottom and whatnot”.
That would be a big improvement! For my part, I would agree with the second and disagree with the first. I just think it’s misleading how this OP is lumping those two issues together.
If AI causes a catastrophe, what are the chances that it will be triggered by the choices of people who were exercising what would be considered to be “best safety practices” at the time?
I think it’s pretty low, but then again, I also think ASI is probably going to cause human extinction. I think that, to avoid human extinction, we need to either (A) never ever build ASI, or both (B) come up with adequate best practices to avoid ASI extinction and (C) ensure that relevant parties actually follow those best practices. I think (A) is very hard, and so is (B), and so is (C).
If your position is: “people might not follow best practices even if they exist, so hey, why bother creating best practices in the first place”, then that’s crazy, right?
For example, Wuhan Institute of Virology is still, infuriatingly, researching potential pandemic viruses under inadequate BSL-2 precautions. Does that mean that inventing BSL-4 tech was a waste of time? No! We want one group of people to be inventing BSL-4 tech, and making that tech as inexpensive and user-friendly as possible, and another group of people in parallel to be advocating that people actually use BSL-4 tech when appropriate, and a third group of people in parallel advocating that this kind of research not be done in the first place given the present balance of costs and benefits. (…And a fourth group of people working to prevent bioterrorists who are actually trying to create pandemics, etc. etc.)
I’m glad you think that the post has a small audience and may not be needed. I suppose that’s a good sign.
—
In the post I said it’s good that nukes don’t blow up by accident, and similarly, it’s good that BSL-4 protocols and tech exist. I’m not saying that alignment solutions shouldn’t exist. I am speaking to a specific audience (e.g. the frontier companies and allies) to say that their focus on alignment isn’t commensurate with its usefulness. Also, don’t forget the dual nature of alignment progress: as I mentioned in the post, frontier alignment progress hastens timelines and makes misuse risk more acute.
I think that large portions of the AI safety community act this way. This includes most people working on scalable alignment, interp, and deception.
Are you sure? For example, I work on technical AI safety because it’s my comparative advantage, but agree at a high level with your view of the AI safety problem, and almost all of my donations are directed at making AI governance go well. My (not very confident) impression is that most of the people working on technical AI safety (at least in Berkeley/SF) are in a similar place.
Thx!