Seems like different AI alignment perspectives sometimes are about “which thing seems least impossible.”
Straw MIRI researchers: “building AGI out of modern machine learning is automatically too messy and doomed. Much less impossible to try to build a robust theory of agency first.”
Straw Paul Christiano: “trying to get a robust theory of agency that matters in time is doomed, timelines are too short. Much less impossible to try to build AGI that listens reasonably to me out of current-gen stuff.”
(Not sure if either of these is fair, or if other camps fit this.)
‘Straw MIRI researchers’ seems basically right to me. Though if I were trying to capture all MIRI research I’d probably replace “try to build a robust theory of agency” with “try to get deconfused about powerful general-purpose intelligence/optimization” or “try to ensure that the future developers of AGI aren’t flying blind; less like the black boxes of current ML, more like how NASA has to deal with some chaotic wind and weather patterns but the principles and parts of the rocket are fundamentally well-understood”.
‘Straw Paul Christiano’ doesn’t sound right to me, but I’m not sure how to fix it. Some things that felt off to me (though maybe I’m wrong about this too):
Disagreements about whether MIRI’s approach is doomed or too-hard seem smaller and less cruxy to me than disagreements about whether prosaic AGI alignment is doomed.
“Timelines are too short” doesn’t sound like a crux I’ve heard before.
A better example of a thing I think Paul thinks is pretty doomed is “trying to align AGI in hard-takeoff scenarios”. I could see takeoff speed/continuity being a crux: either disagreement about the likelihood of hard takeoff, or disagreement about the feasibility of alignment given hard takeoff.
(I got nerd-sniped by trying to develop a short description of what I do. The following is my stream of thought)
+1 to replacing “build a robust theory” with “get deconfused,” and with replacing “agency” with “intelligence/optimization,” although I think it is even better with all three. I don’t think “powerful” or “general-purpose” do very much for the tagline.
When I say what I do to someone (e.g. at a reunion) I say something like “I work in AI safety, by doing math/philosophy to try to become less confused about agency/intelligence/optimization.” (I don’t think I have actually said this sentence, but I have said things close to it.)
I specifically say it with the slashes and not “and,” because I feel like it better conveys that there is only one thing that is hard to translate, but could be translated as “agency,” “intelligence,” or “optimization.”
I think it is probably better to also replace the word “about” with the word “around” for the same reason.
I wish I had a better word for “do.” “Study” is wrong. “Invent” and “discover” both seem wrong, because it is more like “invent/discover”, but that feels like it is overusing the slashes. Maybe “develop”? I think I like “invent” best. (Note that not knowing whether to say “invent” or “discover” is an example of being confused around agency/intelligence/optimization).
I also think I’ll replace “try to become” with “make myself.”
So, that leads me to “I invent math/philosophy to make myself less confused around agency/intelligence/optimization.”
I have no idea what to do with the first part. The first part feels political. In practice, I often say something like “I work in AI safety (so like trying to prevent the robot apocalypse) by...”, and I often try to make it boring and just say “AI safety,” depending on whether the audience is such that I want them to get the takeaway “Scott has a weird and mathy job that may or may not be about saving the world” vs I want them to bite on the agency part and talk to me about it.
I also think I jump between saying alignment, safety, and X-risk, and I am not sure why. I should probably pick one. For some reason I feel much less invested in getting the first half right. Maybe that is just because it is fun to say the robot apocalypse thing, and if I think too hard about it I will realize that is a bad idea.
The thing “timelines are too short” was trying to get at was “it has to be competitive with mainstream AI in order to work” (pretty sure Paul has explicitly said this), with what I thought was basically a followup assumption of “and timelines are too short to have time to get a competitive thing based off the kind of deconfusion work that MIRI does.”
I’d have thought the Paul-argument is less timeline-dependent than that—more like ‘even if timelines are long, there’s no reason to expect any totally new unexplored research direction to pay off so spectacularly that it can compete with the state of the art n years from now; and prosaic alignment seems like it may work, so we should focus more on that until we’re confident it’s a dead end’.
The base rate of new ideas paying off in a big way, even if they’re very promising-seeming at the outset, is super low. It may be useful for some people to pursue ideas like this, but (on my possibly-flawed Paul-model) the bulk of the field’s attention should be on AI techniques that already have a proven track record of competitiveness, until we know this is unworkable.
Whereas if you’re already confident that scaled-up deep learning in the vein of current ML is unalignable, then base rates are a bit of a moot point; we have to find new approaches one way or another, even if it’s hard-in-expectation. So “are scaled-up deep nets a complete dead end in terms of alignability?” seems like an especially key crux to me.
Caveat: I didn’t run the above comments by MIRI researchers, and MIRI researchers aren’t a monolith in any case. E.g., I could imagine people’s probabilities in “scaled-up deep nets are a complete dead end in terms of alignability” looking like “Eliezer ≈ Benya ≈ Nate >> Scott >> Abram > Evan >> Paul”, or something?
Okay, that is compatible with the rest of my Paul model. Does still seem to fit into the ‘what’s least impossible’ frame.