(Later added disclaimer: it’s a good idea to add “I feel like...” before the judgment in this comment, so that you keep in mind that I’m talking about my impressions and frustrations, rarely stating obvious facts (despite the language making it look so))
Okay, so you’re completely right that a lot of my points are logically downstream of the debate on whether Prosaic Alignment is Impossible or not. But I feel like you don’t get how one-sided this debate is, and how misrepresented it is here (and generally on the AF).
Like nobody except EY and a bunch of core MIRI people actually believes that prosaic alignment is impossible. I mean that every other researcher that I know thinks Prosaic Alignment is possible, even if potentially very hard. That includes MIRI people like Evan Hubinger too. And note that some of these other alignment researchers actually work with Neural Nets and keep up to speed on the implementation details and subtleties, which in my book means their voice should count more.
But that’s just a majority argument. The real problem is that nobody has ever given a good argument for why this is impossible. I mean, the analogous situation is that a car is driving right at you, accelerating, and you’ve decided somehow that it’s impossible to ever stop it before it kills you. You need a very strong case before giving up like that. And that case has not been made by EY and MIRI, AFAIK.
The last part of this is that because EY and MIRI founded the field, their view is given far more credibility than it would have on the basis of the arguments alone, and far more than it has in actual discussions between researchers.
The best analogy I can find (a bit strawmanish, but less than you would expect) is a world where somehow the people who founded the study of cancer had the idea that no method based on biological experimentation and thinking about cells could ever cure cancer, and that the only way of solving it was to understand all the dynamics in a very advanced category-theoretic model. Then, having found the latter really hard, they just say that curing cancer is impossible.
I think one core issue here is that there are actually two debates going on. One is “how hard is the alignment problem?”; another is “how powerful are prosaic alignment techniques?” Broadly speaking, I’d characterise most of the disagreement as being on the first question. But you’re treating it like it’s mostly on the second question—like EY and everyone else are studying the same thing (cancer, in your metaphor) and just disagree about how to treat it.
My attempt to portray EY’s perspective is more like: he’s concerned with the problem of ageing, and a whole bunch of people have come along, said they agree with him, and started proposing ways to cure cancer using prosaic radiotherapy techniques. Now he’s trying to say: no, your work is not addressing the core problem of ageing, which is going to kill us unless we make a big theoretical breakthrough.
Regardless of that, calling the debate “one-sided” seems way too strong, especially given how many selection effects are involved. I mean, you could also call the debate about whether alignment is even a problem “one-sided”: 95% of all ML researchers don’t think it’s a problem, or think it’s something we’ll solve easily. But for fairly similar meta-level reasons as why it’s good for them to listen to us in an open-minded way, it’s also good for prosaic alignment researchers to listen to EY in an open-minded way. (As a side note, I’d be curious what credence you place on EY’s worldview being more true than the prosaic alignment worldview.)
Now, your complaint might be that MIRI has not made their case enough over the last few years. If that’s the main issue, then stay tuned; as Rob said, this is just the preface to a bunch of relevant material.
95% of all ML researchers don’t think it’s a problem, or think it’s something we’ll solve easily
The 2016 survey of people in AI asked about the alignment problem as described by Stuart Russell: 39% said it was an important problem, and 33% said it’s a harder problem than most other problems in the field.
Thanks for the detailed comment!

I think one core issue here is that there are actually two debates going on. One is “how hard is the alignment problem?”; another is “how powerful are prosaic alignment techniques?” Broadly speaking, I’d characterise most of the disagreement as being on the first question. But you’re treating it like it’s mostly on the second question—like EY and everyone else are studying the same thing (cancer, in your metaphor) and just disagree about how to treat it.
That’s an interesting separation of the problem, because I really feel there is more disagreement on the second question than on the first.
My attempt to portray EY’s perspective is more like: he’s concerned with the problem of ageing, and a whole bunch of people have come along, said they agree with him, and started proposing ways to cure cancer using prosaic radiotherapy techniques. Now he’s trying to say: no, your work is not addressing the core problem of ageing, which is going to kill us unless we make a big theoretical breakthrough.
Funnily enough, aren’t the people currently working on ageing using quite prosaic techniques? I completely agree that one needs to go for the big problems, especially ones that only appear in more powerful regimes (which is why I am adamant that there should be places for researchers to think about distinctly AGI problems and not have to rephrase everything in a way that is palatable to ML academia). But people like Paul and Evan and more are actually going for the core problems IMO, just anchoring a lot of their thinking in current ML technologies. So I have trouble understanding how prosaic alignment isn’t trying to solve the problem at all. Maybe it’s just a disagreement on how large the “prosaic alignment” category is?
Regardless of that, calling the debate “one-sided” seems way too strong, especially given how many selection effects are involved. I mean, you could also call the debate about whether alignment is even a problem “one-sided”: 95% of all ML researchers don’t think it’s a problem, or think it’s something we’ll solve easily. But for fairly similar meta-level reasons as why it’s good for them to listen to us in an open-minded way, it’s also good for prosaic alignment researchers to listen to EY in an open-minded way.
You definitely have a point, and I want to listen to EY in an open-minded way. It’s just harder when he writes things like everyone working on alignment is faking it and not giving much detail. Also, I feel that your comparison breaks a bit because, compared to the debate with ML researchers (where most people against alignment haven’t even thought about the basics and make obvious mistakes), the other parties in this debate have thought long and hard about alignment. Maybe not as much as EY, but clearly much more than the ML researchers in the whole “is alignment even a problem” debate.
(As a side note, I’d be curious what credence you place on EY’s worldview being more true than the prosaic alignment worldview.)
At the moment I feel like I don’t have a good enough model of EY’s worldview, plus I’m annoyed by his statements, so any credence I give now would be biased against his worldview.
Now, your complaint might be that MIRI has not made their case enough over the last few years. If that’s the main issue, then stay tuned; as Rob said, this is just the preface to a bunch of relevant material.

Yeah, excited about that!
I really feel there is more disagreement on the second question than on the first
What is this feeling based on? One way we could measure this is by asking people how much AI x-risk there is conditional on there being no more research explicitly aimed at aligning AGIs. I expect that different people would give very different predictions.
People like Paul and Evan and more are actually going for the core problems IMO, just anchoring a lot of their thinking in current ML technologies.
Everyone agrees that Paul is trying to solve foundational problems. And it seems strange to criticise Eliezer’s position by citing the work of MIRI employees.
It’s just harder when he writes things like everyone working on alignment is faking it and not giving much detail.
I worry that “Prosaic Alignment Is Doomed” seems a bit… off as the most appropriate crux. At least for me. It seems hard for someone to justifiably know that this is true with enough confidence to not even try anymore. To have essayed or otherwise precluded all promising paths of inquiry, to not even engage with the rest of the field, to not even try to argue other researchers out of their mistaken beliefs, because it’s all Hopeless.
Consider the following analogy: Someone who wants to gain muscle, but has thought a lot about nutrition and their genetic makeup and concluded that Direct Exercise Gains Are Doomed, and they should expend their energy elsewhere.
OK, maybe. But how about try going to the gym for a month anyways and see what happens?
The point isn’t “EY hasn’t spent a month of work thinking about prosaic alignment.” The point is that AFAICT, by MIRI/EY’s own values, valuable-seeming plans are being left to rot on the cutting room floor. Like, “core MIRI staff meet for an hour each month and attack corrigibility/deceptive cognition/etc with all they’ve got. They pay someone to transcribe the session and post the fruits / negative results / reasoning to AF, without individually committing to following up with comments.”
(I am excited by Rob Bensinger’s comment that this post is the start of more communication from MIRI)
Like nobody except EY and a bunch of core MIRI people actually believes that prosaic alignment is impossible. I mean that every other researcher that I know thinks Prosaic Alignment is possible, even if potentially very hard. That includes MIRI people like Evan Hubinger too. And note that some of these other alignment researchers actually work with Neural Nets and keep up to speed on the implementation details and subtleties, which in my book means their voice should count more.
I don’t get the impression that Eliezer’s saying that alignment of prosaic AI is impossible. I think he’s saying “it’s almost certainly not going to happen because humans are bad at things.” That seems compatible with “every other researcher that I know thinks Prosaic Alignment is possible, even if potentially very hard” (if you go with the “very hard” part).
Yes, +1 to this; I think it’s important to distinguish between impossible (which is a term I carefully avoided using in my earlier comment, precisely because of its theoretical implications) and doomed (which I think of as a conjunction of theoretical considerations—how hard is this problem?—and social/coordination ones—how likely is it that humans will have solved this problem before solving AGI?).
I currently view this as consistent with e.g. Eliezer’s claim that Chris Olah’s work, though potentially on a pathway to something important, is probably going to accomplish “far too little far too late”. I certainly didn’t read it as anything like an unconditional endorsement of Chris’ work, as e.g. this comment seems to imply.
Ditto—the first half makes it clear that any strategy which isn’t at most 2 years slower than an unaligned approach will be useless, and that prosaic AI safety falls into that bucket.
Thanks for elaborating. I don’t think I have the necessary familiarity with the alignment research community to assess your characterization of the situation, but I appreciate your willingness to raise potentially unpopular hypotheses to attention. +1

Thanks for taking the time to ask a question about the discussion even if you lack expertise on the topic. ;)
+1 for this whole conversation, including Adam pushing back re prosaic alignment / trying to articulate disagreements! I agree that this is an important thing to talk about more.
I like the ‘give more concrete feedback on specific research directions’ idea, especially if it helps clarify generators for Eliezer’s pessimism. If Eliezer is pessimistic about a bunch of different research approaches simultaneously, and you’re simultaneously optimistic about all those approaches, then there must be some more basic disagreement(s) behind that.
From my perspective, the OP discussion is the opening salvo in ‘MIRI does a lot more model-sharing and discussion’. It’s more like a preface than like a conclusion, and the next topic we plan to focus on is why Eliezer-cluster people think alignment is hard, how we’re thinking about AGI, etc. In the meantime, I’m strongly in favor of arguing about this a bunch in the comments, sharing thoughts and reflections on your own models, etc. -- going straight for the meaty central disagreements now, not waiting to hash this out later.
Someone privately contacted me to express confusion, because they thought my ‘+1’ means that I think adamShimi’s initial comment was unusually great. That’s not the case. The reasons I commented positively are:
I think this overall exchange went well—it raised good points that might have otherwise been neglected, and everyone quickly reached agreement about the real crux.
I want to try to cancel out any impression that criticizing / pushing back on Eliezer-stuff is unwelcome, since Adam expressed worries about a “taboo on criticizing MIRI and EY too hard”.
On a more abstract level, I like seeing people ‘blurt out what they’re actually thinking’ (if done with enough restraint and willingness-to-update to mostly avoid demon threads), even if I disagree with the content of their thought. I think disagreements are often tied up in emotions, or pattern-recognition, or intuitive senses of ‘what a person/group/forum is like’. This can make it harder to epistemically converge about tough topics, because there’s a temptation to pretend your cruxes are simpler and more legible than they really are, so you end up talking about non-cruxy things.
Separately, I endorse Ben Pace’s question (“Can you make a positive case here for how the work being done on prosaic alignment leads to success?”) as the thing to focus on.
Thanks for the kind answer, even if we’re probably disagreeing about most points in this thread. I think messages like yours really help in making everyone aware that such topics can actually be discussed publicly without a big backlash.
I like the ‘give more concrete feedback on specific research directions’ idea, especially if it helps clarify generators for Eliezer’s pessimism. If Eliezer is pessimistic about a bunch of different research approaches simultaneously, and you’re simultaneously optimistic about all those approaches, then there must be some more basic disagreement(s) behind that.
That sounds amazing! I definitely want to extract some of the epistemic strategies that EY uses to generate criticisms and break proposals. :)
From my perspective, the OP discussion is the opening salvo in ‘MIRI does a lot more model-sharing and discussion’. It’s more like a preface than like a conclusion, and the next topic we plan to focus on is why Eliezer-cluster people think alignment is hard, how we’re thinking about AGI, etc. In the meantime, I’m strongly in favor of arguing about this a bunch in the comments, sharing thoughts and reflections on your own models, etc. -- going straight for the meaty central disagreements now, not waiting to hash this out later.
Excited about that!

It’s just harder when he writes things like everyone working on alignment is faking it and not giving much detail.

As Rob pointed out above, this straightforwardly mischaracterises what Eliezer said.