Similarly, the fact that they kept at it over and over through all the big improvements in DL, instead of trying to adapt to prosaic alignment, sounds like evidence that they might be over-attached to a specific framing which they had trouble discarding.
I’m… confused by this framing? Specifically, this bit (as well as other bits like these)
I have to explain again and again to stressed-out newcomers that you definitely don’t need to master model theory or decision theory to do alignment, and try to steer them towards problems and questions that look like they’re actually moving the ball instead of following the lead of the “figure of authority”.
Some of the brightest and first thinkers on alignment have decided to follow their own nerd-sniping and call everyone else fakers, and when they realized they were not actually making progress, they didn’t switch to something else so much as declare that everyone was still full of it.
Also, I don’t know how much is related to mental health and pessimism and depression (which I completely understand can color one’s view of the world), but I would love to see the core MIRI team and EY actually try solving alignment with neural nets and prosaic AI. Starting with all their fears and caveats, sure, but then be like “fuck it, let’s just find a new way of grappling with it”.
seem to be coming at the problem with [something like] a baked-in assumption that prosaic alignment is something that Actually Has A Chance Of Working?
And, like, to be clear, obviously if you’re working on prosaic alignment that’s going to be something you believe[1]. But it seems clear to me that EY/MIRI does not share this viewpoint, and all the disagreements you have regarding their treatment of other avenues of research seem to me to be logically downstream of this disagreement?
I mean, it’s possible I’m misinterpreting you here. But you’re saying things that (from my perspective) only make sense with the background assumption that “there’s more than one game in town”—things like “I wish EY/MIRI would spend more time engaging with other frames” and “I don’t like how they treat lack of progress in their frame as evidence that all other frames are similarly doomed”—and I feel like all of those arguments simply fail in the world where prosaic alignment is Actually Just Doomed, all the other frames Actually Just Go Nowhere, and conceptual alignment work of the MIRI variety is (more or less) The Only Game In Town.
To be clear: I’m pretty sure you don’t believe we live in that world. But I don’t think you can just export arguments from the world you think we live in to the world EY/MIRI thinks we live in; there needs to be a bridging step first, where you argue about which world we actually live in. I don’t think it makes sense to try and highlight the drawbacks of someone’s approach when they don’t share the same background premises as you, and the background premises they do hold imply a substantially different set of priorities and concerns.
Another thing it occurs to me your frustration could be about is the fact that you can’t actually argue this with EY/MIRI directly, because they don’t frequently make themselves available to discuss things. And if something like that’s the case, then I guess what I want to say is… I sympathize with you abstractly, but I think your efforts are misdirected? It’s okay for you and other alignment researchers to have different background premises from MIRI or even each other, and for you and those other researchers to be working on largely separate agendas as a result? I want to say that’s kind of what foundational research work looks like, in a field where (to a first approximation) nobody has any idea what the fuck they’re doing?
And yes, in the end [assuming somebody succeeds] that will likely mean that a bunch of people’s research directions were ultimately irrelevant. Most people, even. That’s… kind of unavoidable? And also not really the point, because you can’t know which line of research will be successful in advance, so all you have to go on is your best guess, which… may or may not be the same as somebody else’s best guess?
I dunno. I’m trying not to come across as too aggressive here, which is why I’m hedging so many of my claims. To some extent I feel uncomfortable trying to “police” people’s thoughts here, since I’m not actually an alignment researcher… but also it felt to me like your comment was trying to police people’s thoughts, and I don’t actually approve of that either, so...
Yeah. Take this how you will.
[1] I personally am (relatively) agnostic on this question, but as a non-expert in the field my opinion should matter relatively little; I mention this merely as a disclaimer that I am not necessarily on board with EY/MIRI about the doomed-ness of prosaic alignment.
(Later-added disclaimer: it’s a good idea to add “I feel like...” before the judgments in this comment, so that you keep in mind that I’m talking about my impressions and frustrations, rarely stating obvious facts (despite the language making it look so).)
Okay, so you’re completely right that a lot of my points are logically downstream of the debate on whether Prosaic Alignment is Impossible or not. But I feel like you don’t get how one-sided this debate is, and how misrepresented it is here (and generally on the AF).
Like nobody except EY and a bunch of core MIRI people actually believes that prosaic alignment is impossible. I mean that every other researcher that I know thinks Prosaic Alignment is possible, even if potentially very hard. That includes MIRI people like Evan Hubinger too. And note that some of these other alignment researchers actually work with Neural Nets and keep up to speed on the implementation details and subtleties, which in my book means their voice should count more.
But that’s just a majority argument. The real problem is that nobody has ever given a good argument for why this is impossible. I mean, the analogous situation is that a car is driving right at you, accelerating, and you’ve somehow decided that it’s impossible to ever stop it before it kills you. You need a very strong case before giving up like that. And that case has not been given by EY and MIRI, AFAIK.
The last part of this is that because EY and MIRI founded the field, their view is given far more credibility than it would have on the basis of the arguments alone, and far more than it has in actual discussions between researchers.
The best analogy I can find (a bit strawmanish, but less than you would expect) is a world where the people who founded the study of cancer somehow had the idea that no method based on biological experimentation and thinking about cells could ever cure cancer, and that the only way of solving it was to understand all the dynamics in a very advanced category-theoretic model. Then, having found the latter really hard, they just declare that curing cancer is impossible.
I think one core issue here is that there are actually two debates going on. One is “how hard is the alignment problem?”; another is “how powerful are prosaic alignment techniques?” Broadly speaking, I’d characterise most of the disagreement as being on the first question. But you’re treating it like it’s mostly on the second question—like EY and everyone else are studying the same thing (cancer, in your metaphor) and just disagree about how to treat it.
My attempt to portray EY’s perspective is more like: he’s concerned with the problem of ageing, and a whole bunch of people have come along, said they agree with him, and started proposing ways to cure cancer using prosaic radiotherapy techniques. Now he’s trying to say: no, your work is not addressing the core problem of ageing, which is going to kill us unless we make a big theoretical breakthrough.
Regardless of that, calling the debate “one-sided” seems way too strong, especially given how many selection effects are involved. I mean, you could also call the debate about whether alignment is even a problem “one-sided”: 95% of all ML researchers don’t think it’s a problem, or think it’s something we’ll solve easily. But for fairly similar meta-level reasons as why it’s good for them to listen to us in an open-minded way, it’s also good for prosaic alignment researchers to listen to EY in an open-minded way. (As a side note, I’d be curious what credence you place on EY’s worldview being more true than the prosaic alignment worldview.)
Now, your complaint might be that MIRI has not made their case enough over the last few years. If that’s the main issue, then stay tuned; as Rob said, this is just the preface to a bunch of relevant material.
95% of all ML researchers don’t think it’s a problem, or think it’s something we’ll solve easily
The 2016 survey of people in AI asked about the alignment problem as described by Stuart Russell: 39% said it was an important problem, and 33% that it’s a harder problem than most other problems in the field.
Thanks for the detailed comment!
I think one core issue here is that there are actually two debates going on. One is “how hard is the alignment problem?”; another is “how powerful are prosaic alignment techniques?” Broadly speaking, I’d characterise most of the disagreement as being on the first question. But you’re treating it like it’s mostly on the second question—like EY and everyone else are studying the same thing (cancer, in your metaphor) and just disagree about how to treat it.
That’s an interesting separation of the problem, because I really feel there is more disagreement on the second question than on the first.
My attempt to portray EY’s perspective is more like: he’s concerned with the problem of ageing, and a whole bunch of people have come along, said they agree with him, and started proposing ways to cure cancer using prosaic radiotherapy techniques. Now he’s trying to say: no, your work is not addressing the core problem of ageing, which is going to kill us unless we make a big theoretical breakthrough.
Funnily enough, aren’t the people currently working on ageing using quite prosaic techniques? I completely agree that one needs to go for the big problems, especially ones that only appear in more powerful regimes (which is why I am adamant that there should be places for researchers to think about distinctly AGI problems without having to rephrase everything in a way that is palatable to ML academia). But people like Paul and Evan and more are actually going for the core problems IMO, just anchoring a lot of their thinking in current ML technologies. So I have trouble understanding how prosaic alignment isn’t trying to solve the problem at all. Maybe it’s just a disagreement on how large the “prosaic alignment” category is?
Regardless of that, calling the debate “one-sided” seems way too strong, especially given how many selection effects are involved. I mean, you could also call the debate about whether alignment is even a problem “one-sided”: 95% of all ML researchers don’t think it’s a problem, or think it’s something we’ll solve easily. But for fairly similar meta-level reasons as why it’s good for them to listen to us in an open-minded way, it’s also good for prosaic alignment researchers to listen to EY in an open-minded way.
You definitely have a point, and I want to listen to EY in an open-minded way. It’s just harder when he writes that everyone working on alignment is faking it, without giving much detail. Also, I feel that your comparison breaks down a bit: compared to the debate with ML researchers (where most people arguing against alignment haven’t even thought about the basics and make obvious mistakes), the other parties in this debate have thought long and hard about alignment. Maybe not as much as EY, but clearly much more than the ML researchers in the whole “is alignment even a problem” debate.
(As a side note, I’d be curious what credence you place on EY’s worldview being more true than the prosaic alignment worldview.)
At the moment I feel like I don’t have a good enough model of EY’s worldview, plus I’m annoyed by his statements, so any credence I give now would be biased against his worldview.
Now, your complaint might be that MIRI has not made their case enough over the last few years. If that’s the main issue, then stay tuned; as Rob said, this is just the preface to a bunch of relevant material.
Yeah, excited about that!
I really feel there is more disagreement on the second question than on the first
What is this feeling based on? One way we could measure this is by asking people about how much AI xrisk there is conditional on there being no more research explicitly aimed at aligning AGIs. I expect that different people would give very different predictions.
People like Paul and Evan and more are actually going for the core problems IMO, just anchoring a lot of their thinking in current ML technologies.
Everyone agrees that Paul is trying to solve foundational problems. And it seems strange to criticise Eliezer’s position by citing the work of MIRI employees.
It’s just harder when he writes that everyone working on alignment is faking it, without giving much detail.
As Rob pointed out above, this straightforwardly mischaracterises what Eliezer said.
I worry that “Prosaic Alignment Is Doomed” seems a bit… off as the most appropriate crux. At least for me. It seems hard for someone to justifiably know that this is true with enough confidence to not even try anymore. To have essayed or otherwise precluded all promising paths of inquiry, to not even engage with the rest of the field, to not even try to argue other researchers out of their mistaken beliefs, because it’s all Hopeless.
Consider the following analogy: someone who wants to gain muscle, but has thought a lot about nutrition and their genetic makeup and concluded that Direct Exercise Gains Are Doomed, and that they should expend their energy elsewhere.
OK, maybe. But how about try going to the gym for a month anyways and see what happens?
The point isn’t “EY hasn’t spent a month of work thinking about prosaic alignment.” The point is that AFAICT, by MIRI/EY’s own values, valuable-seeming plans are being left to rot on the cutting room floor. Like, “core MIRI staff meet for an hour each month and attack corrigibility/deceptive cognition/etc with all they’ve got. They pay someone to transcribe the session and post the fruits / negative results / reasoning to AF, without individually committing to following up with comments.”
(I am excited by Rob Bensinger’s comment that this post is the start of more communication from MIRI)
Like nobody except EY and a bunch of core MIRI people actually believes that prosaic alignment is impossible. I mean that every other researcher that I know thinks Prosaic Alignment is possible, even if potentially very hard. That includes MIRI people like Evan Hubinger too. And note that some of these other alignment researchers actually work with Neural Nets and keep up to speed on the implementation details and subtleties, which in my book means their voice should count more.
I don’t get the impression that Eliezer’s saying that alignment of prosaic AI is impossible. I think he’s saying “it’s almost certainly not going to happen because humans are bad at things.” That seems compatible with “every other researcher that I know thinks Prosaic Alignment is possible, even if potentially very hard” (if you go with the “very hard” part).
Yes, +1 to this; I think it’s important to distinguish between impossible (which is a term I carefully avoided using in my earlier comment, precisely because of its theoretical implications) and doomed (which I think of as a conjunction of theoretical considerations—how hard is this problem?—and social/coordination ones—how likely is it that humans will have solved this problem before solving AGI?).
I currently view this as consistent with e.g. Eliezer’s claim that Chris Olah’s work, though potentially on a pathway to something important, is probably going to accomplish “far too little far too late”. I certainly didn’t read it as anything like an unconditional endorsement of Chris’ work, as e.g. this comment seems to imply.
Ditto—the first half makes it clear that any strategy which isn’t at most 2 years slower than an unaligned approach will be useless, and that prosaic AI safety falls into that bucket.
Thanks for elaborating. I don’t think I have the necessary familiarity with the alignment research community to assess your characterization of the situation, but I appreciate your willingness to raise potentially unpopular hypotheses to attention. +1
Thanks for taking the time to ask a question about the discussion even if you lack expertise on the topic. ;)
+1 for this whole conversation, including Adam pushing back re prosaic alignment / trying to articulate disagreements! I agree that this is an important thing to talk about more.
I like the ‘give more concrete feedback on specific research directions’ idea, especially if it helps clarify generators for Eliezer’s pessimism. If Eliezer is pessimistic about a bunch of different research approaches simultaneously, and you’re simultaneously optimistic about all those approaches, then there must be some more basic disagreement(s) behind that.
From my perspective, the OP discussion is the opening salvo in ‘MIRI does a lot more model-sharing and discussion’. It’s more like a preface than like a conclusion, and the next topic we plan to focus on is why Eliezer-cluster people think alignment is hard, how we’re thinking about AGI, etc. In the meantime, I’m strongly in favor of arguing about this a bunch in the comments, sharing thoughts and reflections on your own models, etc. -- going straight for the meaty central disagreements now, not waiting to hash this out later.
Someone privately contacted me to express confusion, because they thought my ‘+1’ means that I think adamShimi’s initial comment was unusually great. That’s not the case. The reasons I commented positively are:
I think this overall exchange went well—it raised good points that might have otherwise been neglected, and everyone quickly reached agreement about the real crux.
I want to try to cancel out any impression that criticizing / pushing back on Eliezer-stuff is unwelcome, since Adam expressed worries about a “taboo on criticizing MIRI and EY too hard”.
On a more abstract level, I like seeing people ‘blurt out what they’re actually thinking’ (if done with enough restraint and willingness-to-update to mostly avoid demon threads), even if I disagree with the content of their thought. I think disagreements are often tied up in emotions, or pattern-recognition, or intuitive senses of ‘what a person/group/forum is like’. This can make it harder to epistemically converge about tough topics, because there’s a temptation to pretend your cruxes are more simple and legible than they really are, and end up talking about non-cruxy things.
Separately, I endorse Ben Pace’s question (“Can you make a positive case here for how the work being done on prosaic alignment leads to success?”) as the thing to focus on.
Thanks for the kind answer, even if we’re probably disagreeing about most points in this thread. I think messages like yours really help in making everyone aware that such topics can actually be discussed publicly without a big backlash.
I like the ‘give more concrete feedback on specific research directions’ idea, especially if it helps clarify generators for Eliezer’s pessimism. If Eliezer is pessimistic about a bunch of different research approaches simultaneously, and you’re simultaneously optimistic about all those approaches, then there must be some more basic disagreement(s) behind that.
That sounds amazing! I definitely want to extract some of the epistemic strategies that EY uses to generate criticisms and break proposals. :)
From my perspective, the OP discussion is the opening salvo in ‘MIRI does a lot more model-sharing and discussion’. It’s more like a preface than like a conclusion, and the next topic we plan to focus on is why Eliezer-cluster people think alignment is hard, how we’re thinking about AGI, etc. In the meantime, I’m strongly in favor of arguing about this a bunch in the comments, sharing thoughts and reflections on your own models, etc. -- going straight for the meaty central disagreements now, not waiting to hash this out later.
Excited about that!
I don’t think the “Only Game in Town” argument works when EY in the OP says
As well as approving Redwood Research.
Some things that seem important to distinguish here:
‘Prosaic alignment is doomed’. I parse this as: ‘Aligning AGI, without coming up with any fundamentally new ideas about AGI/intelligence or discovering any big “unknown unknowns” about AGI/intelligence, is doomed.’
I (and my Eliezer-model) endorse this, in large part because ML (as practiced today) produces such opaque and uninterpretable models. My sense is that Eliezer’s hopes largely route through understanding AGI systems’ internals better, rather than coming up with cleverer ways to apply external pressures to a black box.
‘All alignment work that involves running experiments on deep nets is doomed’.
My Eliezer-model doesn’t endorse this at all.
Also important to distinguish, IMO (making up the names here):
A strong ‘prosaic AGI’ thesis, like ‘AGI will just be GPT-n or some other scaled-up version of current systems’. Eliezer is extremely skeptical of this.
A weak ‘prosaic AGI’ thesis, like ‘AGI will involve coming up with new techniques, but the path between here and AGI won’t involve any fundamental paradigm shifts and won’t involve us learning any new deep things about intelligence’. I’m not sure what Eliezer’s unconditional view on this is, but I’d guess that he thinks this falls a lot in probability if we condition on something like ‘good outcomes are possible’—it’s very bad news.
An ‘unprosaic but not radically different AGI’ thesis, like ‘AGI might involve new paradigm shifts and/or new deep insights into intelligence, but it will still be similar enough to the current deep learning paradigm that we can potentially learn important stuff about alignable AGI by working with deep nets today’. I don’t think Eliezer has a strong view on this, though I observe that he thinks some of the most useful stuff humanity can do today is ‘run various alignment experiments on deep nets’.
An ‘AGI won’t be GOFAI’ thesis. Eliezer strongly endorses this.
There’s also an ‘inevitability thesis’ that I think is a crux here: my Eliezer-model thinks there are a wide variety of ways to build AGI that are very different, such that it matters a lot which option we steer toward (and various kinds of ‘prosaicness’ might be one parameter we can intervene on, rather than being a constant). My Paul-model has the opposite view, and endorses some version of inevitability.
Note: GOFAI = Good Old-Fashioned AI.
Your comment and Vaniver’s (paraphrasing) “not surprised by the results of this work, so why do it?” were especially helpful. EY (or others) assessing concrete research directions with detailed explanations would be even more helpful.
I agree with Rohin’s general question of “Can you tell a story where your research helps solve a specific alignment problem?”, and if you have other heuristics for assessing research, those would be good to know.
+1, plus endorsing Chris Olah.