This is an apology for the tone and the framing of the above comment (and my following answers), which were needlessly aggressive, status-focused, and uncharitable. There are still underlying issues that matter a lot to me, but others have discussed them better (I’ll provide a list of linked comments at the end of this one).
Thanks to Richard Ngo for convincing me that I actually needed to write such an apology, which was probably the push I needed to stop weaseling around it.
So what did I do wrong? The list is pretty damning:
I took something about the original post that I didn’t understand — EY’s “And then there is, so far as I can tell, a vast desert full of work that seems to me to be mostly fake or pointless or predictable.” — and because it didn’t make sense to me, and because it fit my stereotype of MIRI’s and EY’s dismissiveness of a lot of work in alignment, I jumped to interpreting it as an attack on alignment researchers, as saying they were consciously faking it when they knew they should do better. Whereas I now feel that what EY meant is far closer to “alignment research at the moment is trying to try to align AI as best as we can, instead of just trying to do it.” I’m still not sure whether I agree with that characterization, but it sounds far more like something that can be discussed.
There’s also a weird aspect of status-criticism to my comment that I think I completely failed to explain. Looking at my motives now (let’s be wary of hindsight...), I feel like my issue was more that a bunch of people other than EY and MIRI take what they say as very strong evidence without looking at all the arguments and details, and so I expected this post and recent MIRI publications to create a background of “we’re doomed” for a lot of casual observers, carried by the status of EY and MIRI. But I don’t want to say that EY and MIRI are given too much status in general in the community, even if I actually wrote something along those lines. I guess it’s just easier to focus your criticism on the beacon of status than on the invisible crowd misusing that status. Sorry about that.
I somehow turned that into an attack on MIRI’s research (at least a chunk of it), which didn’t really have anything to do with it. That was probably just a manifestation of my frustration when people come to the field and feel like they shouldn’t do the experimental research they feel better suited for, or feel like they need to learn a lot of advanced maths first. Even if those are not official MIRI positions, I definitely feel MIRI has had a big influence on them. And yet, maybe newcomers should question themselves that way. It has always sounded like a loss of potential to me, because the outcome is often to not do alignment research at all; but maybe even if you’re into experiments, the best way you could align AIs now doesn’t go through that path (and you might still find that path exciting enough to do new research). Whatever the correct answer is, my weird ad hominem attack has nothing to do with it, so I apologize for attacking all of MIRI’s research and their choice of research agendas with it (even if I think talking more about what is and was the right choice still matters).
Part of my failure here was also not checking for the fact that aggressive writing just feels snappier without much effort. I still think my paragraph starting with “When I’m not frustrated by this situation, I’m just sad.” works pretty well as an independent piece of writing, but it’s obviously needlessly aggressive and spicy, and doesn’t leave any room for the doubts that I actually felt or the doubts I should have felt. My answers after that comment are better, but still ride too much on that tone.
One of the saddest failures (pointed out to me by Richard) is that by my tone and my presentation, I made it harder and more aversive for MIRI and EY to share their models, because they now have to fear that kind of reaction a bit more. And even if Rob reacted really nicely, I expect it required additional mental energy that a better comment wouldn’t have demanded. So I apologize for that, and I really want more public model-building and discussion from MIRI and EY.
So in summary, my comment should have been something along the lines of “Hey, I don’t understand what your generators are for saying that all alignment research is ‘mostly fake or pointless or predictable’; could you give me some pointers to that?” I wasn’t in the right headspace, and didn’t have the right handles, to frame it that way without going off on weirdly aggressive tangents, and that’s on me.
On the plus side, every other comment on the thread has been high-quality and thoughtful, so here’s a list of the best ones IMO:
Ben Pace’s comment on what success stories for alignment would look like, giving examples.
Rob Bensinger’s comment about the directions of prosaic alignment I wrote that I was excited about, and whether they’re “moving the dial”.
Rohin Shah’s comment, which frames the outside view of MIRI I was pointing at better than I did, and without the aggressiveness.
John Wentworth’s two comments about the generators of EY’s pessimism being in the Sequences all along.
Vaniver’s comment presenting an analysis of why some concrete ML work in alignment doesn’t seem to help at the AGI level.
Rob Bensinger’s comment drawing a great list of distinctions to clarify the debate.
Although I don’t usually write LW comments, I’m writing a post right now and this is helping me clarify my thoughts on a range of historical incidents.
In hindsight, I’m worried that you wrote this apology. I think it’s an unhealthy obeisance.
I suspect you noticed how Eliezer often works to degrade the status of people who disagree with him and otherwise treats them poorly. As I will support in an upcoming essay, his writing is often optimized to exploit intellectual insecurity (e.g. by frequently praising his own expertise, or appealing to a fictional utopia of fictional geniuses who agree that you’re an idiot or wrong[1]) and to demean others’ contributions (e.g. by claiming to have invented them already, or calling them fake, or emphasizing how far behind everyone else is). It’s not that it’s impossible for these claims to have factual merit, but rather that the presentation and usage of these claims seem optimized to push others down. This has the effect of increasing his own status.
Anger and frustration are rational reactions in that situation (though it’s important to express those emotions in healthy ways—I think your original comment wasn’t perfect there). And yet you ended up the one humbled for focusing on status too much!

[1] See https://www.lesswrong.com/posts/tcCxPLBrEXdxN5HCQ/shah-and-yudkowsky-on-alignment-failures and search for “even if he looks odd to you because you’re not seeing the population of other dath ilani.”
by frequently praising his own expertise, or appealing to a fictional utopia of fictional geniuses who agree that you’re an idiot or wrong[1])
This part in particular is easily one of the most problematic things I see Yudkowsky do. A fictional world can be almost arbitrarily different from our world, so lessons from a fictional world often fail to generalize (and that’s conditioning on it being logically coherent). There’s very little reason to do this unless you are very careful, and at that point you could just focus on the lessons of our own world’s history.
Even when I mention things like halting oracles, which are almost certainly not possible in the world we live in, I don’t make the mistake of thinking that halting oracles can give us insights into our own world: our world and a world where we can compute the halting problem are so different that a lot of the lessons don’t transfer (I’m referring to a recent discussion on Discord here).
There are good reasons why we mostly shouldn’t use fiction to inform real-world beliefs, courtesy of Eliezer Yudkowsky himself:

https://www.lesswrong.com/posts/rHBdcHGLJ7KvLJQPk/the-logical-fallacy-of-generalization-from-fictional
Thank you for this follow-up comment, Adam; I appreciate it.