This is an apology for the tone and the framing of the above comment (and my following answers), which were both needlessly aggressive, status-focused, and uncharitable. Underneath them there are still issues that matter a lot to me, but others have discussed them better (I'll provide a list of linked comments at the end of this one).
Thanks to Richard Ngo for convincing me that I actually needed to write such an apology, which was probably the needed push for me to stop weaseling around it.
So what did I do wrong? The list is pretty damning:
I took something about the original post that I didn't understand — EY's "And then there is, so far as I can tell, a vast desert full of work that seems to me to be mostly fake or pointless or predictable." — and because it didn't make sense to me, and because it fit my stereotypes about MIRI and EY's dismissiveness of a lot of work in alignment, I jumped to interpreting it as an attack on alignment researchers: a claim that they were consciously faking it when they knew they should do better. Whereas I now feel that what EY meant is far closer to "alignment research at the moment is trying to try to align AI as best as we can, instead of just trying to do it." I'm still not sure whether I agree with that characterization, but it sounds far more like something that can be discussed.
There's also a weird aspect of status-criticism to my comment that I think I completely failed to explain. Looking at my motives now (let's be wary of hindsight...), I feel like my issue with status was more that a bunch of people other than EY and MIRI take what they say as very strong evidence without looking at the arguments and details, and so I expected this post and recent MIRI publications to create a background of "we're doomed" for a lot of casual observers, carried by the status of EY and MIRI. But I don't want to say that EY and MIRI are given too much status in general in the community, even if I actually wrote something along those lines. I guess it's just easier to focus your criticism on the beacon of status than on the invisible crowd misusing status. Sorry about that.
I somehow turned that into an attack on MIRI's research (at least a chunk of it), which didn't really have anything to do with it. That was probably just the manifestation of my frustration when people come to the field and feel like they shouldn't do the experimental research they're better suited for, or feel like they need to learn a lot of advanced maths. Even if those are not official MIRI positions, I definitely feel MIRI has had a big influence on them. And yet, maybe newcomers should question themselves that way. It always sounded like a loss of potential to me, because the outcome is often to not do alignment at all; but maybe even if you're into experiments, the best way you could align AIs now doesn't go through that path (and you could still find that exciting enough to do new research). Whatever the correct answer is, my weird ad-hominem attack has nothing to do with it, so I apologize for attacking all of MIRI's research and their choice of research agendas with it (even if I think talking more about what is and was the right choice still matters).
Part of my failure here was also not checking for the fact that aggressive writing just feels snappier without much effort. I still think my paragraph starting with "When I'm not frustrated by this situation, I'm just sad." works pretty well as an independent piece of writing, but it's obviously needlessly aggressive and spicy, and doesn't leave any room for the doubt that I actually felt or the doubts I should have felt. My answers after that comment are better, but still ride too much on that tone.
One of the saddest failures (pointed out to me by Richard) is that by my tone and my presentation, I made it harder and more aversive for MIRI and EY to share their models, because they have to fear that kind of reaction a bit more. And even if Rob reacted really nicely, I expect that required a bunch of additional mental energy that a better comment wouldn't have asked for. So I apologize for that, and really want more model-building and discussions from MIRI and EY publicly.
So in summary, my comment should have been something along the lines of "Hey, I don't understand what your generators are for saying that all alignment research is 'mostly fake or pointless or predictable', could you give me some pointers to that?". I wasn't in the headspace, and didn't have the right handles, to frame it that way without going off on weirdly aggressive tangents, and that's on me.
On the plus side, every other comment on the thread has been high-quality and thoughtful, so here's a list of the best ones IMO:
Ben Pace’s comment on what success stories for alignment would look like, giving examples.
Rob Bensinger's comment about the directions of prosaic alignment I said I was excited about, and whether they're "moving the dial".
Rohin Shah's comment, which frames the outside view of MIRI that I was pointing at better than I did, and without the aggressiveness.
John Wentworth's two comments about the generators of EY's pessimism being in the sequences all along.
Vaniver's comment presenting an analysis of why some concrete ML work in alignment doesn't seem to help at the AGI level.
Rob Bensinger's comment drawing a great list of distinctions to clarify the debate.
Thank you for this follow-up comment Adam, I appreciate it.