I want to say some things about the experience of working with Nate; I'm not sure how coherent this will be.
Reflections on working with Nate
I think jsteinhardt is pretty correct when he talks about psychological safety. Our conversations with Nate often didn't feel particularly "safe", possibly because Nate assumes his conversation partners will be as robust as he is.
Nate can pretty easily bulldoze/steamroll over you in conversation, in a way that requires a lot of fortitude to stand up to, and eventually one can just kind of give up. This could happen if you asked a question (maybe one that was confused in some way) and Nate responded with something of a rant that made you feel dumb for even asking. Or often we/I felt like Nate had assumed we were asking a different thing, and he would go on a spiel that kind of assumed you didn't know what was going on. This often felt like him rounding your statements off to the dumbest version. I think it often did turn out that the questions we asked were confused; this seems pretty expected given that we were doing deconfusion/conceptual work, where part of the aim is to work out which questions are reasonable to ask.
I think it should have been possible for Nate to give feedback in a way that didn't make you feel sad/bad, or like you shouldn't have asked the question in the first place. The feedback we often got was fairly cutting, and I feel like it should have been possible to give basically the exact same feedback without making the other person feel sad/bad/frustrated.
Nate would often go on fairly long rants (not sure there is a more charitable word), and it could be hard to get a word in to say “I didn’t really want a response like this, and I don’t think it’s particularly useful”.
Sometimes it seemed like Nate was in a bad mood (or maybe the specific things we wanted to talk about caused him a lot of distress and despair). I remember feeling pretty rough after days that went badly, and then extremely relieved when they went well.
Overall, I think the norms of Nate-culture are pretty at odds with standard norms. I think in general, if you are going to do something norm-violating, you should warn the people you are interacting with (which did eventually happen).
Positive things
Nate is very smart, and it was clearly taxing/frustrating for him to work with us much of the time. In this sense he put in a bunch of effort, where the obvious alternative was to just not talk to us. (This is different from putting effort into making communication go well or making things easy for us.)
Nate is clearly trying to solve the problem, and has been working on it for a long time. I can see how it would be frustrating when people aren’t understanding something that you worked out 10 years ago (or were possibly never confused about in the first place). I can imagine that it really sucks being in Nate’s position, feeling the world is burning, almost no one is trying to save it, those who are trying to save it are looking at the wrong thing, and even when you try to point people at the thing to look at they keep turning to look at something else (something easier, less scary, more approachable, but useless).
We actually did learn a bunch of things, and I think most/all of us feel like we can think better about alignment than before we started. There is some MIRI/Nate/Eliezer frame of the alignment problem that basically no one else has. I think it is very hard to work this out just from MIRI's public writing, particularly the content related to the Sharp Left Turn. But from talking to Nate (a lot), I think I do (partially) understand this frame; I think it is not nonsense, and it is important.
If this frame is the correct one, and working with Nate in a somewhat painful environment is the only way to learn it, then this does seem to be worth it. (Note that I am not convinced that the environment needed to be this hard, and it seems very likely to me that we should have been able to have meetings which were both less difficult and more productive).
It also seems important to note that when chatting with Nate about things other than alignment the conversations were good. They didn’t have this “bulldozer” quality, they were frequently fun and kind, and didn’t feel “unsafe”.
I have some empathy for the position that Nate didn't really sign up to be a mentor, and we suddenly had all these expectations of him. The project then kind of morphed into a thing where we expected Nate-mentorship, which he provided somewhat grudgingly, and he assumed that because we kept requesting meetings we were ok with dealing with the communication difficulties.
I would probably ex post still decide to join the project
I think I learned a lot, and the majority of this is because of Nate’s mentorship. I am genuinely grateful for this.
I do think that the project could have been more efficient if we had better communication, and it does feel (from my non-Nate perspective) that this should have been an option.
I think that being warned/informed earlier about likely communication difficulties would have helped us prepare for and mitigate them, rather than being somewhat blindsided. It would also have just been nice to have some explicit agreement on the new norms, and some acknowledgement that these are not standard communication norms.
I feel pretty conflicted about various things. I think that there should clearly be incentives such that people with power can’t get away with being disrespectful/mean to people under them. Most people should be able to do this. I think that sometimes people should be able to lay out their abnormal communication norms, and give people the option of engaging with them or not (I’m pretty confused about how this interacts with various power dynamics). I wouldn’t want strict rules on communication stopping people like Nate being able to share their skills/knowledge/etc with others; I would like those others to be fully informed about what they are getting into.
I am curious to what extent you or Nate think I understand that frame, and how easy it would be to help me fully get it. I am confused about how confused I am.
even when you try to point people at the thing to look at they keep turning to look at something else (something easier, less scary, more approachable, but useless).
I understand why someone might be frustrated in his position, and it's fine to feel however one feels. But I want to push back on any implicit advancement of sentiments like "his intense feelings justify the behavior."[1] The existing discussion has focused a lot on the social consequences of e.g. aggressive and mean behavior. I'll now take a more pragmatic view.
If you want to convince people of something, you should not severely punish them for talking to you. For example, I’d be more open to hearing Nate’s perspective if he had conducted himself in an even somewhat reasonable manner. As I wrote in my original comment:
[Nate’s behavior] killed my excitement for engaging with the MIRI-sphere.
Even from a pragmatic “world-saving” perspective, and given Nate’s apparent views, Nate’s behavior still doesn’t make sense to me. It doesn’t seem like he’s making some clever but uncooperative trade whereby he effectively persuades people of true stuff, albeit at (sometimes) large emotional cost to others. It seems more like “relative lack of persuasion, and other people sometimes get hurt (without agreeing to it), and people sometimes become significantly less interested in considering his views.”
I sometimes get frustrated that people still seem to be missing key shard theory insights, even after several conversations. I get frustrated that Nate in particular possibly still doesn't understand what I was trying to explain in our July 2022 chat. I still do not rant at people or leave them feeling intensely drained. Even if my emotions were more intense, I would still think it pragmatically unwise to have strong negative effects on my employees and colleagues.
[1] Probably you (Peter) did not mean to imply this, in which case my comment will just make the general point.
I'm struck by this:
There is some MIRI/Nate/Eliezer frame of the alignment problem that basically no one else has.
This might be true, and if true it might be very important. But, outside view, I think the track record of people/organizations claiming things along the lines of “we and we alone have the correct understanding of X, and your only way to understand X is to seek our wisdom” is pretty bad, and that of people/organizations about whom other people say “they and they alone have the correct understanding, etc.” isn’t much better.
I know that MIRI expresses concern about the dangers of spreading their understanding of things that might possibly be used to advance AI capabilities. But if an important thing they have is a uniquely insightful way of framing the alignment problem then that seems like the sort of thing that (1) is very unlikely to be dangerous to reveal, (2) could be very valuable to share with others, and (3) if so shared would (a) encourage others to take MIRI more seriously, if indeed it turns out that they have uniquely insightful ways of thinking about alignment and (b) provide opportunities to correct errors they’re missing, if in fact what they have is (something like) plausible rhetoric that doesn’t stand up to close critical examination.
I think the 2021 MIRI Conversations and 2022 MIRI Alignment Discussion sequences are an attempt at this. I feel like I have a relatively good handle on their frame after reading those sequences, and I think the ideas contained within are pretty insightful.
Like Zvi, I might be confused about how confused I am, but I don’t think it’s because they’re trying to keep their views secret. Maybe there’s some more specific capabilities-adjacent stuff they’re not sharing, but I suspect the thing the grandparent is getting at is more about a communication difficulty that in practice seems to be overcome mostly by working together directly, as opposed to the interpretation that they’re deliberately not communicating their basic views for secrecy reasons.
(I also found Eliezer's fiction helpful for internalizing his worldview in general, and IMO it also has some pretty unique insights.)