Are you seeing this reflected in the pattern of votes (comments/posts reflecting “the MIRI viewpoint” get voted up more), pattern of posts (there’s less content about other viewpoints), or pattern of engagement (most replies you’re getting are from this viewpoint)?
All three. I do want to note that “MIRI viewpoint” is not exactly right, so I’m going to call it “viewpoint X” just to be absolutely clear that I have not precisely defined it. Some examples:
In the Value Learning sequence, Chapter 3 and the posts on misspecification from Chapter 1 are upvoted less than the rest of Chapter 1 and Chapter 2. In fact, Chapter 3 is the actual view I wanted to get across, but I knew that it didn’t really fit with viewpoint X. I created Chapters 1 and 2 with the aim of getting people with viewpoint X to see why one might have the mindset that generates Chapter 3.
Looking at the last ~20 posts on the Alignment Forum, if you exclude the newsletters and the retrospective, I would classify them all as coming from viewpoint X.
On comments, it’s hard to give a comparative example because I can’t really remember any comments coming from not-viewpoint X. A canonical example of a viewpoint X comment is this one, chosen primarily because it’s on the post of mine that is most explicitly not coming from viewpoint X.
In any case, do you think recruiting more alignment/safety researchers with other viewpoints to participate on LW/AF would be a good solution?
This would help with my personal disincentives; I don’t know if it’s a good idea overall. It could be hard to have a productive discussion: I already find it hard, and I think I understand viewpoint X better than most of the people who would say they disagree with it. (Also, while many ML researchers who care about safety don’t know too much about viewpoint X, there definitely exist some who explicitly choose not to engage with viewpoint X because it doesn’t seem productive or valuable.)
Would you like the current audience to consider the arguments for other viewpoints more seriously?
Yes, in the almost trivial sense that I think other viewpoints are more important/correct than viewpoint X.
I’m not actually sure this would better incentivize me to participate; I suspect that if people tried to understand my viewpoint they would at least initially get it wrong, much as people who try to steelman someone’s arguments often end up saying things that person does not believe.
Other solutions you think are worth trying?
More high-touch in-person conversations where people try to understand other viewpoints? Having people with viewpoint X study ML for a while? I don’t really think either of these is worth trying; they seem unlikely to work and are costly.
It sounds like you might prefer a separate place to engage more with people who already share your viewpoint. Does that seem right? I think I would prefer having something like that too, if it means being able to listen in on discussions among AI safety researchers with perspectives different from my own.
I would be interested in getting a clearer picture of what you mean by “viewpoint X”, how your viewpoint differs from it, and what especially bugs you about it, but I guess it’s hard to do, or you would have done it already.
It sounds like you might prefer a separate place to engage more with people who already share your viewpoint.
I mean, I’m not sure if an intervention is necessary—I do in fact engage with people who share my viewpoint, or at least understand it well; many of them are at CHAI. It just doesn’t happen on LW/AF.
I would be interested in getting a clearer picture of what you mean by “viewpoint X”
I can probably at least point at it more clearly by listing out some features I associate with it:
A strong focus on extremely superintelligent AI systems
A strong focus on utility functions
Emphasis on backwards-chaining rather than forward-chaining. Though that isn’t exactly right. Maybe what I really mean is that there’s an emphasis that any particular idea must connect, via a sequence of logical steps, to a full solution to AI safety.
An emphasis on exact precision rather than robustness to errors (something like treating the problem as a scientific problem rather than an engineering problem)
Security mindset
Note that I’m not saying I disagree with all of these points; I’m trying to point at a cluster of beliefs / modes of thinking that I tend to see in people who have viewpoint X.
I mean, I’m not sure if an intervention is necessary—I do in fact engage with people who share my viewpoint, or at least understand it well; many of them are at CHAI. It just doesn’t happen on LW/AF.
Yeah, I figured as much, which is why I said I’d prefer having an online place for such discussions, so that I would be able to listen in. :) Another advantage would be encouraging more discussion across organizations and from independent researchers, students, and others considering going into the field.
Maybe what I really mean is that there’s an emphasis that any particular idea must connect, via a sequence of logical steps, to a full solution to AI safety.
It’s worth noting that many MIRI researchers seem to have backed away from this (or clarified that they didn’t think this in the first place). This was pretty noticeable at the research retreat and is also reflected in their recent writings. I do want to note, though, how scary it is that almost nobody has a good idea of how their current work logically connects to a full solution to AI safety.
Note that I’m not saying I disagree with all of these points; I’m trying to point at a cluster of beliefs / modes of thinking that I tend to see in people who have viewpoint X.
I’m curious what your strongest disagreements are, and what bugs you the most, as far as disincentivizing you from participating on LW/AF.
It’s worth noting that many MIRI researchers seem to have backed away from this (or clarified that they didn’t think this in the first place).
Agreed that this is reflected in their writings. I think this usually causes them to move towards trying to understand intelligence, as opposed to proposing partial solutions. (A counterexample: Non-Consequentialist Cooperation?) When others propose partial solutions, I’m not sure whether or not this belief is reflected in their upvotes or engagement through comments. (As in, I actually am uncertain—I can’t see who upvotes posts, and for the most part MIRI researchers don’t seem to engage very much.)
I do want to note, though, how scary it is that almost nobody has a good idea of how their current work logically connects to a full solution to AI safety.
Agreed.
I’m curious what your strongest disagreements are, and what bugs you the most, as far as disincentivizing you from participating on LW/AF.
I don’t think any of those features strongly disincentivize me from participating on LW/AF; it’s more the lack of people close to my own viewpoint that disincentivizes me from participating.
Maybe the focus on exact precision instead of robustness to errors is a disincentive, as well as the focus on expected utility maximization with simple utility functions. A priori, I assign somewhat high probability that a critical comment on my work from anyone holding that perspective won’t be useful to me, but I’ll feel obligated to reply anyway.
Certainly those two features are the ones I most disagree with; the other three seem pretty reasonable in moderation.
I don’t think any of those features strongly disincentivize me from participating on LW/AF; it’s more the lack of people close to my own viewpoint that disincentivizes me from participating.
I see. Hopefully the LW/AF team is following this thread and thinking about what to do, but in the meantime I encourage you to participate anyway, as it seems good to get ideas from your viewpoint “out there” even if no one is currently engaging with them in a way that you find useful.
as well as the focus on expected utility maximization with simple utility functions
I don’t think anyone talks about simple utility functions? Maybe you mean explicit utility functions?
A priori, I assign somewhat high probability that a critical comment on my work from anyone holding that perspective won’t be useful to me, but I’ll feel obligated to reply anyway.
If this feature request of mine were implemented, you’d be able to respond to such comments with a couple of clicks. In the meantime it seems best to just not feel obligated to reply.
I encourage you to participate anyway, as it seems good to get ideas from your viewpoint “out there” even if no one is currently engaging with them in a way that you find useful.
Yeah, that’s the plan.
I don’t think anyone talks about simple utility functions? Maybe you mean explicit utility functions?
Yes, sorry. I said that because they feel very similar to me: any utility function that can be explicitly specified must be reasonably simple. But I agree “explicit” is more accurate.
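As a loose, hypothetical illustration of the simple-vs-explicit distinction above (not taken from this exchange; the toy state and both function names are invented): an explicitly specified utility function is one a person writes down directly, which in practice forces it to be fairly short and simple, whereas a learned reward model encodes preferences only implicitly in its parameters and need not be simple or inspectable at all.

```python
import numpy as np

# An "explicit" utility function: written down by hand over a fully
# observable toy state. Because a person has to specify it directly,
# it ends up short and simple, which is the connection drawn above.
def explicit_utility(state):
    """Toy utility: reward for reaching a goal, small penalty per step."""
    goal_reached, steps_taken = state
    return 10.0 * goal_reached - 0.1 * steps_taken

# An "implicit" utility function: a tiny learned reward model whose
# preferences live in its weights rather than in human-readable form.
# The random weights are placeholders standing in for actual training.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 16)), np.zeros(16)
w2, b2 = rng.normal(size=16), 0.0

def learned_utility(state_features):
    """One-hidden-layer reward model: not simple or inspectable in the
    way the explicit function above is."""
    hidden = np.tanh(state_features @ W1 + b1)
    return float(hidden @ w2 + b2)

print(explicit_utility((1, 7)))               # 9.3
print(learned_utility(np.array([1.0, 7.0])))  # an arbitrary learned scalar
```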
In the meantime it seems best to just not feel obligated to reply.
That seems right, but also hard to do in practice (for me).