Disincentives for me personally:
The LW/AF audience by and large operates under a set of assumptions about AI safety that I don’t really share. I can’t easily describe this set, but one bad way to describe it would be “the MIRI viewpoint” on AI safety. This particular disincentive is probably significantly stronger for other “ML-focused AI safety researchers”.
More effort needed to write comments than to talk to people IRL
By a lot. As a more extreme example, on the recent post on pessimism about impact measures, TurnTrout and I switched to private online messaging at one point, and I’d estimate it was about 5x faster to get to the level of shared understanding we reached than if we had continued with typical big comment responses on AF/LW.
Curious how big each of these active ingredients seemed, or if there were other active ingredients:
1) the privacy (not having any expectation that any onlookers would need to understand what you were saying)
2) the format (linear column of chats, with a small textbox that subtly shaped how much you said at a time)
3) not having other people talking (so you don’t have to stop and pay attention to them)
4) the realtime nature (wherein you expect to get responses quickly, which allows for faster back-and-forth and checking that you are both on the same page before moving to the next point)
The overall problem with the status quo is that private conversations are worse for onboarding new people into the AI space. So I think it’s quite likely that the best way to improve this is to facilitate private conversations, and then either make them public or distill them afterwards. But there are different ways to go about that depending on which elements are most important.
Primarily 4, somewhat 1, somewhat 2, not at all 3. I think 1 and 2 mattered mostly in the sense that with comments the expectation is that you respond in some depth and with justification, whereas with messaging I could just say things without justification that only TurnTrout had to understand, and I only needed to explain the ones we disagreed on.
I do think that conversation was uniquely bad for onboarding new people; I’m not sure I would understand what was said if I reread it two months from now. I did in fact post a distillation of it afterwards.
The LW/AF audience by and large operates under a set of assumptions about AI safety that I don’t really share. I can’t easily describe this set, but one bad way to describe it would be “the MIRI viewpoint” on AI safety.
Are you seeing this reflected in the pattern of votes (comments/posts reflecting “the MIRI viewpoint” get voted up more), pattern of posts (there’s less content about other viewpoints), or pattern of engagement (most replies you’re getting are from this viewpoint)? Please give some examples if you feel comfortable doing that.
In any case, do you think recruiting more alignment/safety researchers with other viewpoints to participate on LW/AF would be a good solution? Would you like the current audience to consider the arguments for other viewpoints more seriously? Other solutions you think are worth trying?
TurnTrout and I switched to private online messaging at one point
Yeah, I think this is probably being done less than optimally, and I’d like to see LW support or encourage this somehow. One problem with the way people are doing this currently is that the chat transcripts are typically not posted, which prevents others from following along and perhaps getting a similar level of understanding, or asking questions, or spotting errors that both sides are making, or learning discussion skills from such examples.
Are you seeing this reflected in the pattern of votes (comments/posts reflecting “the MIRI viewpoint” get voted up more), pattern of posts (there’s less content about other viewpoints), or pattern of engagement (most replies you’re getting are from this viewpoint)?
All three. I do want to note that “MIRI viewpoint” is not exactly right, so I’m going to call it “viewpoint X” just to be absolutely clear that I have not precisely defined it. Some examples:
In the Value Learning sequence, Chapter 3 and the posts on misspecification from Chapter 1 are upvoted less than the rest of Chapter 1 and Chapter 2. In fact, Chapter 3 is the actual view I wanted to get across, but I knew that it didn’t really fit with viewpoint X. I created Chapters 1 and 2 with the aim of getting people with viewpoint X to see why one might have the mindset that generates Chapter 3.
Looking at the last ~20 posts on the Alignment Forum, if you exclude the newsletters and the retrospective, I would classify them all as coming from viewpoint X.
On comments, it’s hard to give a comparative example because I can’t really remember any comments coming from not-viewpoint X. A canonical example of a viewpoint X comment is this one, chosen primarily because it’s on the post of mine that is most explicitly not coming from viewpoint X.
In any case, do you think recruiting more alignment/safety researchers with other viewpoints to participate on LW/AF would be a good solution?
This would help with my personal disincentives; I don’t know if it’s a good idea overall. It could be hard to have a productive discussion: I already find it hard, and among the people who would say they disagree with viewpoint X, I think I understand viewpoint X particularly well. (Also, while many ML researchers who care about safety don’t know too much about viewpoint X, there definitely exist some who explicitly choose not to engage with viewpoint X because it doesn’t seem productive or valuable.)
Would you like the current audience to consider the arguments for other viewpoints more seriously?
Yes, in the almost trivial sense that I think other viewpoints are more important/correct than viewpoint X.
I’m not actually sure this would better incentivize me to participate; I suspect that if people tried to understand my viewpoint they would at least initially get it wrong, in the same way that when people try to steelman someone’s arguments, they often end up saying things that person does not believe.
Other solutions you think are worth trying?
More high-touch in-person conversations where people try to understand other viewpoints? Having people with viewpoint X study ML for a while? I don’t really think either of these is worth trying; they seem unlikely to work and are costly.
It sounds like you might prefer a separate place to engage more with people who already share your viewpoint. Does that seem right? I think I would prefer having something like that too, if it means being able to listen in on discussions among AI safety researchers with perspectives different from my own.
I would be interested in getting a clearer picture of what you mean by “viewpoint X”, how your viewpoint differs from it, and what especially bugs you about it, but I guess it’s hard to do, or you would have done it already.
It sounds like you might prefer a separate place to engage more with people who already share your viewpoint.
I mean, I’m not sure if an intervention is necessary—I do in fact engage with people who share my viewpoint, or at least understand it well; many of them are at CHAI. It just doesn’t happen on LW/AF.
I would be interested in getting a clearer picture of what you mean by “viewpoint X”
I can probably at least point at it more clearly by listing out some features I associate with it:
A strong focus on extremely superintelligent AI systems
A strong focus on utility functions
Emphasis on backwards-chaining rather than forward-chaining. Though that isn’t exactly right. Maybe I more mean that there’s an emphasis that any particular idea must have a connection via a sequence of logical steps to a full solution to AI safety.
An emphasis on exact precision rather than robustness to errors (something like treating the problem as a scientific problem rather than an engineering problem)
Security mindset
Note that I’m not saying I disagree with all of these points; I’m trying to point at a cluster of beliefs / modes of thinking that I tend to see in people who have viewpoint X.
I mean, I’m not sure if an intervention is necessary—I do in fact engage with people who share my viewpoint, or at least understand it well; many of them are at CHAI. It just doesn’t happen on LW/AF.
Yeah, I figured as much, which is why I said I’d prefer having an online place for such discussions, so that I would be able to listen in. :) Another advantage would be encouraging more discussion across organizations and from independent researchers, students, and others considering going into the field.
Maybe I more mean that there’s an emphasis that any particular idea must have a connection via a sequence of logical steps to a full solution to AI safety.
It’s worth noting that many MIRI researchers seem to have backed away from this (or clarified that they didn’t think this in the first place). This was pretty noticeable at the research retreat and is also reflected in their recent writings. I do want to note, though, how scary it is that almost nobody has a good idea of how their current work logically connects to a full solution to AI safety.
Note that I’m not saying I disagree with all of these points; I’m trying to point at a cluster of beliefs / modes of thinking that I tend to see in people who have viewpoint X.
I’m curious what your strongest disagreements are, and what bugs you the most, as far as disincentivizing you from participating on LW/AF.
It’s worth noting that many MIRI researchers seem to have backed away from this (or clarified that they didn’t think this in the first place).
Agreed that this is reflected in their writings. I think this usually causes them to move towards trying to understand intelligence, as opposed to proposing partial solutions. (A counterexample: Non-Consequentialist Cooperation?) When others propose partial solutions, I’m not sure whether or not this belief is reflected in their upvotes or engagement through comments. (As in, I actually am uncertain—I can’t see who upvotes posts, and for the most part MIRI researchers don’t seem to engage very much.)
I do want to note, though, how scary it is that almost nobody has a good idea of how their current work logically connects to a full solution to AI safety.
Agreed.
I’m curious what your strongest disagreements are, and what bugs you the most, as far as disincentivizing you from participating on LW/AF.
I don’t think any of those features strongly disincentivize me from participating on LW/AF; it’s more the lack of people close to my own viewpoint that disincentivizes me from participating.
Maybe the focus on exact precision instead of robustness to errors is a disincentive, as well as the focus on expected utility maximization with simple utility functions. A priori, I assign a somewhat high probability that I won’t find a critical comment on my work useful if it comes from someone holding that perspective, but I’ll feel obligated to reply anyway.
Certainly those two features are the ones I most disagree with; the other three seem pretty reasonable in moderation.
I don’t think any of those features strongly disincentivize me from participating on LW/AF; it’s more the lack of people close to my own viewpoint that disincentivizes me from participating.
I see. Hopefully the LW/AF team is following this thread and thinking about what to do, but in the meantime I encourage you to participate anyway, as it seems good to get ideas from your viewpoint “out there” even if no one is currently engaging with them in a way that you find useful.
as well as the focus on expected utility maximization with simple utility functions
I don’t think anyone talks about simple utility functions? Maybe you mean explicit utility functions?
A priori, I assign a somewhat high probability that I won’t find a critical comment on my work useful if it comes from someone holding that perspective, but I’ll feel obligated to reply anyway.
If this feature request of mine were implemented, you’d be able to respond to such comments with a couple of clicks. In the meantime it seems best to just not feel obligated to reply.
I encourage you to participate anyway, as it seems good to get ideas from your viewpoint “out there” even if no one is currently engaging with them in a way that you find useful.
Yeah, that’s the plan.
I don’t think anyone talks about simple utility functions? Maybe you mean explicit utility functions?
Yes, sorry. I said that because they feel very similar to me: any utility function that can be explicitly specified must be reasonably simple. But I agree “explicit” is more accurate.
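To make that concrete, here is a purely illustrative sketch (a toy gridworld and names I am making up for this comment, not anything from an actual proposal): the kind of utility function you can specify explicitly is, almost by definition, a short program.

```python
# Illustrative sketch only: a hypothetical toy gridworld, invented for this
# comment, showing what an "explicitly specified" utility function looks like.
from typing import Tuple

State = Tuple[int, int]  # (x, y) position in the toy gridworld

GOAL: State = (3, 3)

def utility(state: State) -> float:
    """+1 at the goal square, a small penalty everywhere else.

    Anything we can write out exhaustively like this is, by construction,
    short -- which is the sense in which "explicit" shades into "simple".
    """
    return 1.0 if state == GOAL else -0.01

print(utility((3, 3)))  # 1.0
print(utility((0, 0)))  # -0.01
```

The contrast is that nobody knows how to write human values out this way, which is why “explicit” and “simple” feel so similar to me.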
In the meantime it seems best to just not feel obligated to reply.
That seems right, but also hard to do in practice (for me).