Would you feel the same way about “influence-seeking”, which I almost went with?
Note also that, while Bob is being a dick about it, the dynamic in your scenario is actually a very common one. Many people are social climbers who use every opportunity to network or shill themselves, and this does get noticed and reflects badly on them. We can debate about the precise terminology to use (which I think should probably be different for groups vs individuals) but if Alice just reasoned from the top down about how to optimize her networking really hard for her career, in a non-socially-skilled way, a friend should pull her aside and say “hey, communities often have defense mechanisms against the thing you’re doing, watch out”.
Influence-seeking activates the same kind of feeling, though less strongly than “power-seeking” does.
but if Alice just reasoned from the top down about how to optimize her networking really hard for her career, in a non-socially-skilled way, a friend should pull her aside
+1. I suspect we’d also likely agree that if Alice just stayed in her room all day and only talked to her friends about what ideal housing policy should look like, someone should pull her aside and say “hey, you might want to go to some networking events and see if you can get involved in housing policy, or at least see if there are other things you should be doing to become a better fit for housing policy roles in the future.”
In this case, it’s not the desire to have influence that is the core problem. The core problem is whether Alice is making the right moves to have the kind of influence she wants.
Bringing it back to the post: I think I’d be excited to see you write something more along the lines of “What mistakes do many people in the AIS community make when it comes to influence-seeking?” I suspect this would be more specific and concrete. I think the two suggestions at the end (prioritize legitimacy & prioritize competence) start to get at this.
Otherwise, I feel like the discussion is going to go into less productive directions, where people who already agree with you react like “Yeah, Alice is such a status-seeker! Stick it to her!” and people who disagree with you are like “Wait what? Alice is just trying to network so she can get a job in housing policy – why are you trying to cast that as some shady plot? Should she just stay in her room and write blog posts?”
In this case, it’s not the desire to have influence that is the core problem. The core problem is whether Alice is making the right moves to have the kind of influence she wants.
I think I actually disagree with this. It feels like your framing is something like: “if you pursue power in the wrong ways, you’ll have problems. If you pursue power in the right ways, you’ll be fine”.
And in fact the thing I’m trying to convey is more like “your default assumption should be that accumulating power triggers defense mechanisms, and you might think you can avoid this tradeoff by being cleverer, but that’s mostly an illusion”. (Or, in other words, it’s faulty CDT-style thinking.)
Based on this I actually think that “structurally power-seeking” is the right term after all, because it’s implicitly asserting that you can’t separate out these two things (“power-seeking” and “gaining power in ‘the right ways’”).
Note also that my solutions at the end are not in fact strategies for accumulating power in ‘the right ways.’ They’re strategies for achieving your goals while accumulating less power. E.g. prioritizing competence means that you’ll try less hard to get “your” person into power. Prioritizing legitimacy means you’re making it harder to get your own ideas implemented, when others disagree.
(FWIW I think that on the level of individuals the tradeoff between accumulating power and triggering defense mechanisms is often just a skill issue. But on the level of movements the tradeoff is much harder to avoid—e.g. you need to recruit politically-savvy people, but that undermines your truth-seeking altruistically-motivated culture.)
I’m not quite sure where we disagree, but if I had to put my finger on it, it’s something like “I don’t think that people would be put off by Alice going to networking events to try to get a job in housing policy, and I don’t think she would trigger any defense mechanisms.”
Specific question for you: Would you say that “Alice going to a networking event” (assume she’s doing it in socially conventional/appropriate ways) would count as structural power-seeking? And would you discourage her from going?
More generally, there are a lot of things you’re labeling as “power-seeking” which feel inaccurate or at least quite unnatural to label as “power-seeking”, and I suspect that this will lead to confusion (or at worst, lead to some of the people you want to engage dismissing your valid points).
I think in your frame, Alice going to networking events would be seen as “there are some socially-accepted ways of seeking power” and in my frame this would be seen as “it doesn’t really make sense to call this power-seeking, as most people would find it ridiculous/weird to apply the label ‘power-seeking’ to an action as simple as going to a networking event.”
I’m also a bit worried about a motte-and-bailey here. The bold statement is “power-seeking (which I’m kind of defining as anything that increases your influence, regardless of how innocuous or socially accepted it seems) is bad because it triggers defense mechanisms” and the more moderated statement is “there are some specific ways of seeking power that have important social costs, and I think that some/many actors in the community underestimate those costs. Also, there are many strategies for achieving your goals that don’t involve seeking power, and I think some/many people in the community are underestimating those.” I agree with the more moderated claims.
Would you say that “Alice going to a networking event” (assume she’s doing it in socially conventional/appropriate ways) would count as structural power-seeking? And would you discourage her from going?
I think you’re doing a paradox of the heap here. One grain of sand is obviously not a heap, but a million obviously is. Similarly, Alice going to one networking event is obviously not power-seeking, but Alice taking every opportunity she can to pitch herself to the most powerful people she can find obviously is. I’m identifying a pattern of behavior that AI safety exhibits significantly more than other communities, and the fair analogy is to a pattern of behavior that Alice exhibits significantly more than other people around her.
I’m also a bit worried about a motte-and-bailey here. The bold statement is “power-seeking (which I’m kind of defining as anything that increases your influence, regardless of how innocuous or socially accepted it seems) is bad because it triggers defense mechanisms”
I flagged several times in the post that I was not claiming that power-seeking is bad overall, just that it typically has this one bad effect.
the more moderated statement is “there are some specific ways of seeking power that have important social costs, and I think that some/many actors in the community underestimate those costs
I repudiated this position in my previous comment, where I flagged that I’m trying to make a claim not about specific ways of seeking power, but rather about the outcome of gaining power in general.
E.g. prioritizing competence means that you’ll try less hard to get “your” person into power. Prioritizing legitimacy means you’re making it harder to get your own ideas implemented, when others disagree.
That’s clarifying. In particular, I hadn’t realized you meant to imply [legitimacy of the ‘community’ as a whole] in your post.
I think both are good examples in principle, given the point you’re making. I expect neither to work in practice, since I don’t think that either [broad competence of decision-makers] or [increased legitimacy of broad (and broadening!) AIS community] help us much at all in achieving our goals.
To achieve our goals, I expect we’ll need something much closer to ‘our’ people in power (where ‘our’ means [people with a pretty rare combination of properties, conducive to furthering our goals]), and increased legitimacy for [narrow part of the community I think is correct].
I think we’d need to go with [aim for a relatively narrow form of power], since I don’t think accumulating less power will work. (though it’s a good plan, to the extent that it’s possible)
I expect neither to work in practice, since I don’t think that either [broad competence of decision-makers] or [increased legitimacy of broad (and broadening!) AIS community] help us much at all in achieving our goals. To achieve our goals, I expect we’ll need something much closer to ‘our’ people in power.
While this seems like a reasonable opinion in isolation, I also read the thread where you were debating Rohin and holding the position that most technical AI safety work was net-negative.
And so basically I think that you, like Eliezer, have been forced by (according to me, incorrect) analyses of the likelihood of doom to the conclusion that only power-seeking strategies will work.
From the inside, for you, it feels like “I am responding to the situation with the appropriate level of power-seeking given how extreme the circumstances are”.
From the outside, for me, it feels like “The doomers have a cognitive bias that ends up resulting in them overrating power-seeking strategies, and this is not a coincidence but instead driven by the fact that it’s disproportionately easy for cognitive biases to have this effect (given how the human mind works)”.
Fortunately I think most rationalists have fairly good defense mechanisms against naive power-seeking strategies, and this is to their credit. So the main thing I’m worried about here is that less force ends up concentrated behind non-power-seeking strategies.
On your bottom line, I entirely agree—to the extent that there are non-power-seeking strategies that’d be effective, I’m all for them. To the extent that we disagree, I think it’s about [what seems likely to be effective] rather than [whether non-power-seeking is a desirable property].
Constrained-power-seeking still seems necessary to me. (unfortunately)
A few clarifications:
My guess is that most technical AIS work is net negative in expectation. My ask there is that people work on clearer cases for their work being positive.
I don’t think my (or Eliezer’s) conclusions on strategy are downstream of [likelihood of doom]. I’ve formed some model of the situation. One output of the model is [likelihood of doom]. Another is [seemingly least bad strategies]. The strategies are based around why doom seems likely, not (primarily) that doom seems likely.
It doesn’t feel like “I am responding to the situation with the appropriate level of power-seeking given how extreme the circumstances are”.
It feels like the level of power-seeking that seems necessary to me is also the appropriate level.
My cognitive biases push me away from enacting power-seeking strategies.
Biases aside, confidence in [power seems necessary] doesn’t imply confidence that I know what constraints I’d want applied to the application of that power.
In strategies I’d like, [constraints on future use of power] would go hand in hand with any [accrual of power].
It’s non-obvious that there are good strategies with this property, but the unconstrained version feels both suspicious and icky to me.
Suspicious, since [I don’t have a clue how this power will need to be directed now, but trust me—it’ll be clear later (and the right people will remain in control until then)] does not justify confidence.
To me, you seem to be over-rating the applicability of various reference classes in assessing [(inputs to) likelihood of doom]. As I think I’ve said before, it seems absolutely the correct strategy to look for evidence based on all the relevant reference classes we can find.
However, all else equal, I’d expect:
Spending a long time looking for x makes x feel more important.
[Wanting to find useful x] tends to shade into [expecting to find useful x] and [perceiving xs as more useful than they are].
Particularly so when [absent x, we’ll have no clear path to resolving hugely important uncertainties].
The world doesn’t owe us convenient reference classes. I don’t think there’s any way around inside-view analysis here—in particular, [how relevant/significant is this reference class to this situation?] is an inside-view question.
That doesn’t make my (or Eliezer’s, or …’s) analysis correct, but there’s no escaping that you’re relying on inside-view too. Our disagreement only escapes [inside-view dependence on your side] once we broadly agree on [the influence of inside-view properties on the relevance/significance of your reference classes]. I assume that we’d have significant disagreements there.
Though it still seems useful to figure out where. I expect that there are reference classes that we’d agree could clarify various sub-questions.
In many non-AI-x-risk situations, we would agree—some modest level of inside-view agreement would be sufficient to broadly agree about the relevance/significance of various reference classes.