E.g. prioritizing competence means that you’ll try less hard to get “your” person into power. Prioritizing legitimacy means making it harder to get your own ideas implemented when others disagree.
That’s clarifying. In particular, I hadn’t realized you meant to imply [legitimacy of the ‘community’ as a whole] in your post.
I think both are good examples in principle, given the point you’re making. I expect neither to work in practice, since I don’t think that either [broad competence of decision-makers] or [increased legitimacy of broad (and broadening!) AIS community] helps us much at all in achieving our goals.
To achieve our goals, I expect we’ll need something much closer to ‘our’ people in power (where ‘our’ means [people with a pretty rare combination of properties, conducive to furthering our goals]), and increased legitimacy for [narrow part of the community I think is correct].
I think we’d need to go with [aim for a relatively narrow form of power], since I don’t think accumulating less power will work. (though it’s a good plan, to the extent that it’s possible)
I expect neither to work in practice, since I don’t think that either [broad competence of decision-makers] or [increased legitimacy of broad (and broadening!) AIS community] helps us much at all in achieving our goals. To achieve our goals, I expect we’ll need something much closer to ‘our’ people in power.
While this seems like a reasonable opinion in isolation, I also read the thread where you were debating Rohin and holding the position that most technical AI safety work was net-negative.
And so basically I think that you, like Eliezer, have been forced by (according to me, incorrect) analyses of the likelihood of doom to the conclusion that only power-seeking strategies will work.
From the inside, for you, it feels like “I am responding to the situation with the appropriate level of power-seeking given how extreme the circumstances are”.
From the outside, for me, it feels like “The doomers have a cognitive bias that leads them to overrate power-seeking strategies, and this is not a coincidence: it’s disproportionately easy for cognitive biases to have this effect, given how the human mind works”.
Fortunately I think most rationalists have fairly good defense mechanisms against naive power-seeking strategies, and this is to their credit. So the main thing I’m worried about here is that we end up putting less force behind non-power-seeking strategies.
On your bottom line, I entirely agree—to the extent that there are non-power-seeking strategies that’d be effective, I’m all for them. To the extent that we disagree, I think it’s about [what seems likely to be effective] rather than [whether non-power-seeking is a desirable property].
Constrained power-seeking still seems necessary to me. (unfortunately)
A few clarifications:
My guess is that most technical AIS work is net negative in expectation. My ask there is that people work on making clearer cases for their work being positive.
I don’t think my (or Eliezer’s) conclusions on strategy are downstream of [likelihood of doom]. I’ve formed some model of the situation. One output of the model is [likelihood of doom]. Another is [seemingly least bad strategies]. The strategies are based around why doom seems likely, not (primarily) that doom seems likely.
It doesn’t feel like “I am responding to the situation with the appropriate level of power-seeking given how extreme the circumstances are”.
It feels like the level of power-seeking I’m suggesting is appropriate because it seems necessary.
My cognitive biases push me away from enacting power-seeking strategies.
Biases aside, confidence in [power seems necessary] doesn’t imply confidence that I know what constraints I’d want on the application of that power.
In strategies I’d like, [constraints on future use of power] would go hand in hand with any [accrual of power].
It’s non-obvious that there are good strategies with this property, but the unconstrained version feels both suspicious and icky to me.
Suspicious, since [I don’t have a clue how this power will need to be directed now, but trust me—it’ll be clear later (and the right people will remain in control until then)] does not justify confidence.
To me, you seem to be over-rating the applicability of various reference classes in assessing [(inputs to) likelihood of doom]. As I think I’ve said before, it seems absolutely the correct strategy to look for evidence based on all the relevant reference classes we can find.
However, all else equal, I’d expect:
Spending a long time looking for x makes x feel more important.
[Wanting to find useful x] tends to shade into [expecting to find useful x] and [perceiving xs as more useful than they are].
Particularly so when [absent x, we’ll have no clear path to resolving hugely important uncertainties].
The world doesn’t owe us convenient reference classes. I don’t think there’s any way around inside-view analysis here—in particular, [how relevant/significant is this reference class to this situation?] is an inside-view question.
That doesn’t make my (or Eliezer’s, or …’s) analysis correct, but there’s no escaping that you’re relying on inside-view too. Our disagreement only escapes [inside-view dependence on your side] once we broadly agree on [the influence of inside-view properties on the relevance/significance of your reference classes]. I assume that we’d have significant disagreements there.
Though it still seems useful to figure out where. I expect that there are reference classes that we’d agree could clarify various sub-questions.
In many non-AI-x-risk situations, we would agree—some modest level of inside-view agreement would be sufficient to broadly agree about the relevance/significance of various reference classes.