On your bottom line, I entirely agree—to the extent that there are non-power-seeking strategies that’d be effective, I’m all for them. To the extent that we disagree, I think it’s about [what seems likely to be effective] rather than [whether non-power-seeking is a desirable property].
Constrained-power-seeking still seems necessary to me. (unfortunately)
A few clarifications:
I guess most technical AIS work is net negative in expectation. My ask there is that people work on clearer cases for their work being positive.
I don’t think my (or Eliezer’s) conclusions on strategy are downstream of [likelihood of doom]. I’ve formed some model of the situation. One output of the model is [likelihood of doom]. Another is [seemingly least bad strategies]. The strategies are based around why doom seems likely, not (primarily) that doom seems likely.
It doesn’t feel like “I am responding to the situation with the appropriate level of power-seeking given how extreme the circumstances are”.
Rather, it feels like the level of power-seeking I'm suggesting is necessary is also appropriate.
My cognitive biases push me away from enacting power-seeking strategies.
Biases aside, confidence in [power seems necessary] doesn’t imply confidence that I know what constraints I’d want applied to the application of that power.
In strategies I’d like, [constraints on future use of power] would go hand in hand with any [accrual of power].
It’s non-obvious that there are good strategies with this property, but the unconstrained version feels both suspicious and icky to me.
Suspicious, since [I don’t have a clue how this power will need to be directed now, but trust me—it’ll be clear later (and the right people will remain in control until then)] does not justify confidence.
To me, you seem to be over-rating the applicability of various reference classes in assessing [(inputs to) likelihood of doom]. As I think I’ve said before, it seems absolutely the correct strategy to look for evidence based on all the relevant reference classes we can find.
However, all else equal, I’d expect:
Spending a long time looking for x makes x feel more important.
[Wanting to find useful x] tends to shade into [expecting to find useful x] and [perceiving xs as more useful than they are].
Particularly so when [absent x, we’ll have no clear path to resolving hugely important uncertainties].
The world doesn’t owe us convenient reference classes. I don’t think there’s any way around inside-view analysis here—in particular, [how relevant/significant is this reference class to this situation?] is an inside-view question.
That doesn’t make my (or Eliezer’s, or …’s) analysis correct, but there’s no escaping that you’re relying on inside-view too. Our disagreement only escapes [inside-view dependence on your side] once we broadly agree on [the influence of inside-view properties on the relevance/significance of your reference classes]. I assume that we’d have significant disagreements there.
Though it still seems useful to figure out where. I expect that there are reference classes that we’d agree could clarify various sub-questions.
In many non-AI-x-risk situations, we would agree—some modest level of inside-view agreement would be sufficient to broadly agree about the relevance/significance of various reference classes.