I understand what you mean with the CCC (and that this seems a bit of a nit-pick!), but I think the wording could usefully be clarified.
As you suggest here, the following is what you mean:
CCC says (for non-evil goals) “if the optimal policy is catastrophic, then it’s because of power-seeking”
However, that’s not what the CCC currently says.
E.g. compare:
[Unaligned goals] tend to [have catastrophe-inducing optimal policies] because of [power-seeking incentives].
[People teleported to the moon] tend to [die] because of [lack of oxygen].
The latter doesn’t lead to the conclusion: “If people teleported to the moon had oxygen, they wouldn’t tend to die.”
Your meaning will become clear to anyone who reads this sequence.
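For anyone taking a more cursory look, I think it’d be clearer if your clarification were the official CCC:
CCC: (for non-evil goals) “if the optimal policy is catastrophic, then it’s because of power-seeking”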
Currently, I worry about people pulling an accidental motte-and-bailey on themselves, thinking that [weak interpretation of CCC] implies [conclusions based on strong interpretation] (or thinking that you’re claiming this).