The basic answer is that acausal decision theories make AI collusion/cooperation easier.
The most important aspect of acausal/logical decision theories is that agents using them can cooperate even in scenarios designed to force non-cooperation, like the Prisoner’s Dilemma, as long as they expect the other party to reason in a sufficiently similar way.
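To make the Prisoner's Dilemma point concrete, here is a rough toy sketch rather than a faithful implementation of any particular decision theory. The payoff numbers, the function names `causal_choice` and `acausal_choice`, and the `match_prob` parameter are all assumptions made up for illustration. The "acausal" agent here is a simple evidential-style stand-in: it assumes the other player ends up making the same move it does with some probability.

```python
# Illustrative payoffs for the row player (assumed values, standard T > R > P > S).
PAYOFF = {
    ("C", "C"): 3,  # mutual cooperation
    ("C", "D"): 0,  # I cooperate, the other player defects
    ("D", "C"): 5,  # I defect, the other player cooperates
    ("D", "D"): 1,  # mutual defection
}


def causal_choice(prob_other_cooperates: float) -> str:
    """Treat the other player's move as independent of mine and best-respond."""
    def expected(my_move: str) -> float:
        return (prob_other_cooperates * PAYOFF[(my_move, "C")]
                + (1 - prob_other_cooperates) * PAYOFF[(my_move, "D")])
    return max("CD", key=expected)


def acausal_choice(match_prob: float) -> str:
    """Assume the other player mirrors my move with probability match_prob.

    This is a hypothetical evidential-style model of 'we reason alike',
    not an implementation of a specific logical decision theory.
    """
    def expected(my_move: str) -> float:
        other_if_mismatch = "D" if my_move == "C" else "C"
        return (match_prob * PAYOFF[(my_move, my_move)]
                + (1 - match_prob) * PAYOFF[(my_move, other_if_mismatch)])
    return max("CD", key=expected)


if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        print("causal agent, other cooperates with prob", p, "->", causal_choice(p))
    for p in (0.5, 0.9, 1.0):
        print("acausal agent, match prob", p, "->", acausal_choice(p))
```

With these assumed payoffs the causal agent defects whatever it believes about the other player, because defection is the better reply to either move. The mirrored agent cooperates once the match probability is high enough (above 5/7 for these numbers), which is the sense in which near-copies of the same model could find it easier to cooperate.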
And there’s already evidence that RLHF pushes models toward more acausal decision theories.