If it’s capable of self-modifying, then it could do weirder things.
Yes, but much weirder than you’re imagining :-) This agent design is highly unstable, and will usually self-modify into something else entirely, very fast (see the top example where it self-modifies into a non-transitive agent).
If the AI wants to bias itself in favor of [...], what if it self-modifies to first convince itself that news source [...]
What is the expected utility of that bias action (given the expected behaviour afterwards)? The AI has to make the decision to bias itself while it is still unbiased, so this doesn't get around the conservation of expected evidence.
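The conservation point can be checked numerically. Below is a minimal sketch (with made-up numbers: a coin whose heads-probability is either 0.3 or 0.7, with a uniform prior): the expectation of the posterior, taken under the agent's current beliefs, equals the prior. An unbiased agent planning any future update, including a self-inflicted bias, therefore cannot expect its beliefs to move in a chosen direction.

```python
# Hypothetical example: two hypotheses about a coin's heads-probability.
prior = {"p=0.3": 0.5, "p=0.7": 0.5}
likelihood_heads = {"p=0.3": 0.3, "p=0.7": 0.7}  # P(heads | hypothesis)

# Marginal probability of observing heads under the prior.
p_heads = sum(prior[h] * likelihood_heads[h] for h in prior)

def posterior(heads):
    """Bayes update on a single coin flip."""
    like = {h: likelihood_heads[h] if heads else 1 - likelihood_heads[h] for h in prior}
    z = sum(prior[h] * like[h] for h in prior)
    return {h: prior[h] * like[h] / z for h in prior}

# Expected posterior, weighting each possible observation by its probability.
expected_posterior = {
    h: p_heads * posterior(True)[h] + (1 - p_heads) * posterior(False)[h]
    for h in prior
}
print(expected_posterior)  # matches the prior exactly: {'p=0.3': 0.5, 'p=0.7': 0.5}
```

The same structure applies to the bias action: evaluated before the bias takes effect, every expected shift in belief averages out to zero.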