If it’s capable of self-modifying, then it could do weirder things.
For example, let’s say the AI knows that news source X will almost always push stories in favor of action Y. (Fox News will almost always push information that supports the argument that we should bomb the Middle East, The Guardian will almost always push information that supports us becoming more socialist, whatever.) If the AI wants to bias itself in favor of thinking that action Y will create more utility, what if it self-modifies to first convince itself that news source X is a much more reliable source of information than it actually is, and to weight that information more heavily in its future analysis?
If it can’t self-modify directly, it could maybe do tricky things like only observing the desired information source at key moments, with the goal of increasing its own confidence in that source; then, once it has raised that confidence enough, it looks at the source to find the information it is looking for.
(Again, this sounds crazy, but keep in mind humans do this stuff to themselves all the time.)
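As a toy illustration of that selective-observation trick (all names and numbers below are made up for the sketch): an agent that estimates a source’s reliability only from the moments it chooses to check it, and that only checks on questions where it already expects the source to be right, ends up trusting the source far more than it deserves.

```python
# Minimal sketch of the "only look at key moments" trick.
# Hypothetical numbers: the source is right 60% of the time overall,
# but 95% of the time on the easy questions the agent chooses to check.
import random

random.seed(0)

TRUE_RELIABILITY = 0.60          # how often the source is actually correct
EASY_QUESTION_RELIABILITY = 0.95 # its accuracy on hand-picked easy questions

def honest_estimate(n=1000):
    """Check the source on a random mix of questions."""
    hits, misses = 1, 1  # Beta(1, 1) prior over reliability
    for _ in range(n):
        correct = random.random() < TRUE_RELIABILITY
        hits += correct
        misses += not correct
    return hits / (hits + misses)

def cherry_picked_estimate(n=1000):
    """Only 'observe' the source on questions it is expected to get right."""
    hits, misses = 1, 1
    for _ in range(n):
        correct = random.random() < EASY_QUESTION_RELIABILITY
        hits += correct
        misses += not correct
    return hits / (hits + misses)

print("honest reliability estimate:       %.2f" % honest_estimate())
print("cherry-picked reliability estimate: %.2f" % cherry_picked_estimate())
# The second agent now weights the source heavily on the question it
# actually cares about, even though the source hasn't gotten any better.
```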
Etc. Basically, what this all boils down to is that the AI doesn’t really care about what happens in the real world; it’s not trying to actually accomplish a goal. Instead, its primary objective is to make itself think that it has an 80% chance of accomplishing the goal (or whatever), and once it manages that, it doesn’t really matter whether the goal happens or not. It has a built-in motivation to try to trick itself.
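A minimal sketch of that built-in incentive, with hypothetical numbers: give one agent utility over what actually happens and another agent utility over its own credence that the goal will be accomplished, and offer both the same menu of actions. Only the second one picks self-deception.

```python
# Toy comparison of an outcome-maximiser vs. a belief-maximiser.
# Each action changes two things: the real chance of success, and the
# credence the agent will hold about success after taking the action.
actions = {
    #                  (real P(success), credence after acting)
    "actually try":         (0.50, 0.50),
    "self-deceive":         (0.10, 0.80),  # e.g. inflate trust in a flattering source
    "do nothing":           (0.10, 0.10),
}

def outcome_value(real_p, credence):
    return real_p      # utility from what actually happens

def belief_value(real_p, credence):
    return credence    # utility from what the agent thinks will happen

print("outcome-maximiser picks:", max(actions, key=lambda a: outcome_value(*actions[a])))
print("belief-maximiser picks: ", max(actions, key=lambda a: belief_value(*actions[a])))
# The belief-maximiser prefers the action that merely inflates its own
# credence -- the built-in motivation to trick itself described above.
```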
If it’s capable of self-modifying, then it could do weirder things.
Yes, but much weirder than you’re imagining :-) This agent design is highly unstable, and will usually self-modify into something else entirely, very fast (see the top example where it self-modifies into a non-transitive agent).
If the AI wants to bias itself in favor of [...], what if it self-modifies to first convince itself that news source [...]
What is the expected utility of that bias action (given the expected behaviour afterwards)? The AI has to make that bias decision while it is not yet biased, so this doesn’t get round conservation of expected evidence.
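A quick numeric check of that conservation-of-expected-evidence point (arbitrary numbers): however the likelihoods are set up, the agent’s expected posterior before observing equals its prior, so at the moment it chooses the “bias” action it cannot expect the subsequent observations to push its credence where it wants.

```python
# Conservation of expected evidence: E[P(H | E)] over possible evidence
# equals P(H), for any prior and likelihoods.  Numbers are arbitrary.
prior_h = 0.3            # P(H): prior that action Y is the high-utility action
p_e_given_h = 0.9        # P(E | H): source pushes "do Y" when that's right
p_e_given_not_h = 0.6    # P(E | not H): source pushes "do Y" anyway

p_e = prior_h * p_e_given_h + (1 - prior_h) * p_e_given_not_h

posterior_if_e = prior_h * p_e_given_h / p_e
posterior_if_not_e = prior_h * (1 - p_e_given_h) / (1 - p_e)

expected_posterior = p_e * posterior_if_e + (1 - p_e) * posterior_if_not_e
print("prior:              %.4f" % prior_h)
print("expected posterior: %.4f" % expected_posterior)  # equals the prior
```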