I don’t see any reason why an AI has to act coherently. If it prefers A to B, B to C, and C to A, it might not care. You could program it to prefer that utility function.*
If not, maybe the A-liking aspects will reprogram B and C out of its utility function, or maybe not. What happens would depend entirely on the details of how it was programmed.
Maybe it would spend all the universe’s energy turning our future light cone from C to B, then from B to A, and then from A back to C. Maybe it would transform the whole light cone at once, if it was programmed to pursue one “goal” before proceeding to the next. Or maybe different parts of the universe would be in different stages, all at the same time. Think of it like a light-cone blender set on purée.
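To make the blender concrete, here is a toy sketch (my own hypothetical illustration, not anything MIRI uses): an agent whose only rule is “move to whatever state is preferred over the current one,” with the A>B>C>A cycle hard-coded in a lookup table. The names `PREFERRED_OVER` and `run_agent` are made up for the example. It never settles.

```python
# Hypothetical cyclic preference: each state maps to the state the agent prefers over it.
# B is preferred to C, A to B, and C to A, so there is no "best" state to stop at.
PREFERRED_OVER = {"C": "B", "B": "A", "A": "C"}


def run_agent(start: str, steps: int) -> list[str]:
    """Greedily move to whichever state is preferred over the current one, `steps` times."""
    history = [start]
    state = start
    for _ in range(steps):
        state = PREFERRED_OVER[state]  # always "improve", by the agent's own lights
        history.append(state)
    return history


print(run_agent("C", 6))  # ['C', 'B', 'A', 'C', 'B', 'A', 'C']
```

Every individual step looks like an improvement to the agent, yet after three steps it is right back where it started, having spent whatever resources the transitions cost.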
Our default preferences seem about that coherent, but we’re able to walk and talk, so clearly it’s possible. It certainly explains a lot of the madness and incoherence in the way the world is structured. Luckily, we seem to value coherence, or at least we are willing to give up on having our cake and eating it too when it becomes clear that we can’t have it both ways. It’s possible a subtly incoherent AGI would operate at cross purposes for a long time before discovering and correcting its utility function, if it valued coherence.
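For an agent that valued coherence, the first step would be noticing the cycle at all. A minimal sketch, assuming (hypothetically) that preferences are stored as (better, worse) pairs: treat them as a directed graph and run a standard depth-first search for a cycle. This is just a textbook graph algorithm, not a claim about how any real AGI would represent its goals. Any cycle it returns is a set of preferences that can’t all be satisfied, so something would have to give.

```python
def find_preference_cycle(prefs):
    """Return options forming a cycle in the strict-preference relation, or None.

    `prefs` is a list of (better, worse) pairs, e.g. ("A", "B") means A is preferred to B.
    """
    # Build a directed graph: an edge better -> worse for each stated preference.
    graph = {}
    for better, worse in prefs:
        graph.setdefault(better, []).append(worse)
        graph.setdefault(worse, [])

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, fully explored
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GRAY:
                # Back edge: nxt is on the current path, so we've closed a loop.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


print(find_preference_cycle([("A", "B"), ("B", "C"), ("C", "A")]))  # ['A', 'B', 'C', 'A']
print(find_preference_cycle([("A", "B"), ("B", "C")]))  # None
```

Finding the cycle is the easy part; deciding which preference to drop is where the agent’s other values (or its programmers) would have to come in.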
However, MIRI is trying to build a sane AGI, not to explore all the possible ways an AI can be insane. Economists like to simplify human motives into idealized rational agents, because those are much, much simpler to reason about. The same is true for MIRI, I think.
I’ve given this sort of thing a little thought, and have an Evernote note I can turn into a LW post, if there is interest.
* I use the term “utility function” broadly here. I guess “programming” would be more correct, but even an A>B>C>A AI bears some rough resemblance to a utility function, even if it isn’t coherent.