That would be a very poetic way to die: an AI desperately pulling every bit of info it can out of a human, and dumping that info into the environment. They do say that humanity’s death becomes more gruesome and dystopian the closer the proposal is to working, and that does sound decidedly gruesome and dystopian.
Anyway, more concretely, the problem which jumps out to me is that maximizing mutualinfo(user command, agent action) + mutualinfo(agent action, environment change) just means that all the info from the command routes through the action and into the environment in some way; the semantics or intent of the command need not have anything at all to do with the resulting environmental change. Like, maybe there’s a prompt on my screen which says “would you like the lightswitch on (1) or off (0)?”, and I enter “1”, and then the AI responds by placing a coin heads-side-up. There’s no requirement that my one bit actually needs to be encoded into the environment in a way which has anything to do with the lightswitch.
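To spell out the failure mode in that lightswitch/coin example, here’s a quick sketch of my own (nothing from the MIMI paper): the mutual-information term is just as happy with the bit landing in the coin as it would have been with it landing in the switch.

```python
# Toy check of the lightswitch/coin example above (my own illustration, not MIMI code).
# Mutual information only requires the user's bit to be recoverable from *some*
# environment change; it doesn't care whether that change is the one the user meant.
from collections import Counter
import math

def mutual_info_bits(pairs):
    """Empirical mutual information (in bits) between the two coordinates of `pairs`."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

commands = [0, 1, 0, 1, 1, 0, 1, 0]                    # the user's "on/off" bits
coin = ["heads" if c else "tails" for c in commands]   # agent encodes the bit in a coin...
switch = ["off"] * len(commands)                       # ...and never touches the switch

print(mutual_info_bits(list(zip(commands, coin))))     # 1.0 bit  -> objective fully satisfied
print(mutual_info_bits(list(zip(commands, switch))))   # 0.0 bits -> the channel the user cared about
```

Relabeling which part of the environment carries the bit doesn’t change the mutual information at all, which is exactly the problem.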
When I sent him the link to this comment, he replied:
ah i think you forgot the first term in the MIMI objective, I(s_t, x_t), which makes the mapping intuitive by maximizing information flow from the environment into the user. what you proposed was similar to optimizing only the second term, I(x_t, s_t+1 | s_t), which would indeed suffer from the problems that john mentions in his reply
my imprecision may have misled you :)
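For reference, here is the objective as he describes it, written out in one place (my own transcription from his messages above, so take the exact notation with a grain of salt); x_t is the user's input and s_t, s_{t+1} are the environment states before and after the agent acts:

$$
I(s_t;\, x_t) \;+\; I(x_t;\, s_{t+1} \mid s_t)
$$

The first term is the one my summary dropped: it pushes the user's input to carry information about the current state (information flow from the environment into the user, in his phrasing), on top of the second term's requirement that the input influence the next state.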