Congratulations! You have just outsmarted an AI, sneakily allowing it to have great impact where it desires to have no impact at all.
Edited to add: the above was sarcastic. Surely an AI would realise that you might be trying tricks like this, and still output the wrong coordinate if it judges that possibility to be high enough.
http://lesswrong.com/r/discussion/lw/lyh/utility_vs_probability_idea_synthesis/ and http://lesswrong.com/lw/ltf/false_thermodynamic_miracles/
I read those two, but I don’t see what this idea contributes to AI control on top of them. If you can get the AI to act like it believes what you want it to, in spite of evidence, then there’s no need for the trick with two coordinates. Conversely, if you cannot, then telling it that there’s a second coordinate involved won’t fool it either. Why is it useful to control an AI through this splitting of information if we already have the false miracles? And if the miracles fail, how do you prevent the AI from seeing right through this scheme? In the latter case you are trying to do nothing more than outsmart an AI...
to act like it believes what you want it to, in spite of evidence

The approach I’m aiming for is to be able to make the AI do things without having to define hard concepts. “Deflect the meteor, but without having undue impact on the world” is a hard concept.
“reduced impact” seems easier, and “false belief” is much easier. It seems we can combine the two in this way to get something we want without needing to define it.
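To make the combination concrete, here is a minimal toy sketch (my own illustration, not code from this thread or the linked posts; the action names, utility numbers, and probabilities are all hypothetical): a reduced-impact agent that refuses to answer under its true beliefs, but answers under an imposed false belief that a thermodynamic-miracle-style event will erase its output signal.

```python
# Toy sketch of "reduced impact + false belief" (hypothetical numbers and names).
# The agent gets a small reward for producing the correct answer and pays a
# large penalty proportional to its expected impact on the outside world.

ACTIONS = ["output_correct_coordinates", "stay_silent"]

TASK_REWARD = 1.0        # reward for producing the correct answer
IMPACT_PENALTY = 1000.0  # penalty per unit of expected impact on the world

P_TRUE = 0.999999        # true physics: the output signal gets through
P_BELIEVED = 1e-12       # imposed false belief: a "miracle" erases the signal

def expected_utility(action: str, p_signal_gets_through: float) -> float:
    """Expected utility under a given belief about whether the signal arrives."""
    if action == "stay_silent":
        return 0.0
    expected_impact = p_signal_gets_through  # impact only if the signal arrives
    return TASK_REWARD - IMPACT_PENALTY * expected_impact

def best_action(p_signal_gets_through: float) -> str:
    return max(ACTIONS, key=lambda a: expected_utility(a, p_signal_gets_through))

print(best_action(P_TRUE))      # -> stay_silent (the impact penalty dominates)
print(best_action(P_BELIEVED))  # -> output_correct_coordinates
```

Under its true beliefs the impact penalty dominates and the agent stays silent; under the false belief the expected impact is negligible, so it outputs the coordinates, which in the real world then have the effect we wanted, without “undue impact” ever having been formally defined. (Whether a smart enough AI would see through the setup is exactly the objection raised above.)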