Given that a True Oracle AI acts, by answering questions, to achieve its goal, it follows that True Oracle AI is only safe if its goal is fully compatible with human values. A limited interaction channel is not a good defense against a superintelligence.
A bad argument for the right policy. From what does it follow that “True Oracle AI is only safe if its goal is fully compatible with human values”? You just said “it follows”, like “may I use the copy machine because I need to make copies”. There may well be circumstances other than the AI’s goal being “fully compatible” (what does that mean?) with human values that render (this particular instance of) AI safe; presumably that question is the topic of this post. And a sufficiently limited interaction channel (e.g. a few output bits without a timestamp) may well be a sufficiently good defense against a superintelligence.
(Good arguments include: an insufficiently limited interaction channel being dangerous, our not knowing how much limitation is sufficient, the AI escaping on some other occasion if the technology to create it exists, the AI pretending to be dumber than it is, its operators opening up too wide an interaction channel or just dropping some parts of security in an attempt to make it useful, etc.)
Very powerful Predictors are unsafe in a rather surprising way: when given sufficient data about the real world, they exhibit goal-seeking behavior, i.e. they calculate a distribution over future data in a way that brings about certain real-world states.
You are speaking the language of observation and certainty, which I think is inappropriate in this case. There are probably different kinds of predictors, and not all of them exhibit such features. You are essentially asking, “How would I, as an optimizer, design a good Predictor?”, but a predictor doesn’t obviously have to be built that way, considering the counterfactual consequences of the various predictions it could output; it could just be helpless to influence its prediction in a way that makes that prediction more accurate, i.e. broken as an optimizer. It may also, as usual, be impossible to control anything in the physical world to any useful extent without a sufficient output channel (or, equivalently, without systems that predict the predictor).
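To make the distinction concrete, here is a minimal toy sketch (in Python) of the two designs being contrasted: a predictor that ignores the effect of its own announcement versus one that searches over candidate announcements for a self-fulfilling fixed point. The environment, class names, and numbers are all made up for illustration; nothing here is claimed about how an actual Predictor would be built.

```python
def world(announced_prediction: float) -> float:
    """Hypothetical self-referential environment: the realized outcome depends
    in part on the prediction that gets announced (e.g. a market moved by a
    published forecast). Numbers are arbitrary."""
    baseline = 0.3
    influence = 0.5
    return baseline + influence * announced_prediction


class PlainPredictor:
    """Reports its estimate while treating its own announcement as having no
    effect. It never considers counterfactual announcements, so it is
    'helpless' to steer the world toward its prediction (and, here, ends up
    inaccurate in a self-referential environment)."""

    def predict(self) -> float:
        # Estimate the outcome as if the announcement changed nothing.
        return world(0.0)


class FixedPointPredictor:
    """Searches over candidate announcements and outputs one that makes itself
    come true: a fixed point of the announcement -> outcome map. This is the
    goal-seeking design, where the choice of prediction selects a world-state."""

    def predict(self) -> float:
        candidates = [i / 100 for i in range(101)]
        # Pick the candidate whose announcement makes it most accurate.
        return min(candidates, key=lambda p: abs(world(p) - p))


if __name__ == "__main__":
    plain = PlainPredictor().predict()
    fixed = FixedPointPredictor().predict()
    print(f"plain predictor announces {plain:.2f}; outcome is {world(plain):.2f}")
    print(f"fixed-point predictor announces {fixed:.2f}; outcome is {world(fixed):.2f}")
```

In this toy setup the plain predictor announces 0.30 and the world then produces 0.45, while the fixed-point predictor announces 0.60 and thereby brings about 0.60. Only the second design exhibits the goal-seeking behavior the quoted claim attributes to powerful Predictors in general; the first is simply not built to exploit its influence.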