Which is a mistake. I’ve been reading about Oracle AIs for at least as long as he has and have read the relevant papers, and my distinct impression is that Oracle AIs were defined as AIs with a utility function of answering questions, which take undesired actions anyway. He’s just conflating Oracle AIs with AI-in-a-box, which is wrong.
This is not what he’s talking about either. He thinks of “utility function of answering questions” as an AI-in-a-box, and different from a Tool AI.
I think his notion is closer (I still don’t know exactly what he means, but I am pretty sure your summary is not right) to a pure decision theory program: you give it a set of inputs and it outputs what it would do in this situation. For example, an ultra-simple version of this might be that you input a finite number of option utilities and it does an argmax to find the biggest one, returning the option number. This would not be automatically self-improving, because each time an action has to be taken, humans have to take that action. That holds even for things like “turn on sensors” or “gather more information”.
There’s no utility function involved in the program.
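To make the ultra-simple version concrete, here is a minimal sketch of what I have in mind. The function name `best_option` and the example numbers are my own, purely for illustration; this is not anything Holden wrote. The utilities are supplied by the human, the program only returns the index of the largest one, and nothing happens in the world unless a human acts on the answer.

```python
from typing import Sequence


def best_option(utilities: Sequence[float]) -> int:
    """Return the index of the option with the largest utility.

    Ties go to the earliest option. The program takes no action itself;
    it only reports an option number for a human to act on (or ignore).
    """
    if not utilities:
        raise ValueError("need at least one option")
    # argmax over the indices, comparing options by their supplied utility
    return max(range(len(utilities)), key=lambda i: utilities[i])


if __name__ == "__main__":
    # Hypothetical utilities typed in by a human for three candidate options.
    scores = [0.2, 0.9, 0.5]
    print("Suggested option:", best_option(scores))
```

Whether something this simple scales up to what Holden means by a tool is exactly what is in dispute; the sketch only pins down the toy case.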
Every software application I know of seems to work essentially the same way, including those that involve (specialized) artificial intelligence such as Google Search, Siri, Watson, Rybka, etc. Some can be put into an “agent mode” (as Watson was on Jeopardy!) but all can easily be set up to be used as “tools” (for example, Watson can simply display its top candidate answers to a question, with the score for each, without speaking any of them.)
The “tool mode” concept is importantly different from the possibility of Oracle AI sometimes discussed by SI. The discussions I’ve seen of Oracle AI present it as an Unfriendly AI that is “trapped in a box”—an AI whose intelligence is driven by an explicit utility function and that humans hope to control coercively. Hence the discussion of ideas such as the AI-Box Experiment. A different interpretation, given in Karnofsky/Tallinn 2011, is an AI with a carefully designed utility function—likely as difficult to construct as “Friendliness”—that leaves it “wishing” to answer questions helpfully. By contrast with both these ideas, Tool-AGI is not “trapped” and it is not Unfriendly or Friendly; it has no motivations and no driving utility function of any kind, just like Google Maps. It scores different possibilities and displays its conclusions in a transparent and user-friendly manner, as its instructions say to do; it does not have an overarching “want,” and so, as with the specialized AIs described above, while it may sometimes “misinterpret” a question (thereby scoring options poorly and ranking the wrong one #1) there is no reason to expect intentional trickery or manipulation when it comes to displaying its results.
The ‘different interpretation’ of 2011 is the standard interpretation. Holden is the only one who thinks the standard interpretation is actually a UFAI-in-a-box. If you don’t believe me, go back into old materials.
For example, see this 2007 email by Eliezer replying to Tim Freeman. Is what Tim describes, which Eliezer treats as applying to Oracle AI, consistent with what I claim “Oracle AI” means, or with what Holden claims it means? Or see Stuart Armstrong, or Vladimir alluding to Bostrom & Yudkowsky, or Peter McCluskey.
Oracle AI means an AI with a utility function of answering questions. It does not mean an AI with any utility function inside a box. Case closed.
I think we’re misunderstanding each other. You seemed to think that this statement, “He talks about that and thinks Oracle AIs are distinct from tools,” was a mistake.
I understand Holden to be trying to invent a new category of AI, called “tool-AI”, which is neither an AGI with a utility function for answering questions nor a UFAI in a box (he may be wrong about which definition/interpretation is more popular, but that’s mostly irrelevant to his claim because he’s just trying to distinguish his idea from these other ideas). He claims that this category has not been discussed much.
He says “Yes, I agree AIs with utility functions for answering questions will do terrible things just like UFAI in a box, but my idea is qualitatively different from either of these, and it hasn’t been discussed”.
Go back and look at what he wrote in the passage quoted above.
Are we still talking past each other?
Probably not.