I don’t think Holden Karnofsky is too familiar with the topic of machine intelligence. This seems to be rather amateurish rambling to me.
“Tools”—as Karnofsky defines them—have no actuators besides their displays to humans, and they keep humans “in the loop”. They are thus intrinsically slow and impotent. The future won’t belong to the builders of slow, impotent machines, so there is a strong incentive to cut humans out of the loop through automation and to give the machines more capable actuators. This process has already been taking place on a large scale in industrial automation.
Also, this idea has been discussed extensively in the form of “Oracle AI”. Holden should have searched for that.
So what? He may be an amateur, but he is very clearly a highly intelligent person who has worked hard to understand SI’s position. SI is right to acknowledge as a flaw that none of their published writings address what he is saying in a form accessible to a nonspecialist like him.
Holden should update here, IMO. One of the lessons is probably not to criticise others regarding complex technical topics when you don’t really understand them. Holden’s case doesn’t really benefit from technical overstatements—especially muddled ones.
Even assuming that you are right, SI should write more clearly, to make it easier for people like Holden to update.
If you try to communicate an idea, and even intelligent and curious people get it wrong, something is wrong with the message.
String theory seems to be a counterexample. That’s relevant, since machine intelligence is a difficult topic.
Matt’s right, Karnofsky does think tools are distinct from Oracles. But I agree with your main point: my first thought was “you can make an algorithmic stock trader ‘tool AI’ that advises humans, but it’ll get its card punched by the fully automated millisecond traders already out there.”
Will it? Human traders still exist, right? If they can still make money, then ones with a smart adviser would make more money.
Yes; what we see now is the HFT realm, where only algorithms can compete, and the realm 6-12 orders of magnitude or so slower, where human+tool AI symbiotes still dominate. Of course, HFT is over half of equity trading volume these days, and it seems to be still growing—both in absolute numbers and as a proportion of total trading. I’d guess that human+tool AI’s scope of dominance is shrinking.
Ooh. HFT gives a great deal of perspective on my comment on QA->tool->daemon—HFT consists of daemon-level programs achieving results many consider unFriendly.
Did you read the whole post? He talks about that and thinks Oracle AIs are distinct from tools.
Which is a mistake; at least, I’ve been reading about Oracle AIs for at least as long as he has and have read the relevant papers, and I had the distinct impression that Oracle AIs were defined as AIs with a utility function of answering questions, and that such AIs take undesired actions anyway. He’s just conflating Oracle AIs with AI-in-a-box, which is wrong.
This is not what he’s talking about either. He thinks of “utility function of answering questions” as an AI-in-a-box, and different from a Tool AI.
I think his notion is closer (I still don’t know exactly what he means, but I am pretty sure your summary is not right) to a pure decision-theory program: you give it a set of inputs and it outputs what it would do in that situation. For example, an ultra-simple version of this might be that you input a finite number of option utilities and it does an argmax to find the biggest one, returning the option number. This would not be automatically self-improving, because each time an action has to be taken, humans have to take that action, even for things like “turn on sensors” or “gather more information”.
There’s no utility function involved in the program.
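To make the comparison concrete, here is a minimal sketch of what such a program might look like, assuming (as in the toy example above) that a human supplies a finite list of option utilities and the program only reports its ranking. The function names and input format are illustrative assumptions, not drawn from anything Holden or SI actually specifies.

```python
# Hypothetical sketch of a "tool-mode" decision program in the sense described
# above: it ranks options supplied by a human and reports its conclusion, but
# never acts on it. All names here are illustrative, not from any real system.

def rank_options(option_utilities):
    """Return option indices sorted from best to worst by their utility score."""
    return sorted(range(len(option_utilities)),
                  key=lambda i: option_utilities[i],
                  reverse=True)

def recommend(option_utilities):
    """Display the ranking; taking any action (even 'gather more data') is left to the human."""
    ranking = rank_options(option_utilities)
    for place, idx in enumerate(ranking, start=1):
        print(f"{place}. option {idx} (score {option_utilities[idx]})")
    return ranking[0]  # the argmax, i.e. the recommended option number

if __name__ == "__main__":
    # The human supplies the candidate options' utilities and reads the output;
    # the program has no actuators and no persistent goal of its own.
    best = recommend([0.2, 0.9, 0.5])
```

The point of the sketch is only that the argmax is computed and displayed; whether anything is ever done with option 1 is entirely up to the human reading the output.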
Go back and look at what he wrote:

Every software application I know of seems to work essentially the same way, including those that involve (specialized) artificial intelligence such as Google Search, Siri, Watson, Rybka, etc. Some can be put into an “agent mode” (as Watson was on Jeopardy!) but all can easily be set up to be used as “tools” (for example, Watson can simply display its top candidate answers to a question, with the score for each, without speaking any of them.)

The “tool mode” concept is importantly different from the possibility of Oracle AI sometimes discussed by SI. The discussions I’ve seen of Oracle AI present it as an Unfriendly AI that is “trapped in a box”—an AI whose intelligence is driven by an explicit utility function and that humans hope to control coercively. Hence the discussion of ideas such as the AI-Box Experiment. A different interpretation, given in Karnofsky/Tallinn 2011, is an AI with a carefully designed utility function—likely as difficult to construct as “Friendliness”—that leaves it “wishing” to answer questions helpfully. By contrast with both these ideas, Tool-AGI is not “trapped” and it is not Unfriendly or Friendly; it has no motivations and no driving utility function of any kind, just like Google Maps. It scores different possibilities and displays its conclusions in a transparent and user-friendly manner, as its instructions say to do; it does not have an overarching “want,” and so, as with the specialized AIs described above, while it may sometimes “misinterpret” a question (thereby scoring options poorly and ranking the wrong one #1) there is no reason to expect intentional trickery or manipulation when it comes to displaying its results.
The ‘different interpretation’ of 2011 is the standard interpretation. Holden is the only one who thinks the standard interpretation is actually a UFAI-in-a-box. If you don’t believe me, go back into old materials.
For example, take this 2007 email by Eliezer replying to Tim Freeman. Is the position Tim describes, which Eliezer says applies to Oracle AI, consistent with what I claim “Oracle AI” means, or with what Holden claims it means? Or consider Stuart Armstrong, or Vladimir alluding to Bostrom & Yudkowsky, or Peter McCluskey.
Oracle AI means an AI with a utility function of answering questions. It does not mean an AI with any utility function inside a box. Case closed.
I think we’re misunderstanding each other. You seemed to think that “He talks about that and thinks Oracle AIs are distinct from tools” was a mistake.
I understand Holden to be trying to invent a new category of AI, called “tool-AI”, which is neither an AGI with a utility function for answering questions nor a UFAI in a box (he may be wrong about which definition/interpretation is more popular, but that’s mostly irrelevant to his claim, because he’s just trying to distinguish his idea from these other ideas). He claims that this category has not been discussed much.
He says “Yes, I agree AIs with utility functions for answering questions will do terrible things just like UFAI in a box, but my idea is qualitatively different from either of these, and it hasn’t been discussed”.
Are we still talking past each other?
Probably not.
His definition of ‘tool’ is pretty weird. This is an ordinary dictionary word. Why give it a bizarre, esoteric meaning? Yes, I did read the part where Holden mentioned Oracles.
AI researchers in general appear to think this of SIAI, per the Muehlhauser-Wang dialogue—they just don’t consider its concerns worth taking seriously. That dialogue read to me like “I’m trying to be polite here, but you guys are crackpots.” This was a little disconcerting to see.
In other words, Wang has made a System-1 evaluation of SI, on a topic where System-1 doesn’t work very well.
The downsides of technology deserve to be taken seriously—and are taken seriously by many researchers—e.g. here.
Remember: “With great power comes great responsibility”.