In defense of Oracle (“Tool”) AI research
(Update 2022: Enjoy the post, but note that it’s old, has some errors, and is certainly not reflective of my current thinking. –Steve)
Low confidence; offering this up for discussion
An Oracle AI is an AI that only answers questions, and doesn’t take any other actions. The opposite of an Oracle AI is an Agent AI, which might also send emails, control actuators, etc.
I’m especially excited about the possibility of non-self-improving oracle AIs, dubbed Tool AI in a 2012 article by Holden Karnofsky.
I’ve seen two arguments against this “Tool AI”:
First, as in Eliezer’s 2012 response to Holden, we don’t know how to safely make and operate an oracle AGI (just like every other type of AGI). Fair enough! I never said this is an easy solution to all our problems! (But see my separate post for why I’m thinking about this.)
Second, as in Gwern’s 2016 essay, there’s a coordination problem. Even if we could build a safe oracle AGI, the argument goes, there will still be an economic incentive to build an agent AGI, because you can do more and better and faster by empowering the AGI to take actions. Thus, agreeing to never ever build agent AGIs is a very hard coordination problem for society. I don’t find the coordination argument compelling—in fact, I think it’s backwards—and I wrote this post to explain why.
Five reasons I don’t believe the coordination / competitiveness argument against oracles
1. If the oracle isn’t smart or powerful enough for our needs, we can solve that by bootstrapping. Even if the oracle is not inherently self-modifying, we can ask it for advice and do human-in-the-loop modifications to make more powerful successor oracles. By the same token, we can ask an oracle AGI for advice about how to design a safe agent AGI.
2. Avoiding coordination problems is a pipe dream; we need to solve the coordination problem at some point, and that point might as well be at the oracle stage. As far as I can tell, we will never get to a stage where we know how to build safe AGIs and where there is no possibility of making more-powerful-and-less-safe AGIs. If we have a goal in the world that we really really want to happen, a low-impact agent is going to be less effective than a not-impact-restrained agent; an act-based agent is going to be less effective than a goal-seeking agent;[1] and so on and so forth. It seems likely that, no matter how powerful a safe AGI we can make, there will always be an incentive for people to try experimenting with even more powerful unsafe alternative designs.
Therefore, at some point in AI development, we have to blow the whistle, declare that technical solutions aren’t enough, and start relying 100% on actually solving the coordination problem. When is that point? Hopefully far enough along that we realize the benefits of AGI for humanity—automating the development of new technology to help solve problems, dramatically improving our ability to think clearly and foresightedly about our decisions, and so on. Oracles can do all that! So why not just stop when we get to AGI oracles?
Indeed, once I started thinking along those lines, I began to see the coordination argument going in the other direction! I say restricting ourselves to oracle AI makes coordination easier, not harder! Why is that? Two more reasons:
3. We want a high technological barrier between us and the most dangerous systems: These days, I don’t think anyone takes seriously the idea of building an all-powerful benevolent dictator AGI implementing CEV. [ETA: If you do take that idea seriously, see point 1 above on bootstrapping.] At least as far as I can tell from the public discourse, there seems to be a growing consensus that humans should always and forever be in the loop of AGIs. (That certainly sounds like a good idea to me!) Thus, the biggest coordination problem we face is: “Don’t ever make a human-out-of-the-loop free-roaming AGI world-optimizer.” This is made easier by having a high technological barrier between the safe AGIs that we are building and using, and the free-roaming AGI world-optimizers that we are forbidding. If we make an agent AGI—whether corrigible, aligned, norm-following, low-impact, or whatever—I just don’t see any technological barrier there. It seems like it would be trivial for a rogue employee to tweak such an AGI to stop asking permission, deactivate the self-restraint code, and go tile the universe with hedonium at all costs (or whatever that rogue employee happens to value). By contrast, if we stop when we get to oracle AI, it seems like there would be a higher technological barrier to turning it into a free-roaming AGI world-optimizer—probably not that high a barrier, but higher than the alternatives. (The height of this technological barrier, and indeed whether there’s a barrier at all, is hard to say… It probably depends on how exactly the oracles are constructed and access-controlled.)
4. We want a bright-line, verifiable rule between us and the most dangerous systems: Even more importantly, take the rule:
“AGIs are not allowed to do anything except output pixels onto a screen.”
This is a nice, simple, bright-line rule, which moreover has at least a chance of being verifiable by external auditors. By contrast, if we try to draw a line through the universe of agent AGIs, defining how low-impact is low-impact enough, how act-based is act-based enough, and so on, it seems to me like it would inevitably be a complicated, blurry, and unenforceable line. This would make a very hard coordination problem very much harder still.
[Clarifications on this rule: (A) I’m not saying this rule would be easy to enforce (globally and forever), only that it would be less hard than alternatives; (B) I’m not saying that, if we enforce this rule, we are free and clear of all possible existential risks, but rather that this would be a very helpful ingredient along with other control and governance measures; (C) Again, I’m presupposing here that we succeed in making superintelligent AI oracles that always give honest and non-manipulative answers; (D) I’m not saying we should outlaw all AI agents, just that we should outlaw world-modeling AGI agents. Narrow-AI robots and automated systems are fine. (I’m not sure exactly how that line would be drawn.)]
Finally, one more thing:
5. Maybe superintelligent oracle AGI is “a solution built to last (at most) until all contemporary thinking about AI has been thoroughly obsoleted...I don’t think there is a strong case for thinking much further ahead than that.” (copying from this Paul Christiano post). I hate this argument. It’s a cop-out. It’s an excuse to recklessly plow forward with no plan and everything at stake. But I have to admit, it seems to have a kernel of truth...
[1] See Paul’s research agenda FAQ, section 0.1, for things that act-based agents are unlikely to be able to do. ↩︎
It is a bright line in one sense, but it has the problem that humans remaining technically in the loop may not make much of a difference in practice (see “Disjunctive Scenarios of Catastrophic AI Risk”).
You made a lot of points, so I’ll be relatively brief in addressing each of them. (Taking at face value your assertion that your main goal is to start a discussion.)
1. It’s interesting to consider what it would mean for an Oracle AI to be good enough to answer extremely technical questions requiring reasoning about not-yet-invented technology, yet still “not powerful enough for our needs”. It seems like if we have something that we’re calling an Oracle AI in the first place, it’s already pretty good. In which case, it was getting to that point that was hard, not whatever comes next.
2. If you actually could make an Oracle that isn’t secretly an Agent, then sure, leveraging a True Oracle AI would help us figure out the general coordination problem, and any other problem. That seems to be glossing over the fact that building an Oracle that isn’t secretly an Agent isn’t actually something we know how to go about doing. Solving the “make-an-AI-that-is-actually-an-Oracle-and-not-secretly-an-Agent Problem” seems just as hard as all the other problems.
3. I … sure hope somebody is taking seriously the idea of a dictator AI running CEV, because I don’t see anything other than that as a stable (“final”) equilibrium. There are good arguments that a singleton is the only really stable outcome. All other circumstances will be transitory, on the way to that singleton. Even if we all get Neuralink implants tapping into our own private Oracles, how long does that status quo last? There is no reason for the answer to be “forever”, or even “an especially long time”, when the capabilities of an unconstrained Agent AI will essentially always surpass those of an Oracle-human synthesis.
4. If the Oracle isn’t allowed to do anything other than change pixels on the screen, then of course it will do nothing at all, because it needs to be able to change the voltages in its transistors, and the local EM field around the monitor, and the synaptic firings of the person reading the monitor as they react to the text … Bright lines are things that exist in the map, not the territory.
5. I’m emotionally sympathetic to the notion that we should be pursuing Oracle AI as an option because the notion of a genie is naturally simple and makes us feel empowered, relative to the other options. But I think the reason why e.g. Christiano dismisses Oracle AI is that it’s not a concept that really coheres beyond the level of verbal arguments. Start thinking about how to build the architecture of an Oracle at the level of algorithms and/or physics and the verbal arguments fall apart. At least, that’s what I’ve found, as somebody who originally really wanted this to work out.
Thanks, this is really helpful! For 1,2,4, this whole post is assuming, not arguing, that we will solve the technical problem of making safe and capable AI oracles that are not motivated to escape the box, give manipulative answers, send out radio signals with their RAM, etc. I was not making the argument that this technical problem is easy … I was not even arguing that it’s less hard than building a safe AI agent! Instead, I’m trying to counter the argument that we shouldn’t even bother trying to solve the technical problem of making safe AI oracles, because oracles are uncompetitive.
...That said, I do happen to think there are paths to making safe oracles that don’t translate into paths to making safe agents (see Self-supervised learning and AGI safety), though I don’t have terribly high confidence in that.
Can you find a link to where “Christiano dismisses Oracle AI”? I’m surprised that he has done that. After all, he coauthored “AI Safety via Debate”, which seems aimed primarily (maybe even exclusively) at building oracles (question-answering systems). Your answer to (3) is enlightening, thank you, and do you have any sense for how widespread this view is and where it’s argued? (I edited the post to add that people going for benevolent dictator CEV AGI agents should still endorse oracle research because of the bootstrapping argument.)
Regarding the comment about Christiano, I was just referring to your quote in the last paragraph, and it seems like I misunderstood the context. Whoops.
Regarding the idea of a singleton, I mainly remember the arguments from Bostrom’s Superintelligence book and can’t quote directly. He summarizes some of the arguments here.
Nitpick: the capabilities of either (a) unconstrained Agent AIs, or (b) an Artificial-Agent-human synthesis, will essentially always surpass those of an Oracle-human synthesis. We might have to work our way up before AIs without humans in the loop become the more effective option.
Maybe; but while there also seems to be a general consensus that humans should be kept in the loop for any important decisions, there are nonetheless powerful incentives pushing various actors to automate their modern-day autonomous systems. In particular, there are cases where not having a human in the loop is an advantage in itself, because it e.g. buys you a faster reaction time (see high-frequency trading).
From “Disjunctive Scenarios of Catastrophic AI Risk”:
Suppose that you have a powerful government or corporate actor which has been spending a long time upgrading its AI systems to be increasingly powerful, and achieved better and better gains that way. Now someone shows up and says that they shouldn’t make [some set of additional upgrades], because that would push it to the level of a general intelligence, and having autonomous AGIs is bad. I would expect them to do everything in their power to argue that no, actually this is still narrow AI, and doing these upgrades and keeping the system in control of their operations is fine—especially if they know that failing to do so is likely to confer an advantage to one of their competitors.
The problem is related to one discussed by Goertzel & Pitt (2012): it seems unlikely that governments would ban narrow AI or restrict its development, but there’s no clear dividing line between narrow AI and AGI, meaning that if you don’t restrict narrow AI then you can’t restrict AGI either.
Thank you, those are very interesting references, and very important points! I was arguing that solving a certain coordination problem is even harder than solving a different coordination problem, but I’ll agree that this argument is moot if (as you seem to be arguing) it’s utterly impossible to solve either!
Since you’ve clearly thought a lot about this, have you written up anything about very-long-term scenarios where you see things going well? Are you in the camp of “we should make a benevolent dictator AI implementing CEV”, or “we can make task-limited-AGI-agents and coordinate to never make long-term-planning-AGI-agents”, or something else?
No idea. :-)
My general feeling is that having an opinion on the best course of approach would require knowing what AGI and the state of the world will be like when it is developed, but we currently don’t know either.
Lots of historical predictions about coming problems have been rendered completely irrelevant because something totally unexpected happened. And consider the reverse direction: it would have been hard for people to predict the issue of computer viruses before electricity had been invented, and harder yet to think about how to prepare for it. That might be a bit of an exaggeration—our state of understanding about AGI is probably better than the understanding that pre-electric people would have had of computer viruses—but it still feels impossible to effectively reason about at the moment.
My preferred approach is to just have people pursue many different kinds of basic research on AI safety, understanding human values, etc., while also engaging with near-term AI issues so that they get influence in the kinds of organizations which will eventually make decisions about AI. And then hope that we figure out something once the picture becomes clearer.
It does seem that regulation of AI, should it become necessary, basically has to take the form of regulating access to computer chips. Supercomputers (and server farms) are relatively expensive. You can’t make your own in your basement. Production is centralized at a few locations and so it would not be terribly difficult to track who they’re sold to. They also use lots of electricity, making it easier to track down people who have acquired lots of them illicitly.
I think it’s likely that the computing power required for dangerous AGI will remain at a level well above what most people or non-AI businesses will need for their normal activities, at least up until transformative AI has become widespread. So putting strict limits on chip access would allow governments to severely cripple AI research, without rolling back the narrow-AI tech we’ve already developed and without looking over every programmer’s shoulder to make sure they don’t code up a neural net.
(A plan like this could also backfire by creating a large hardware overhang and contributing to a fast takeoff.)
What does it take for something to qualify as agent AI?
Consider something like Siri. Suppose you could not only ask for information (“What will the weather be like today?”), but you could also ask for action (“Call 911/the hospital”). Does this cross the line from “Oracle” to “Agent”?
Maybe there are other definitions, but the way I’m using the term, what you described would definitely be an agent. An oracle probably wouldn’t have an internet connection at all, i.e. it would be “boxed”. (The box is just a second layer of protection … The first layer of protection is that a properly-designed safe oracle, even if it had an internet connection, would choose not to use it.)