Holden Karnofsky’s Singularity Institute Objection 2
The sheer length of GiveWell co-founder and co-executive director Holden Karnofsky’s excellent critique of the Singularity Institute means that it’s hard to keep track of the resulting discussion. I propose to break out each of his objections into a separate Discussion post so that each receives the attention it deserves.
Objection 2: SI appears to neglect the potentially important distinction between “tool” and “agent” AI.
Google Maps is a type of artificial intelligence (AI). It is far more intelligent than I am when it comes to planning routes.
Google Maps—by which I mean the complete software package including the display of the map itself—does not have a “utility” that it seeks to maximize. (One could fit a utility function to its actions, as to any set of actions, but there is no single “parameter to be maximized” driving its operations.)
Google Maps (as I understand it) considers multiple possible routes, gives each a score based on factors such as distance and likely traffic, and then displays the best-scoring route in a way that makes it easily understood by the user. If I don’t like the route, for whatever reason, I can change some parameters and consider a different route. If I like the route, I can print it out or email it to a friend or send it to my phone’s navigation application. Google Maps has no single parameter it is trying to maximize; it has no reason to try to “trick” me in order to increase its utility.
In short, Google Maps is not an agent, taking actions in order to maximize a utility parameter. It is a tool, generating information and then displaying it in a user-friendly manner for me to consider, use and export or discard as I wish.
Every software application I know of seems to work essentially the same way, including those that involve (specialized) artificial intelligence such as Google Search, Siri, Watson, Rybka, etc. Some can be put into an “agent mode” (as Watson was on Jeopardy!) but all can easily be set up to be used as “tools” (for example, Watson can simply display its top candidate answers to a question, with the score for each, without speaking any of them.)
The “tool mode” concept is importantly different from the possibility of Oracle AI sometimes discussed by SI. The discussions I’ve seen of Oracle AI present it as an Unfriendly AI that is “trapped in a box”—an AI whose intelligence is driven by an explicit utility function and that humans hope to control coercively. Hence the discussion of ideas such as the AI-Box Experiment. A different interpretation, given in Karnofsky/Tallinn 2011, is an AI with a carefully designed utility function—likely as difficult to construct as “Friendliness”—that leaves it “wishing” to answer questions helpfully. By contrast with both these ideas, Tool-AGI is not “trapped” and it is not Unfriendly or Friendly; it has no motivations and no driving utility function of any kind, just like Google Maps. It scores different possibilities and displays its conclusions in a transparent and user-friendly manner, as its instructions say to do; it does not have an overarching “want,” and so, as with the specialized AIs described above, while it may sometimes “misinterpret” a question (thereby scoring options poorly and ranking the wrong one #1) there is no reason to expect intentional trickery or manipulation when it comes to displaying its results.
Another way of putting this is that a “tool” has an underlying instruction set that conceptually looks like: “(1) Calculate which action A would maximize parameter P, based on existing data set D. (2) Summarize this calculation in a user-friendly manner, including what Action A is, what likely intermediate outcomes it would cause, what other actions would result in high values of P, etc.” An “agent,” by contrast, has an underlying instruction set that conceptually looks like: “(1) Calculate which action, A, would maximize parameter P, based on existing data set D. (2) Execute Action A.” In any AI where (1) is separable (by the programmers) as a distinct step, (2) can be set to the “tool” version rather than the “agent” version, and this separability is in fact present with most/all modern software. Note that in the “tool” version, neither step (1) nor step (2) (nor the combination) constitutes an instruction to maximize a parameter—to describe a program of this kind as “wanting” something is a category error, and there is no reason to expect its step (2) to be deceptive.
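For concreteness, here is a minimal Python sketch of that split (my illustration only; the function names are hypothetical and nothing here is drawn from any actual system):

```python
# Minimal sketch of the tool/agent split described above (hypothetical names).
# Step (1) is identical in both modes; only step (2) differs.

def rank_actions(actions, data, score):
    """Step (1): score each candidate action A against data set D and rank them."""
    return sorted(actions, key=lambda a: score(a, data), reverse=True)

def tool_mode(actions, data, score):
    """Step (2), tool version: summarize the ranking for a human to consider."""
    ranked = rank_actions(actions, data, score)
    for i, action in enumerate(ranked, start=1):
        print(f"#{i}: {action} (score {score(action, data):.2f})")
    return ranked  # nothing is executed

def agent_mode(actions, data, score, execute):
    """Step (2), agent version: execute the top-ranked action directly."""
    execute(rank_actions(actions, data, score)[0])
```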
I elaborated further on the distinction and on the concept of a tool-AI in Karnofsky/Tallinn 2011.
This is important because an AGI running in tool mode could be extraordinarily useful but far more safe than an AGI running in agent mode. In fact, if developing “Friendly AI” is what we seek, a tool-AGI could likely be helpful enough in thinking through this problem as to render any previous work on “Friendliness theory” moot. Among other things, a tool-AGI would allow transparent views into the AGI’s reasoning and predictions without any reason to fear being purposefully misled, and would facilitate safe experimental testing of any utility function that one wished to eventually plug into an “agent.”
Is a tool-AGI possible? I believe that it is, and furthermore that it ought to be our default picture of how AGI will work, given that practically all software developed to date can (and usually does) run as a tool and given that modern software seems to be constantly becoming “intelligent” (capable of giving better answers than a human) in surprising new domains. In addition, it intuitively seems to me (though I am not highly confident) that intelligence inherently involves the distinct, separable steps of (a) considering multiple possible actions and (b) assigning a score to each, prior to executing any of the possible actions. If one can distinctly separate (a) and (b) in a program’s code, then one can abstain from writing any “execution” instructions and instead focus on making the program list actions and scores in a user-friendly manner, for humans to consider and use as they wish.
Of course, there are possible paths to AGI that may rule out a “tool mode,” but it seems that most of these paths would rule out the application of “Friendliness theory” as well. (For example, a “black box” emulation and augmentation of a human mind.) What are the paths to AGI that allow manual, transparent, intentional design of a utility function but do not allow the replacement of “execution” instructions with “communication” instructions? Most of the conversations I’ve had on this topic have focused on three responses:
Self-improving AI. Many seem to find it intuitive that (a) AGI will almost certainly come from an AI rewriting its own source code, and (b) such a process would inevitably lead to an “agent.” I do not agree with either (a) or (b). I discussed these issues in Karnofsky/Tallinn 2011 and will be happy to discuss them more if this is the line of response that SI ends up pursuing. Very briefly:
The idea of a “self-improving algorithm” intuitively sounds very powerful, but does not seem to have led to many “explosions” in software so far (and it seems to be a concept that could apply to narrow AI as well as to AGI).
It seems to me that a tool-AGI could be plugged into a self-improvement process that would be quite powerful but would also terminate and yield a new tool-AI after a set number of iterations (or after reaching a set “intelligence threshold”). So I do not accept the argument that “self-improving AGI means agent AGI.” As stated above, I will elaborate on this view if it turns out to be an important point of disagreement.
I have argued (in Karnofsky/Tallinn 2011) that the relevant self-improvement abilities are likely to come with or after—not prior to—the development of strong AGI. In other words, any software capable of the relevant kind of self-improvement is likely also capable of being used as a strong tool-AGI, with the benefits described above.
The SI-related discussions I’ve seen of “self-improving AI” are highly vague, and do not spell out views on the above points.
Dangerous data collection. Some point to the seeming dangers of a tool-AI’s “scoring” function: in order to score different options it may have to collect data, which is itself an “agent” type action that could lead to dangerous actions. I think my definition of “tool” above makes clear what is wrong with this objection: a tool-AGI takes its existing data set D as fixed (and perhaps could have some pre-determined, safe set of simple actions it can take—such as using Google’s API—to collect more), and if maximizing its chosen parameter is best accomplished through more data collection, it can transparently output why and how it suggests collecting more data. Over time it can be given more autonomy for data collection through an experimental and domain-specific process (e.g., modifying the AI to skip specific steps of human review of proposals for data collection after it has become clear that these steps work as intended), a process that has little to do with the “Friendly overarching utility function” concept promoted by SI. Again, I will elaborate on this if it turns out to be a key point.
Race for power. Some have argued to me that humans are likely to choose to create agent-AGI, in order to quickly gain power and outrace other teams working on AGI. But this argument, even if accepted, has very different implications from SI’s view.
Conventional wisdom says it is extremely dangerous to empower a computer to act in the world until one is very sure that the computer will do its job in a way that is helpful rather than harmful. So if a programmer chooses to “unleash an AGI as an agent” with the hope of gaining power, it seems that this programmer will be deliberately ignoring conventional wisdom about what is safe in favor of shortsighted greed. I do not see why such a programmer would be expected to make use of any “Friendliness theory” that might be available. (Attempting to incorporate such theory would almost certainly slow the project down greatly, and thus would bring the same problems as the more general “have caution, do testing” counseled by conventional wisdom.) It seems that the appropriate measures for preventing such a risk are security measures aiming to stop humans from launching unsafe agent-AIs, rather than developing theories or raising awareness of “Friendliness.”
One of the things that bothers me most about SI is that there is practically no public content, as far as I can tell, explicitly addressing the idea of a “tool” and giving arguments for why AGI is likely to work only as an “agent.” The idea that AGI will be driven by a central utility function seems to be simply assumed. Two examples:
I have been referred to Muehlhauser and Salamon 2012 as the most up-to-date, clear explanation of SI’s position on “the basics.” This paper states, “Perhaps we could build an AI of limited cognitive ability — say, a machine that only answers questions: an ‘Oracle AI.’ But this approach is not without its own dangers (Armstrong, Sandberg, and Bostrom 2012).” However, the referenced paper (Armstrong, Sandberg and Bostrom 2012) seems to take it as a given that an Oracle AI is an “agent trapped in a box”—a computer that has a basic drive/utility function, not a Tool-AGI. The rest of Muehlhauser and Salamon 2012 seems to take it as a given that an AGI will be an agent.
I have often been referred to Omohundro 2008 for an argument that an AGI is likely to have certain goals. But this paper seems, again, to take it as given that an AGI will be an agent, i.e., that it will have goals at all. The introduction states, “To say that a system of any design is an ‘artificial intelligence’, we mean that it has goals which it tries to accomplish by acting in the world.” In other words, the premise I’m disputing seems embedded in its very definition of AI.
The closest thing I have seen to a public discussion of “tool-AGI” is in Dreams of Friendliness, where Eliezer Yudkowsky considers the question, “Why not just have the AI answer questions, instead of trying to do anything? Then it wouldn’t need to be Friendly. It wouldn’t need any goals at all. It would just answer questions.” His response:
To which the reply is that the AI needs goals in order to decide how to think: that is, the AI has to act as a powerful optimization process in order to plan its acquisition of knowledge, effectively distill sensory information, pluck “answers” to particular questions out of the space of all possible responses, and of course, to improve its own source code up to the level where the AI is a powerful intelligence. All these events are “improbable” relative to random organizations of the AI’s RAM, so the AI has to hit a narrow target in the space of possibilities to make superintelligent answers come out.
This passage appears vague and does not appear to address the specific “tool” concept I have defended above (in particular, it does not address the analogy to modern software, which challenges the idea that “powerful optimization processes” cannot run in tool mode). The rest of the piece discusses (a) psychological mistakes that could lead to the discussion in question; (b) the “Oracle AI” concept that I have outlined above. The comments contain some more discussion of the “tool” idea (Denis Bider and Shane Legg seem to be picturing something similar to “tool-AGI”) but the discussion is unresolved and I believe the “tool” concept defended above remains essentially unaddressed.
In sum, SI appears to encourage a focus on building and launching “Friendly” agents (it is seeking to do so itself, and its work on “Friendliness” theory seems to be laying the groundwork for others to do so) while not addressing the tool-agent distinction. It seems to assume that any AGI will have to be an agent, and to make little to no attempt at justifying this assumption. The result, in my view, is that it is essentially advocating for a more dangerous approach to AI than the traditional approach to software development.
In my opinion Karnofsky/Tallinn 2011 is required reading for this objection. Here is Holden’s pseudocode for a tool:
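(Reconstructed from memory and from the fragments quoted further down in this thread, rendered here in Python; the original in Karnofsky/Tallinn 2011 is informal pseudocode and may differ in detail.)

```python
# Paraphrase, not a quote. construct_utility_function, prediction_function, and
# report are left undefined here; they are exactly the primitives the rest of the
# discussion pokes at.
def tool_ai(user_input, all_possible_actions, data):
    utility_function = construct_utility_function(user_input)
    leading_action, leading_utility = None, float("-inf")
    for action in all_possible_actions:
        outcome = prediction_function(action, data)  # predict what would happen
        utility = utility_function(outcome)          # score the predicted outcome
        if utility > leading_utility:
            leading_action, leading_utility = action, utility
    report(leading_action)  # an agent would call execute(leading_action) instead
```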
Jaan mentions that prediction_function seems like too convenient a rug to sweep details under. They discuss some wrappers around a tool and what those might do. In particular, here are a couple of Jaan’s questions, with Holden’s responses (paraphrased):
As we’ve seen from decision theory posts on LW, prediction_function has some very tricky questions around whether it’s thinking about the counterfactual “if do(humans implement $action) then $outcome” or “if do(this computation outputs $action) then $outcome” or others, and other odd questions like whether it considers inferences about the state of the world given that humans eventually take $action, etc. etc. I feel like moving to a tool doesn’t get rid of these problems for prediction_function.
At least in the original post, I don’t think Holden’s point is that tool-AI is much easier than agent-AI (though he seems to have the intuition that it is), but that it’s potentially much safer (largely because of increased feedback), and thus that it deserves more investigation (and that it’s a bad sign about SIAI that it has neglected this approach).
Yes, good point. The objection is about SI not addressing tool-AI, and much of the discussion here addresses tool-AI directly rather than the meta question of “why isn’t this explicitly called out by SI?” In particular, the intuitions Holden offers in response to those questions (that we may well be able to create extremely useful general AI without creating general AI that can improve itself) do seem to have received too little in-depth discussion here. We’ve often mentioned the possibility and often decided to skip the question because it’s very hard to think about, but I don’t recall many really lucid conversations trying to ferret out what it would look like if more-than-narrow, less-than-self-improving AI were a large enough target to hit reliably.
(As an aside, I think that as Holden conceives of it, tool-AI could self-improve, but because it’s tool-like and not agent-like, it would not self-improve automatically. Its outputs could be of the form “I would decide to rewrite my program with code X”, but humans would need to actually implement those changes.)
The LW-esque brush-off version, but there is some truth to it.
utility_function = construct_utility_function(“peace on earth”);
...
report($leading_action);
$ ‘press Enter for achieving peace’
Two years later… Crickets and sunshine. Only crickets.
There are strong incentives toward agency, due to real-world advantages such as speed and capability. Automatic cars, high-frequency trading algorithms, airplane autopilots, guided missiles, security systems, assisted-living robots, and personal enhancements like pacemakers: computers are better than humans at a sufficiently broad set of abilities that we will entrust more and more to them, in the form of agency, as time goes on. Tools won’t remain mere tools for long.
Tools can be used to build agents.
I’m not sure the distinction between “tool AI” and “agent AI” is that important when thinking of the existential risks of AI; as has already been said (by Holden Karnofsky himself!), a tool AI can be relatively easily turned into an agent AI.
We already have tools like Google search, but we also already have agents like scripts that check the status of servers and send mail under certain conditions, or the robot that cleans my living room. So when we get the technology for a tool AGI, we’ll have the technology for an agent AGI. And we won’t have less demand for agents than we already have now.
I cannot fathom how this is any more than a distraction from the hard reality that when human beings gain the ability to manufacture “agent AI”, we WILL.
Any number of companies and/or individuals can ethically choose to focus on “tool AI” rather than “agent AI”, but that will never erase the inevitable human need to create that which it believes and/or knows it can create.
In simple terms, SI’s viewpoint (as I understand it) is that “agent AIs” are inevitable… some group or individual somewhere, at some point, WILL produce the phenomenon, if for no other reason than that it is human nature to look behind the curtain no matter what the inherent risks may be. History has no shortage of proof in support of this truth.
SI asserts that (again, as I understand it) it is imperative for someone to at least attempt to create a friendly “agent AI” FIRST, so there is at least a chance that human interests will be part of the evolving equation… an equation that could potentially change too quickly for humans to assume there will be time for testing or second chances.
I am not saying I agree with SI’s stance, but I don’t see how an argument that SI should spend time, money and energy on a possible alternative to “agent AI” is even relevant, when the point is explicitly that it doesn’t matter how many alternatives there are, nor how much safer they may be for humans; “agent AI” WILL happen at some point in the future, and its impact should be addressed, even if our attempts at addressing those impacts are ultimately futile due to unforeseen developments.
Try applying Karnofsky’s style of argument above to the creation of the atomic bomb. Using the logic of this argument in a pre-atomic world, one would simply say, “It will be fine so long as we all agree NOT to go there. Let’s work on something similar, but with less destructive force,” and expect this to stop the scientists of the world from proceeding to produce an atomic bomb. Once the human mind becomes aware of the possibility of something that was once considered beyond comprehension, it will never rest until it has been achieved.
Is this true, though? Cobalt bombs and planet-cracking nukes, for example, have not been built, as far as anyone can tell.
I agree that agent AI doesn’t look like those two, in that both of those naturally require massive infrastructure and political will, whereas an agent AI, once computers are sufficiently powerful, should only require knowledge of how to build it.
You caught me… I tend to make overly generalized statements. I am working on being more concise with my language, but my enthusiasm still gets the best of me too often.
You make a good point, but I don’t necessarily see the requirement of massive infrastructure and political will as the primary barrier to achieving such goals. As I see it, any idea, no matter how grand or costly, is achievable so long as a kernel exists at the core of that idea that promises something “priceless”, whether spiritually, intellectually, materially, etc. For example, a “planet-cracking nuke” can only have one outcome: the absolute end of our world. There is no possible scenario imaginable where cracking the planet apart would benefit any group or individual. (Potentially, in the future, there could be benefits to cracking apart a planet that we did not actually live on, but in the context of the here and now, a planet-cracking nuke holds no kernel, no promise of something priceless.)
AI fascinates because, no matter how many horrific outcomes the human mind can conceive of, there is an unshakable sense that AI also holds the key to unlocking answers to questions humanity has sought since the beginning of thought itself. That is a rather large kernel, and it is never going to go dim, despite the very real OR the absurdly unlikely risks involved.
So it is this kernel of priceless return at the core of an “agent AI” that, for me, makes its eventual creation a certainty on a long enough timeline, not merely a likelihood.
My layman’s understanding of the SI position is as follows:
Many different kinds of AIs are possible, and humans will keep building AIs of different types and power to achieve different goals.
Such attempts have a chance of becoming AGI strong enough to reshape the world, and AGI further has a chance of being uFAI. It doesn’t matter whether the programmers’ intent matches the result in these cases, or what the exact probabilities are. It matters only that the probability of uFAI of unbounded power is non-trivial (pick your own required minimum probability here).
The only way to prevent this is to make a FAI that will expand its power to become a singleton, preventing any other AI/agent from gaining superpowers in its future light cone. Again, it doesn’t matter if there is a chance of failure in this mission, as long as success is likely enough (pick your required probability, but I think 10% has already been suggested as sufficient in this thread).
Making a super-powerful FAI will of course also solve a huge number of other problems humans have, which is a nice bonus.
I think this is a good critique, but I don’t think he explains his idea very well. I’m not positive I’ve understood it, but I’ll give my best understanding.
A tool AGI is something like a pure decision theory with no attached utility function. It is not an agent with a utility function for answering questions or anything like that. The decision theory is a program which takes a set of inputs, including utility values, and returns an action. An ultra-simple example might be an argmax function which takes a finite set of utility values for a finite set of action options and returns the option number that has the highest value. It doesn’t actually do the action; it just returns the number, and it’s up to you to use the output.
Each time you run it, you provide a new set of inputs; these might include utility values, a utility function, a description of some problem, a copy of its own source, etc. Each time you run the program, it returns a decision. If you run the program twice with the same inputs, it will return the same outputs. It’s up to the humans to use that decision. Some of the actions that the decision theory returns will be things like “rewrite the tool-AI program with code Y (to have feature X)”, but it’s still up to the humans to execute this task. Other actions will be “gather more information” operations, perhaps even on a really low level, like “read the next bit in the input stream”.
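To make the “returns a number, executes nothing” point concrete, here is a toy Python version of that argmax example (my illustration, not Holden’s):

```python
def decide(option_utilities):
    """Given utility values for a finite set of options, return the index of the
    highest-valued option. Deterministic: same inputs, same output. Nothing is
    executed; a human decides what, if anything, to do with the answer."""
    return max(range(len(option_utilities)), key=lambda i: option_utilities[i])

print(decide([0.2, 0.9, 0.4]))  # -> 1; acting on option 1 is left to the humans
```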
Now all this sounds hopelessly slow to me and potentially impossible for other reasons too, but I haven’t seen much discussion of ideas like this, and it seems like it deserves investigation.
As I see it (which may not be Holden’s view), the main benefit of this approach is that there is dramatically more feedback to the human designers. Agent-AI could easily be a one-chance sort of thing, making it really easy to screw up. Increased feedback seems like a really good thing.
I’m not sure how you distinguish the tool AGI from either a narrow AI or an agent with a utility function for answering questions (i.e., an Oracle).
If it solves problems of a specific kind (e.g., Input: GoogleMap, point A, point B, “quickest path” utility function; Output: the route) by following well-understood algorithms for solving these kinds of problems, then it’s obviously a narrow AI.
If it solves general problems, then its algorithm must have the goal to find the action that maximizes the input utility, and then it’s an Oracle.
Let’s taboo the words “narrow AI”, “AGI”, and “Oracle”, as I think they’re getting in the way.
Let’s say you’ve found a reflective decision theory and found a pretty good computational approximation. You could go off and try to find the perfect utility function, link the two together, and press “start”; this is what we normally imagine doing.
Alternatively, you could code up the decision theory and run it for “one step” with a predefined input and set of things in memory (which might include an approximate utility function or a set of utility values for different options, etc.) and see what it outputs as the next action to take. Importantly, the program doesn’t do anything that it thinks of as in its option set (like “rewrite your code so that it’s faster”, “turn on sensor X”, or “press the blue button”); it just returns which option it deems best. You take this output and do what you want with it: maybe use it, discard it, or re-run the decision theory with modified inputs. One of its outputs might be “replace my program with code X because it’s smarter” (so I don’t think it’s useful to call it “narrow AI”), but it doesn’t automatically replace its code as such.
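A hedged sketch of that interaction loop, with decide_one_step standing in for the hypothetical coded-up decision theory (everything here is illustrative, not a real design):

```python
# Illustrative only: decide_one_step is a pure function of its inputs and
# executes nothing; the humans decide what to do with each output.

def decide_one_step(options, memory):
    """Return the option judged best given the current memory contents."""
    return max(options, key=lambda opt: memory["scores"].get(opt, 0.0))

memory = {"scores": {"press the blue button": 0.1,
                     "turn on sensor X": 0.6,
                     "rewrite my program with code X": 0.9}}
options = list(memory["scores"])

suggestion = decide_one_step(options, memory)  # "rewrite my program with code X"
# The humans now choose: implement the suggestion, discard it, or modify the
# inputs (say, drop the rewrite option) and run the single step again.
second_opinion = decide_one_step([o for o in options if "rewrite" not in o], memory)
```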
I don’t understand what you mean by “running a decision theory for one step”. Assume you give the system a problem (in the form of a utility function to maximize or a goal to achieve) and ask it to find the best next action to take. This makes the system an Agent with the goal of finding the best next action (subject to all additional constraints you may specify, like maximal computation time, etc.). If the system is really intelligent, and the problem (of finding the best next action) is hard so the system needs more resources, and there is any hole anywhere in the box, then the system will get out.
Regarding self-modification, I don’t think it is relevant to the safety issues by itself. It is only important in that, using it, the system may become very intelligent very fast. The danger is intelligence, not self-modification. Also, a sufficiently intelligent program may be able to create and run a new program without your knowledge or consent, either by simulating an interpreter (slow, but if the new program makes exponential time savings, this would still have a huge impact) or by finding and exploiting bugs in its own programming.
I made a similar point here. My conclusion: in theory, you can have a recursively self-improving tool without “agency”, and this is possibly even easier to do than “agency”. My design is definitely flawed but it’s a sketch for what a recursively self-improving tool would look like.
But they may think they have a theory of Friendliness that works when it actually creates a UFAI. If SI already had code that could be slapped on to create Friendliness, this type of programmer would use it.
I don’t think Holden Karnofsky is too familiar with the topic of machine intelligence. This seems to be rather amateurish rambling to me.
“Tools”—as Karnofsky defines them—have no actuators besides their human displays and they keep humans “in the loop”. They are thus intrinsically slow and impotent. The future won’t belong to the builders of the slow and impotent machines—so: there’s a big motivation to cut the humans out of the loop using automation—and give the machines more potent actuators. This process has already been taking place on a large scale during industrial automation.
Also, this idea has been discussed extensively in the form of “Oracle AI”. Holden should have searched for that.
So what? He may be an amateur, but he is very clearly a highly intelligent person who has worked hard to understand SI’s position. SI is right to acknowledge it as a flaw that no part of their published writings addresses what he is saying in a way accessible to a nonspecialist like him.
Holden should update here, IMO. One of the lessons is probably not to criticise others regarding complex technical topics when you don’t really understand them. Holden’s case doesn’t really benefit from technical overstatements, IMO, especially muddled ones.
Even assuming that you are right, SI should write more clearly, to make it easier for people like Holden to update.
If you try to communicate an idea, and even intelligent and curious people get it wrong, something is wrong with the message.
String theory seems to be a counter example. That’s relevant since machine intelligence is a difficult topic.
Matt’s right: Karnofsky does think tools are distinct from Oracles. But I agree with your main point: my first thought was “you can make an algorithmic stock trader ‘tool AI’ that advises humans, but it’ll get its card punched by the fully automated millisecond traders already out there.”
Will it? Human traders still exist, right? If they can still make money, then ones with a smart adviser would make more money.
Yes; what we now see is the HFT realm, where only algorithms can compete, and the realm 6-12 orders of magnitude or so slower, where human+tool AI symbiotes still dominate. Of course, HFT is over half of equity trading volume these days, and seems to be still growing, both in absolute numbers and as a proportion of total trading. I’d guess that human+tool AI’s scope of dominance is shrinking.
Ooh. HFT gives a great deal of perspective on my comment on QA->tool->daemon: HFT is daemon-level programs achieving results many consider unFriendly.
Did you read the whole post? He talks about that and thinks Oracle AIs are distinct from tools.
Which is a mistake; at least, I’ve been reading about Oracle AIs for at least as long as he has and have read the relevant papers, and I had the distinct impression that Oracle AIs were defined as AIs with a utility function of answering questions, which take undesired actions anyway. He’s just conflating Oracle AIs with AI-in-a-box, which is wrong.
This is not what he’s talking about either. He thinks of “utility function of answering questions” as an AI-in-a-box, and different from a Tool AI.
I think his notion is closer (I still don’t know exactly what he means, but I am pretty sure your summary is not right) to a pure decision-theory program: you give it a set of inputs and it outputs what it would do in this situation. For example, an ultra-simple version might have you input a finite number of option utilities, and it does an argmax to find the biggest one, returning the option number. This would not be automatically self-improving, because each time an action has to be taken, humans have to take that action, even for things like “turn on sensors” or “gather more information”.
There’s no utility function involved in the program.
Go back and look at what he wrote:
The ‘different interpretation’ of 2011 is the standard interpretation. Holden is the only one who thinks the standard interpretation is actually a UFAI-in-a-box. If you don’t believe me, go back into old materials.
For example, this 2007 email by Eliezer replying to Tim Freeman. Is what Tim says that is described by Eliezer as applying to Oracle AI consistent with what I claim is “Oracle AI”, or with what Holden claims it is? Or Stuart Armstrong, or Vladimir alluding to Bostrom & Yudkowsky, or Peter McCluskey.
Oracle AI means an AI with a utility function of answering questions. It does not mean an AI with any utility function inside a box. Case closed.
I think we’re misunderstanding each other. You seemed to think that this claim (“He talks about that and thinks Oracle AIs are distinct from tools”) was a mistake.
I understand Holden to be trying to invent a new category of AI, called “tool-AI”, which is neither an AGI with a utility function for answering questions nor a UFAI in a box (he may be wrong about which definition/interpretation is more popular, but that’s mostly irrelevant to his claim, because he’s just trying to distinguish his idea from these other ideas). He claims that this category has not been discussed much.
He says, “Yes, I agree AIs with utility functions for answering questions will do terrible things, just like UFAI in a box, but my idea is qualitatively different from either of these, and it hasn’t been discussed.”
Are we still talking past each other?
Probably not.
His definition of ‘tool’ is pretty weird. This is an ordinary dictionary word. Why give it a bizarre, esoteric meaning? Yes, I did read where Holden mentioned Oracles.
AI researchers in general appear to think this of SIAI, per the Muehlhauser-Wang dialogue: they just don’t consider its concerns worth taking seriously. That dialogue read to me like “I’m trying to be polite here, but you guys are crackpots.” This was a little disconcerting to see.
In other words, Wang has made a System-1 evaluation of SI, on a topic where System-1 doesn’t work very well.
Downsides of technology deserve to be taken seriously, and are taken seriously by many researchers; e.g., here.
Remember: “With great power comes great responsibility”.
Tool-AGI seems to be an incoherent concept. If a Tool simply solves a given set of problems in the prespecified allowed ways (only solves GoogleMap problems, takes its existing data set as fixed, and has some pre-determined, safe set of simple actions it can take), then it’s a narrow AI.
If an AGI is able to understand which of the intermediate outcomes an action may cause are important for people, and to summarize this information in a user-friendly manner, then building such AGI is FAI-complete.
I don’t think the concept is incoherent. What do you think of my more specific suggestion? Holden’s idea seems sufficiently different from other ideas that I don’t think arguing about whether it is AGI or narrow AI is very useful.
My apologies for arriving late to the thread, since I got linked to this one from Reddit and don’t have a back-log file of LessWrong’s various AI threads.
We should be formulating this post as a question: does there exist a difference, in reality, between cognition and optimization? Certainly, as anyone acquainted with the maths will point out, you can, theoretically, construct an optimization metric characterizing any form of cognition (or at least, any computational cognition that an AI could carry out) you may care about. This does not mean, however, that the cognition itself requires or is equivalent to optimizing some universe for some utility metric. After all, we also know from basic programming-language theory that any computation whatsoever can be encoded (albeit possibly at considerable effort) as a pure, referentially transparent function from inputs to outputs!
What we do know is that optimization is the “cheapest and easiest” way of mathematically characterizing any and all possible forms of computational cognition. That just doesn’t tell us anything at all about any particular calculation or train of thought we may wish to carry out, or any software we might wish to build.
For instance, we can say that AIXI or Goedel Machines, if they were physically realizable (they’re not), would possess “general intelligence” because they are parameterized over an arbitrary optimization metric (or, in AIXI’s case, a reward channel representing samples from an arbitrary optimization metric). This means they must be capable of carrying out any computable cognition, since we could input any possible cognition as a suitably encoded utility function. Of course, this very parameterization, this very generality, is what makes it so hard to encode the specific things we might want an agent to do: neither AIXI nor Goedel Machines even possess an accessible internal ontology we could use to describe things we care about!
Which brings us to the question: can we mathematically characterize an “agent” that can carry out any computable cognition, but does not actively optimize at all, instead simply computing “output thoughts” from “input thoughts” and writing them to some output tape?