How much friendliness is enough?
According to Eliezer, making AI safe requires solving two problems:
1) Formalize a utility function whose fulfillment would constitute “good” to us. CEV is intended as a step toward that.
2) Invent a way to code an AI so that it’s mathematically guaranteed not to change its goals after many cycles of self-improvement, negotiations etc. TDT is intended as a step toward that.
It is obvious to me that (2) must be solved, but I’m not sure about (1). The problem in (1) is that we’re asked to formalize a whole lot of things that don’t look like they should be necessary. If the AI is tasked with building a faster and more efficient airplane, does it really need to understand that humans don’t like to be bored?
To put the question sharply, which of the following looks easier to formalize:
a) Please output a proof of the Riemann hypothesis, and please don’t get out of your box along the way.
b) Please do whatever the CEV of humanity wants.
Note that I’m not asking if (a) is easy in absolute terms, only if it’s easier than (b). If you disagree that (a) looks easier than (b), why?
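For reference, the mathematical half of (a) is already a crisp formal statement; the only informal part of (a) is the “don’t get out of your box” clause. Roughly (with ζ extended beyond the series by analytic continuation):

```latex
% Riemann hypothesis (standard statement, included only for reference):
% zeta is given by the series for Re(s) > 1 and extended to the rest of
% the complex plane (except s = 1) by analytic continuation.
\zeta(s) = \sum_{n=1}^{\infty} n^{-s} \quad (\operatorname{Re}(s) > 1),
\qquad
\forall s:\; \bigl(\zeta(s) = 0 \,\wedge\, 0 < \operatorname{Re}(s) < 1\bigr)
\;\Rightarrow\; \operatorname{Re}(s) = \tfrac{1}{2}.
```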
It strikes me that this is the wrong way to look at the issue.
The problem scenario is if someone, anywhere, develops a powerful AGI that isn’t safe for humanity. How do you stop the invention and proliferation of an unsafe technology? Well, you can either try to prevent anybody from building an AI without authorization; or you can try to make your own powerful friendly AGI before anybody else gets unfriendly AGI. The latter has the advantage that you only have to be really good at technology, you don’t have to enforce an unenforceable worldwide law.
Building an AI that doesn’t want to get out of its box doesn’t solve the problem that somewhere, somebody may build an AI that does want to get out of its box.
...and the disadvantage that you are trying to solve a harder problem.
Yudkowsky recently said that his approach was to make incautious projects look stupid:
This seems to be a form of negative marketing.
How do you know it’s harder? The first problem (preventing anyone from building an AI) seems to require nothing short of world conquest (or at least setting up some kind of singleton; nothing weaker than that could hope to effectively enforce such a law), and while neither world conquest nor FAI has ever been achieved, more effort has been put into the former, so I would guess it is harder.
What I meant was that the disadvantage of this plan:
...was that the former problem is harder than the latter one.
A machine with safety features is usually somewhat harder to build than one without—it has more components and complexity.
I was not comparing with the difficulty of building a totalitarian government. I was continuing from the last sentence—with my ”...”.
Sorry for misunderstanding you. I agree that making Friendly AI probably is harder than making Unfriendly AI, so if Friendliness is necessary then our only hope is if anyone smart enough to successfully build an AI is also smart enough to see the importance of friendliness.
I think that (a) is just a special case of a narrow AI.
Like, GAI is dangerous because it can do anything, and would probably ruin this section of the universe for us if its goals were misaligned with ours.
I’m not sure if GAI is needed to do highly domain-specific tasks like (a).
Yeah, this looks right. I guess you could rephrase my post as saying that narrow AI could solve most problems we’d want an AI to solve, but with less danger than the designs discussed on LW (e.g. UDT over Tegmark multiverse).
That’s what evolution was saying. Recently I’ve come to expect narrow AI developments to be directly on track to an eventual intelligence explosion.
What narrow AI developments do you have in mind?
Who’s ‘evolution’?
Apparently whoever downvoted understood what Vladimir was saying; can you please explain? I can’t parse “what evolution was saying”.
Vladimir’s writing style has high information density, but he leaves the work of unpacking to the reader. In this context “that’s what evolution was saying” seems to be a shorthand for something like:
Evolution optimized for goals that did not necessarily imply general intelligence, nor did evolution ever anticipate creating a general intelligence. Nevertheless a general intelligence appeared as the result of evolution’s optimizations. By analogy we should not be too sure about narrow AI developments not leading to AGI.
Ah. This seems about right, though I think Vladimir’s statement was denser and/or more ambiguous than usual.
Don’t preemptively refer to anyone who disagrees with you as brainwashed.
It seems I am missing something obvious: to what part of the article, and in what way, are you referring? (genuine question, not a rant)
(edit) OK, I got the original wording from the comments below. Stupid me.
I’d be more interested in hearing actual arguments...
It might be worth noting that I often phrase questions as “how would we design an FAI to think about that” not because I want to build an FAI, but because I want the answer to some philosophical question for myself, and phrasing it in terms of FAI seems to be (1) an extremely productive way of framing the problem, and (2) generates interest among those who have good philosophy skills and are already interested in FAI.
ETA: Even if we don’t build an FAI, eventually humanity might have god-like powers, and we’d need to solve those problems to figure out what we want to do.
If you figured out artificial general intelligence that is capable of explosive recursive self-improvement, and you know how to achieve goal-stability and how to constrain it, then you ought to concentrate on taking over the universe, both because of the multiple discovery hypothesis and because you can’t expect other humans to be friendly.
Why is this downvoted? Isn’t this one of the central theses of FAI?
Possible reasons:
I implicitly differentiated between AGI in general and the ability to recursively self-improve (which are usually lumped together on LW). I did this on purpose.
I included the ability to constrain such an AGI as a prerequisite to running it. I did this on purpose, because friendliness is not enough if the AGI is free to hunt for vast utilities regardless of tiny probabilities. Even an AGI equipped with perfect human-friendliness might try to hack the Matrix to support 3^^^^3 people rather than just a galactic civilisation. This problem isn’t solved, and therefore, as suggested by Yudkowsky, the AGI needs to be constrained using a “hack”.
I used the phrasing “taking over the universe”, which is badly received yet factually correct if you have a fooming AI and want to use it to spawn a positive Singularity.
I said that you can’t expect other humans to be friendly, which is not even the biggest problem; the biggest problem is stupidity.
I said one “ought” to concentrate on taking over the universe. I said this on purpose to highlight that I actually believe that to be the only sensible thing to do once fooming AI is possible, because if you waste too much time with spatiotemporally bounded versions, then someone who is ignorant of friendliness will launch one that isn’t constrained that way.
The comment might have been deemed unhelpful because it added nothing new to the debate.
That’s my analysis of why the comment might have initially been downvoted. Sadly most people who downvote don’t explain themselves, but I decided to stop complaining about that recently.
Awesome, thanks for the response. Do you know if there’s been any progress on the “expected utility maximization makes you do arbitrarily stupid things that won’t work” problem?
Though, stupidity is a form of un-Friendliness, isn’t it?
I only found out about the formalized version of that dilemma around a week ago. As far as I can tell it has not been shown that giving in to a Pascal’s mugging scenario would be irrational; it is merely our intuition that makes us believe that something is wrong with it. I am currently far too uneducated to talk about this in detail. What I am worried about is that basically all probability/utility calculations could be put into the same category (e.g. working to mitigate low-probability existential risks); where do you draw the line? You can be your own mugger if you weigh in enough expected utility to justify taking extreme risks.
There’s a formalization I gave earlier that distinguishes Pascal’s Mugging from problems that just have big numbers in them. It’s not enough to have a really big utility; a Pascal’s Mugging is when you have a statement provided by another agent, such that just saying a bigger number (without providing additional evidence) increases what you think your expected utility is for some action, without bound.
This question has resurfaced enough times that I’m starting to think I ought to expand that into an article.
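A minimal sketch of that diagnostic, with made-up numbers (this is only an illustration, not the formalization referred to above): if the credence you assign to the mugger’s claim falls off more slowly than the claimed payoff grows, then merely naming a bigger number pushes the expected utility of paying up without bound.

```python
# Toy model of a Pascal's Mugging (all numbers invented for illustration).
# The mugger states a payoff N; we assign it a credence that shrinks only
# polynomially in N. Then the expected utility of paying grows without
# bound in N, which is the diagnostic feature described in the comment above.

def credence(claimed_payoff: float) -> float:
    """Hypothetical prior: big claims are penalized, but only polynomially."""
    return 1.0 / (1e6 * claimed_payoff ** 0.5)

def expected_utility_of_paying(claimed_payoff: float, cost: float = 5.0) -> float:
    return credence(claimed_payoff) * claimed_payoff - cost

if __name__ == "__main__":
    for n in (1e6, 1e12, 1e18, 1e24):
        print(f"claimed payoff {n:.0e}: EU of paying = {expected_utility_of_paying(n):.3g}")
```

A prior that discounted the claim at least as fast as the stated payoff grows would cap that contribution at a constant; whether such a discount can be justified is left open here.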
Minor correction: It may need a hack if it remains unsolved.
My actions in this scenario depend on other factors, like how much time I have. If I had reasonable confidence of e.g. a month’s head start over other groups, I’d spend the month trying to work out some way to deter other groups from launching, because I prefer the world where no one launches to the world where I take over. I commented to that effect sometime ago.
I’m not sure about the Riemann hypothesis, since it seems likely that RH is undecidable in ZFC. But this might be safer if one adds a time limit on when one wants the answer by.
But simply in terms of specification I agree that formalizing “don’t get out of your box” is probably easier than formalizing what all of humanity wants.
Why? I know certain people (e.g. Chaitin, who’s a bit cranky in this regard) have toyed around with the idea, but is there any reason to believe it?
Not any strong one. We do know that for some systems similar to the integers the analogous statement is false, though for most analogs (such as the finite field case) it seems to be true. That’s very weak evidence for undecidability. However, I was thinking more in contrast to something like the classification of finite simple groups as of 1975, where there was a general program of what to do that had no obvious massive obstructions.
The goal is not to “make an AI friendly” (non-lethal), it’s to make a Friendly AI. That is, not to make some powerful agent that doesn’t kill you (and does something useful), but make an agent that can be trusted with autonomously building the future. For example, a merely non-lethal AI won’t help with preventing UFAI risks.
So it’s possible that some kind of Oracle AI can be built, but so what? And the risk of unknown unknowns remains, so it’s probably a bad idea even if it looks provably safe.
Doesn’t this also apply to provably friendly Friendly AI? Perhaps even more so, given that it is a project of higher complexity.
With FAI, you have a commensurate reason to take the risk.
Sure, but if the Oracle AI is used as a stepping stone towards FAI, then you also have a reason to take the risk.
I guess you could argue that the risk of Oracle + Friendly AI is higher than just going straight for FAI, but you can’t be sure how much the FAI risk could be mitigated by the Oracle AI (or any other type of not-so-powerful / constrained / narrow-domain AI). At least it doesn’t seem obvious to me.
To the extent you should expect it to be useful. It’s not clear in what way it can even in principle help with specifying morality. (See also this thread.)
Assume you have a working halting oracle. Now what? (Actually you could get inside to have infinite time to think about the problem.)
I think he means Oracle as in general powerful question-answer, not as in a halting oracle. A halting oracle could be used to answer many mathematical questions (like the aforementioned Riemann Hypothesis) though.
I know he doesn’t mean a halting oracle. A halting oracle is a well-specified superpower that can do more than real Oracles. The thought experiment I described considers an upper bound on usefulness of Oracles.
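As an aside on the “many mathematical questions” point: RH is known to be expressible as a statement of the form “for every n, a mechanically checkable condition holds”, so a halting oracle, if one existed, would settle it. A sketch of that reduction follows; every name in it is hypothetical, and no such oracle is physically realizable.

```python
# Hypothetical sketch only: `halts` stands for the assumed halting oracle,
# and `violates_rh_criterion` for an assumed decidable per-n check of some
# arithmetical equivalent of RH.

def violates_rh_criterion(n: int) -> bool:
    """Placeholder for a decidable per-n check; RH has known equivalents
    of the 'for all n, P(n)' shape, but the check is not spelled out here."""
    raise NotImplementedError("assumption, not implemented")

def rh_counterexample_search() -> None:
    """Halts if and only if some n violates the criterion, i.e. iff RH is false."""
    n = 1
    while True:
        if violates_rh_criterion(n):
            return
        n += 1

def decide_rh(halts) -> bool:
    """Given the (hypothetical) oracle, RH is true iff the search never halts."""
    return not halts(rh_counterexample_search)
```

Even that superpower only answers questions that are already fully formalized, which is the contrast the thread keeps returning to.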
I figure we will build experts and forecasters before both oracles and full machine intelligence. That will be good—since forecasters will help to give us foresight—which we badly need.
Generally speaking, replacing the brain’s functions one-at-a-time seems more desirable than replacing them all-at-once. It is likely to result in a more gradual shift, and a smoother transfer—with a reduced chance of the baton getting dropped during the switch over.
If we get a working Oracle AI, couldn’t we just ask it how to build an FAI? I just don’t think this is of much use, since the Oracle route doesn’t really seem much easier than the FAI route.
No, it won’t know what you mean. Even you don’t know what you mean, which is part of the problem.
Experts and general forecasters are easier to build than general intelligent agents—or so I argue in my section on Machine Forecasting Implications. That is before we even get to constraints on how we want them to behave.
At a given tech level, trying to use a general oracle on its own to create a general intelligence would probably produce a less intelligent agent than could be produced by other means, using a broader set of tools. An oracle might well be able to help, though.
If:
(1) There is a way to make an AI that is useful and provably not-unfriendly
(2) This requires a subset of the breakthroughs required for a true FAI
(3) It can be used to provide extra leverage towards building a FAI (e.g. using it to generate prestige and funds for hiring and training the best brains available. How? Start by solving protein folding or something.)
Then this safe & useful AI should certainly be a milestone on the way towards FAI.
Just barely possible, but any such system is also a recipe for destroying the universe, if mixed in slightly different proportions. Which on the net makes the plan wrong (destroy-the-universe wrong).
I just don’t think that this assertion has been adequately backed up.
The primary task that EY and SIAI have in mind for Friendly AI is “take over the world”. (By the way, I think this is utterly foolish, exactly the sort of appealing paradox (like “warring for peace”) that can nerd-snipe the best of us.)
To some extent technology itself (lithography, for example) is actually Safe technology (or BelievedSafe technology). As part of the development of the technology, we also develop the safety procedures around it. The questions and problems about “how should you correctly draw up a contract with the devil” come from:
Explicitly pursuing recursive self-improvement, that is, self-modifying code where every potentially limiting component is on the table to be redesigned.
Using a theological-reasoning strategy regarding the fixpoint of the self-modifications.
If you do not pursue no-holds-barred recursive self-improvement so vigorously, then your task of developing a Riemann-Hypothesis-machine doesn’t have to involve theological reasoning at all. Indeed, I’m sure there are many mathematicians and computer scientists who have worked on RH machines, and they have not had problems with their creations running amok.
Could you explain this in more detail?
As I understand it, EY worked through a chain of reasoning about a decade ago, in his book “Creating Friendly AI”. The chain of reasoning is long and I won’t attempt to recap it here, but there are two relevant conclusions.
First, that self-improving artificial intelligences are dangerous, and that projects to build self-improving artificial intelligence, or general intelligence that might in principle become self-modifying (such as Goertzel’s), are increasing existential risk. Second, that the primary defense against self-improving artificial intelligences is a Friendly self-improving artificial intelligence, and so, in order to reduce existential risk, EY must work on developing (a restricted subset of) self-improving artificial intelligence.
This seems nigh-paradoxical (and unnecessarily dramatic) to me: you should not work on self-improving AI, and yet EY must work on self-improving AI. As I said before, this “cancel infinities against one another” sort of thinking (another example might be the MAD doctrine) has enormous appeal to a certain (geeky) kind of person. The phenomenon is named “nerd-sniping” in the xkcd comic: http://xkcd.com/356/
Rather than pursuing Friendly AGI vigorously as last/best/only hope for humanity, we should do at least two things:
Look hard for errors in the long chain of reasoning that led to these peculiar conclusions, on the grounds that reality rarely calls for that kind of nigh-paradoxical action, and it’s far more likely that either all AI development is generally a good thing for existential risks, or all AI development is a generally bad thing for existential risks—EY shouldn’t get any special AI-development license.
Look hard for more choices—for example, building entities that are very capable at defeating rogue Unfriendly AGI takeoffs, and yet which are not themselves a threat to humanity in general, nor prone to hard takeoffs. It may be difficult to imagine such entities, but all the reduce-existential-risk tasks are very difficult.
In my experience, reality frequently includes scenarios where the best way to improve my ability to defend myself involves also improving my ability to harm others, should I decide to do that. So it doesn’t seem that implausible to me.
Indeed, militaries are pretty much built on this principle, and are fairly common.
But, sure… there are certainly alternatives.
I am familiar with the libertarian argument that if everyone has more destructive power, the society is safer. The analogous position would be that if everyone pursues (Friendly) AGI vigorously, existential risk would be reduced. That might well be reasonable, but as far as I can tell, that’s NOT what is advocated.
Rather, we are all asked to avoid AGI research (and go into software development and make money and donate? How much safer is general software development for a corporation than careful AGI research?) and instead sponsor SIAI/EY doing (Friendly) AGI research while SIAI/EY is fairly close-mouthed about it.
It just seems to me like it would take a terribly delicate balance of probabilities to make this the safest course forward.
I have similar misgivings, they prompted me to write the post. Fighting fire with fire looks like a dangerous idea. The problem statement should look like “how do we stop unfriendly AIs”, not “how do we make friendly AIs”. Many people here (e.g. Nesov and SarahC) seem convinced that the latter is the most efficient way of achieving the former. I hope we can find a better way if we think some more.
If the universe is capable of running super-intelligent beings, then eventually either there will be one, or civilization will collapse. Maintaining the current state where there are no minds more intelligent than base humans seems very unlikely to be stable in the long run.
Given that, it seems the problem should be framed as “how do we end up with a super-intelligent being (or beings) that will go on to rearrange the universe the way we prefer?” which is not too different from “how do we make friendly AIs” if we interpret things like recursively-improved uploads as AIs.
The Riemann hypothesis seems like a special case, since it’s a purely mathematical proposition. A real world problem is more likely to require Eliezer’s brand of FAI.
Also, I believe solving FAI requires solving a problem not on your list, namely that of solving GAI. :-)
This was supposed to be humour, right?
OK, that didn’t come across as intended. Edited the post.
It seems to me that human engineers don’t spend a lot of time thinking about the value of boredom or the problem of consciousness when they design airplanes. Why should an AI need to do that? If the answer involves “optimizing too hard”, then doesn’t the injunction “don’t optimize too hard” look easier to formalize than CEV?
“Don’t optimise for too long” looks easier to formalise. Or so I argued here.
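To make the contrast in specification difficulty concrete, here is a toy sketch (all names invented, and this is not a safety proposal): a hard cap on optimisation effort is a one-line, fully formal constraint, whereas CEV has no comparably short statement. Whether such a cap means anything once the system can modify itself is exactly the objection raised in the next comment.

```python
import random

def bounded_optimise(objective, start, neighbours, max_steps=10_000):
    """Toy hill-climber: the entire 'don't optimise for too long' constraint
    is the max_steps budget on objective evaluations."""
    best = start
    for _ in range(max_steps):  # the cap: trivially formalizable
        candidate = random.choice(neighbours(best))
        if objective(candidate) > objective(best):
            best = candidate
    return best
```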
Injecting randomness doesn’t look like a property of reasoning that would stand (or, alternatively, support) self-modification. This leaves the option of limiting self-modification (for the same reason), although given enough time and sanity even a system with low optimization pressure could find a reliable path to improvement.
Superintelligence isn’t a goal in itself. I’ll take super-usefulness over superintelligence any day. I know you want to build superintelligence because otherwise someone else will, but the same reasoning was used to justify nuclear weapons, so I suspect we should be looking for other ways to save the world.
(I see you’ve edited your comment. My reply still applies, I think.)
Are you arguing that the USA should not have developed nuclear weapons?
Use of nuclear weapons is often credited with shortening the war—and saving many lives—e.g. see here:
Well, that was what in fact happened. But what could have happened was perhaps a nuclear war leading to “significant curtailment of humankind’s potential”.
cousin_it’s point was that perhaps we should not even begin the arms race.
Consider the Terminator scenario where they send the terminator back in time to fix things, but this sending back of the terminator is precisely what provided the past with the technology that will eventually lead to the cataclysm in the first place.
EDIT: included Terminator scenario
Of course. But super-usefulness unfortunately requires superintelligence, and superintelligence is super-dangerous. Limited intelligence gives only limited usefulness, and in the long run even limited intelligence would tend to improve its capability, so it’s not reliably safe. And not very useful.
Someone will eventually make an intelligence explosion that destroys the world. That would be bad. Any better ideas on how to mitigate the problem?
This is an analogy that you use as an argument? As if we don’t already understand the details of the situation a few levels deeper than is covered by the surface similarity here. In making this argument, you appeal to intuition, but individual intuitions (even ones that turn out to be correct in retrospect or on reflection) are unreliable, and we should do better than that, find ways of making explicit reasoning trustworthy.
Is this not exactly the point that cousin_it is questioning in the OP? I’d think a “limited” intelligence that was capable of solving the Riemann hypothesis might also be capable of cracking some protein-folding problems or whatever.
If it’s that capable, it’s probably also that dangerous. But at this point the only way to figure out more about how it actually is, is to consider specific object-level questions about a proposed design. Absent design, all we can do is vaguely guess.
No. We already have computers that help design better airplanes etc., and they are not dangerous at all. Sewing-Machine’s question is right on.
Building machines that help us solve intelligence-bound problems (even if these problems are related to the real world, like building better airplanes) seems to be massively easier than building machines that will “understand” the existence of the real world and try to take it over for whatever reason. Evidence: we have had much success with the former task, but practically no progress on the latter. Moreover, the latter task looks very dangerous, kinda like nuclear weaponry.
Why do some people become so enamored with the singleton scenario that they can’t settle for anything less? What’s wrong with humans using “smart enough” machines to solve world hunger and such, working out any ethical issues along the way, instead of delegating the whole task to one big AI? If you think you need the singleton to protect you from some danger, what can be more dangerous than a singleton?
It’s potentially dangerous, given the uncertainty about what exactly you are talking about. If it’s not dangerous, go for it.
Settling for something less than a singleton won’t solve the problem of human-indifferent intelligence explosion.
Another singleton, which is part of the danger in question.
There are already computer programs that have solved open problems, e.g. That was a much simpler and less interesting question than the Riemann Hypothesis, but I don’t know that it’s fundamentally different or less dangerous than what cousin_it is proposing.
Yes, there are non-dangerous useful things, but we were presumably talking about AI capable of open-ended planning.
Only superficially. It would be possible to create an AI with said properties with CDT.
The difficulty level seems on the same order of magnitude.
This looks suspicious. Imagine you didn’t know about Risch’s algorithm for finding antiderivatives. Would you then consider the problem “find me the antiderivative of this function, and please don’t get out of the box” to be on the same order of difficulty as (b)? Does Wolfram Alpha overturn your worldview? Last I looked, it wasn’t trying to get out...
Not even remotely. I don’t accept the analogy.
Wolfram Alpha isn’t really “in a box” in the first place.
Like most modern machines, its sensors and actuators extend into the real world.
We do restrain machines—but mostly when testing them. Elsewhere, constraints are often considered to be unnecessary expense. If a machine is dangerous, we typically keep humans away from it—and not the other way around.
I’m for ‘bool friendly = true’.
A perfectionist!
Perfectionism is often really bad, though—since it prevents you from getting much done.
No.
Well, that was my way of saying that I thought your claim was in need of some expanding on. Superficially, modelling what is surely a probability as a boolean does not look like a very sensible thing to do.
Don’t confuse probability with degree. Answering “How much friendliness is enough?” with a probability is a category error.
Fair enough. I think I just lost the right to quiz you about your cryptic comment :-(
Forecasters and oracles typically have a reduced set of options for wiring themselves up to weaponry. However, we do also need robot controllers—and for them, fewer actuators is not much of an option.
Page and Brin appear to have empirically demonstrated that not terribly advanced machines can satisfy their own preferences fairly well today—by acting as a money maximiser with fairly basic constraints to do with obeying the law and not being too nasty.
Actually, I think the ‘don’t be evil’ injunction is meant to apply to the human employees. I’m sorry to disappoint you, but I doubt that it’s actually written into any of their algorithms ;)
Their algorithms fairly evidently do all kinds of non-nasty things, including separating out ads before presenting them to the user, being fair to different sites—and so on.
Of course, it is Google that does the preference satisfaction, not just some algorithms—though obviously, the algorithms are important.