What if AI doesn’t quite go FOOM?
Intro
This article seeks to explore possible futures in a world where artificial intelligence turns out NOT to be able to quickly, recursively self-improve so as to influence our world with arbitrarily large strength and subtlety, i.e., “go FOOM.” Note that I am not arguing that AI won’t FOOM. Eliezer has made several good arguments for why AI probably will FOOM, and I don’t necessarily disagree. I am simply calling attention to the non-zero probability that it won’t FOOM, and then asking what we might do to prepare for a world in which it doesn’t.
Failure Modes
I can imagine three different ways in which AI could fail to FOOM in the next 100 years or so. Option 1 is a “human fail.” Option 1 means we destroy ourselves or succumb to some other existential risk before the first FOOM-capable AI boots up. I would love to hear in the comments section about (a) which existential risks people think are most likely to seriously threaten us before the advent of AI, and (b) what, if anything, a handful of people with moderate resources (i.e., people who hang around on Less Wrong) might do to effectively combat some of those risks.
Option 2 is a “hardware fail.” Option 2 means that Moore’s Law turns out to have an upper bound; if physics doesn’t show enough complexity beneath the level of quarks, or if quantum-sized particles are so irredeemably random as to be intractable for computational purposes, then it might not be possible for even the most advanced intelligence to significantly improve on the basic hardware design of the supercomputers of, say, the year 2020. This would limit the computing power available per dollar, and so the level of computing power required for a self-improving AI might not be affordable for generations, if ever. Nick Bostrom has some interesting thoughts along these lines, ultimately guessing (as of 2008) that the odds of a super-intelligence forming by 2033 were less than 50%.
Option 3 is a “software fail.” Option 3 means that *programming* efficiency turns out to have an upper bound; if there are natural information-theoretical limits on how efficiently a set number of operations can be used to perform an arbitrary task, then it might not be possible for even the most advanced intelligence to significantly improve on its basic software design; the supercomputer would be more than ‘smart’ enough to understand itself and to re-write itself, but there would simply not *be* an alternate script for the source code that was actually more effective.
These three options are not necessarily exhaustive; they are just the possibilities that have immediately occurred to me, with some help from User: JoshuaZ.
“Superintelligent Enough” AI
An important point to keep in mind is that even if self-improving AI faces hard limits before becoming arbitrarily powerful, AI might still be more than powerful enough to effortlessly dominate future society. I am sure my numbers are off by many orders of magnitude, but by way of illustration only, suppose that current supercomputers run at a speed of roughly 10^20 ops/second, and that successfully completing Eliezer’s coherent extrapolated volition project would require a processing speed of roughly 10^36 ops/second. There is obviously quite a lot of space here for a miniature FOOM. If one of today’s supercomputers starts to go FOOM and then hits hard limits at 10^25 ops/second, it wouldn’t be able to identify humankind’s CEV, but it might be able to, e.g., take over every electronic device capable of receiving transmissions, such as cars, satellites, and first-world factories. If this happens around the year 2020, a mini-FOOMed AI might also be able to take over homes, medical prosthetics, robotic soldiers, and credit cards.
Sufficient investments in security and encryption might keep such an AI out of some corners of our economy, but right now, major operating systems aren’t even proof against casual human trolls, let alone a dedicated AI thinking at faster-than-human speeds. I do not understand encryption well, and so it is possible that some plausible level of investment in computer security could, contrary to my assumptions, actually manage to protect human control over individual computers for the foreseeable future. Even if key industrial resources were adequately secured, though, a moderately super-intelligent AI might be capable of modeling the politics of current human leaders well enough to manipulate them into steering Earth onto a path of its choosing, as in Isaac Asimov’s The Evitable Conflict.
If enough superintelligences develop at close enough to the same moment in time and have different enough values, they might in theory reach some sort of equilibrium that does not involve any one of them taking over the world. As Eliezer has argued (scroll down to 2nd half of the linked page), though, the stability of a race between intelligent agents should mostly be expected to *decrease* as those agents swallow their own intellectual and physical supply chains. If a supercomputer can take over larger and larger chunks of the Internet as it gets smarter and smarter, or if a supercomputer can effectively control what happens in more and more factories as it gets smarter and smarter, then there’s less and less reason to think that supercomputing empires will “grow” at roughly the same pace—the first empire to grow to a given size is likely to grow faster than its rivals until it takes over the world. Note that this could happen even if the AI is nowhere near smart enough to start mucking about with uploaded “ems” or nanoreplicators. Even in a boringly normal near-future scenario, a computer with even modest self-improvement and self-aggrandizement capabilities might be able to take over the world. Imagine something like the ending to David Brin’s Earth, stripped of the mystical symbolism and the egalitarian optimism.
Ensuring a “Nice Place to Live”
I don’t know what Eliezer’s timeline is for attempting to develop provably Friendly AI, but it might be worthwhile to attempt to develop a second-order stopgap. Eliezer’s CEV is supposed to function as a first-order stopgap; it won’t achieve all of our goals, but it will ensure that we all get to grow up in a Nice Place to Live while we figure out what those goals are. Of course, that only happens if someone develops a CEV-capable AI. Eliezer seems quite worried about the possibility that someone will develop a FOOMing unFriendly AI before Friendly AI can get off the ground, but is anything being done about this besides just rushing to finish Friendly AI?
Perhaps we need some kind of mini-FOOMing marginally Friendly AI whose only goal is to ensure that nothing seizes control of the world’s computing resources until SIAI can figure out how to get CEV to work. Although no “utility function” can be specified for a general AI without risking paper-clip tiling, it might be possible to formulate a “homeostatic function” at relatively low risk. An AI that “valued” keeping the world looking roughly the way it does now, that was specifically instructed *never* to seize control of more than X number of each of several thousand different kinds of resources, and whose principal intended activity was to search for, hunt down, and destroy AIs that seemed to be growing too powerful too quickly might be an acceptable risk. Even if such a “shield AI” were not provably friendly, it might pose a smaller risk of tiling the solar system than the status quo, since the status quo is full of irresponsible people who like to tinker with seed AIs.
An interesting side question is whether this would be counterproductive in a world where Option 2 (hard limits on hardware) or Option 3 (hard limits on software) were serious concerns. Assuming a provably Friendly AI can eventually be developed, then several years after that it’s likely that millions of people could be convinced that activating it would be a really good idea, and humans might be able to dedicate enough resources to specifically overcome the second-order stopgap “shield AI” that was knocking out other people’s un-provably Friendly AIs. But if the shield AI worked too well and got too close to the hard upper bound on the power of an AI, then it might not be possible to unmake the shield, even with added resources and with no holds barred.
One “anti-foom” factor is the observation that in the early stages we can make progress partly by cribbing from nature—and simply copying it. After roughly “human level” is reached, that short-cut is no longer available—so progress may require more work after that.
Valid argument, but we seem to be already making some progress beyond that, by noticing human cognitive biases and trying to correct them.
Carl Shulman has written about that topic. See:
Shulman, Carl M., “Arms Control and Intelligence Explosions.” Proceedings of the European Conference on Computing and Philosophy. Universitat Autònoma de Barcelona, Barcelona, Spain. 4 July 2009.
http://singinst.org/armscontrolintelligenceexplosions.pdf
Great! Thanks for the link.
Shulman has (and cites) some really good ideas, but the question remains: is anything being done about this? Is anyone actually working on developing a protocol for international regulation of artificial intelligence, or on selling the idea of Brin’s Transparent Society to decision-makers, or on programming a shield AI?
I think many in the field see intelligence as positive—and think that intelligent machines will reduce automobile accidents and liberate humans from much tedious drudgery.
Since, at the moment, things seem to be proceeding at a manageable pace, there seems to be little demand for elaborate braking systems.
Also, those who feel the most need for slowing down are probably those least well placed to influence the rate of progress.
Attempts to prevent a “race to the bottom” are likely to prove ineffective—and seem to be largely misguided. There is bound to be such a race—so, we should exert our resources where there’s a chance of making a difference.
At this stage I would avoid doing that. The more you try to convince those with power that they should make rules against AI the more they are going to think that creating an AI is something they need to do!
I am pretty sure that the government knows the score—or at least that the NSA do.
I’m sure they do. The important thing is what additional promotion can achieve. Will it make a leader think he needs to cooperate with the enemy and make treaties? Or will it make them more inclined to add a trillion or two to the NSA budget?
Yes indeed, the government seems to explore all promising technologies, and even some that are not so promising.
The proposed measures seem ineffective at preventing the creation of a seed AI.
The human genome weighs in at ~770 MBytes. This suggests that, given open research publications in neuroscience, the theory of algorithms, etc., someone could come up with an insight that would allow a small group of researchers to build and run a comparatively simple human-like ML algorithm. This algorithm can be expected to be highly parallelizable, so it could be run on a cluster of consumer-grade equipment (current GPUs deliver approximately 2 GFLOPS/$). It would therefore cost about $5-20M (infrastructure overhead included) to run a 1-petaflops (for specific tasks) cluster now (BTW, Google uses something like this).
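For what it’s worth, here is a back-of-envelope sketch of that arithmetic in Python, treating the 2 GFLOPS/$ figure and a 10-40x infrastructure overhead multiplier as assumptions rather than measured values:

```python
# Back-of-envelope sketch of the cluster-cost estimate above.
# The 2 GFLOPS/$ figure and the overhead multipliers are assumptions
# (circa 2010), not measured values.

TARGET_FLOPS = 1e15          # 1 petaflops, for "specific tasks"
GPU_FLOPS_PER_DOLLAR = 2e9   # ~2 GFLOPS per dollar of consumer GPU

raw_hardware_cost = TARGET_FLOPS / GPU_FLOPS_PER_DOLLAR   # ~$500k of GPUs

# Infrastructure overhead (power, cooling, interconnect, hosting, salaries)
# dominates the raw silicon cost; a 10x-40x multiplier reproduces the
# quoted $5M-$20M range.
for overhead in (10, 40):
    print(f"overhead x{overhead}: ~${raw_hardware_cost * overhead / 1e6:.0f}M")
```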
Is there any way to prevent such a scenario besides a worldwide high-tech police state?
You seem to be possibly massively underestimating the computational level involved. The size of the human genome is a bad metric to use. In some respects it overestimates since a lot is junk DNA or DNA where changing the sequence slightly will not make a difference (since some amino acids are coded for by multiple codons). But overall, it is a massive underestimate of the computational power needed. The human brain is what needs to be modeled, and that’s a much more complicated object that arises from the interaction of the genome, the environment, and the initial conditions in the egg. Using an argument based on the genome size is at best naive.
The embryo’s environment is rather isolated from external influences, and AFAIK the development of the reproductive system doesn’t depend on information collected from the postnatal environment. Thus it must be the egg’s initial conditions that contain all the additional information you speak of. However, it is strange that evolution doesn’t use such a massive amount of information for adaptations. Are you sure that the Kolmogorov complexity of a newborn child is much larger than that of the genome?
Edit: Spellcheck.
Kolmogorov complexity might be low, but that’s only the amount needed to specify the entity. Predicting the behavior of the entity is a different question entirely. For example, the string given by f(n) = p(A(n,n)) mod 3, where A(m,k) is the Ackermann function and p(n) is the nth prime number, has very low Kolmogorov complexity. (The Turing machine to do this is simple enough that one could, if one wanted, write out all the states without much effort.) But calculating this function beyond n=3 or so is not feasible. The issue is not just the total information but the degree of interaction. Your estimate ignores how much interaction there is between neurons. As with the example with the Ackermann function, the level of interaction may create something which is computationally intractable even if the specification length is very short.
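To make that concrete, here is a rough Python sketch of the same function; the naive nth-prime helper is just an illustrative stand-in, but the point is that a few lines suffice to specify something whose values are practically uncomputable:

```python
# Low Kolmogorov complexity, intractable behavior: the whole specification
# fits in a few lines, but f(4) can never actually be evaluated, since
# A(4, 4) vastly exceeds the number of atoms in the observable universe.

def ackermann(m, n):
    """Naive two-argument Ackermann function."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))

def nth_prime(n):
    """Naive n-th prime (1-indexed); fine for the tiny inputs we can reach."""
    count, candidate = 0, 1
    while count < n:
        candidate += 1
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return candidate

def f(n):
    """f(n) = p(A(n, n)) mod 3 -- trivially short, practically uncomputable."""
    return nth_prime(ackermann(n, n)) % 3

print(f(1), f(2))  # f(3) is still feasible (A(3,3) = 61); f(4) is hopeless
```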
There are other methods of estimating the computational level necessary to model the human brain that do try to calculate estimates based on neuronal interactions, and they are many orders of magnitude larger than your estimate. If computational power continues to increase exponentially this shouldn’t make that much of a difference (order of 15 to 30 years or so in terms of computational power), but it doesn’t stop your estimate from being far below the correct value.
You are right, I’ve underestimated the processing power required (technically I estimated how much processing power a small group can raise now, but the implication was obvious). The brain’s processing power has a theoretical upper limit of 10^21-10^22 bit/s, estimated using Landauer’s principle. However, there are indications that brain cells operate orders of magnitude below the theoretical limit; this doesn’t factor out other sources of computing power, but it somewhat mitigates my error.
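For reference, a minimal sketch of that Landauer-style bound, assuming a ~20 W power budget for the brain and body temperature (both round-number assumptions):

```python
import math

# Landauer bound on irreversible bit operations for a ~20 W brain at ~310 K.
# Both numbers are rough assumptions, not measurements.
k_B = 1.380649e-23      # Boltzmann constant, J/K
T = 310.0               # body temperature, K
P = 20.0                # rough metabolic power budget of the brain, W

energy_per_bit = k_B * T * math.log(2)      # ~3e-21 J per erased bit
max_bit_ops_per_sec = P / energy_per_bit    # ~7e21 bit erasures per second

print(f"{max_bit_ops_per_sec:.1e} bit operations per second")  # ~10^21-10^22
```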
“Small groups of researchers” are surely highly likely to be beaten by larger groups with decent funding and access to lots of training data and fast machines. We are not at the “Wright brothers” stage—that was back in the 1950s.
Want the white hats to get there before the black hats? Making sure they are better funded is the best way, I figure. They are, in fact, quite a bit better funded. Effectively, consumers are voting with their feet. Though it is also true that some are “whiter” than others: I for one still pray that machine intelligence will not come from Microsoft.
Attempts to prevent a “race to the bottom” are likely to prove ineffective—and seem to be largely misguided. There is bound to be a race, the issue is which teams to back, and which teams to hinder.
And which information to conceal. Right?
As for the “Wright brothers” situation, it’s not so obvious. We have AI methods which work but don’t scale well (theorem provers, semantic nets, expert systems; not a method, but nevertheless worth mentioning: SHRDLU), we have well-scaling methods which lack generalization power (statistical methods, neural nets, SVMs, deep belief networks, etc.), and yet we don’t know how to put it all together.
It looks like we are approaching the “Wright stage,” where one will have all the equipment needed to put together a working prototype.
You got it backwards. These methods have generalization power, especially the SVM (achieving generalization is the whole point of the VC theory on which it’s based), but don’t scale well.
Yes, bad wording on my side. I mean something like the capability of representing and operating on complex objects, situations, and relations. However, it doesn’t invalidate my (quite trivial) point that we don’t have a practical theory of AGI yet.
The race participants are the ones with things to conceal, mostly. One could try and incentivise them to reveal things by using something like the patent system—but since machine intelligence is likely to start out as a server-side technology, patents seem likely to be irrelevant—you can just use trade secrets instead, since those have better security, don’t need lawyers to enforce and have no expiration date. I discuss code-hiding issues here:
“Tim Tyler: Closed source intelligent machines”
http://www.youtube.com/watch?v=Fn8Ly9QJF6s
I figure that we are well past the “Wright brothers” stage—in the sense that huge corporations are already involved in exploiting machine intelligence technology—and large sums of money are already being made out of it.
I don’t understand. The difference between server-side and client side is how you use it. It’s just going to be “really powerful technology” and from there it will be ‘server’, ‘client’, a combination of the two, a standalone system or something that does not reasonably fit that category (like Summer Glau).
Server side has enormous computer farms. Client side is mostly desktop and mobile devices—where there is vastly less power, storage and bandwidth available.
The server is like the queen bee—or with the analogy of multicellularity, the server is like the brain of the whole system.
The overwhelming majority of servers actually require less computing power than the average desktop. Many powerful computer farms don’t particularly fit in the category of ‘server’, in particular it isn’t useful to describe large data warehousing and datamining systems using a ‘client-server’ model. That would just be a pointless distraction.
I agree that the first machine intelligence is unlikely to be an iPhone app.
Right, but compare with the Google container data center tour.
I have little sympathy for the idea that most powerful computer farms are not “servers”. It is not right: most powerful computer farms are servers. They run server-side software, and they serve things up to “clients”. See:
http://en.wikipedia.org/wiki/Server_farm
I selected the word majority for a reason. I didn’t make a claim about the outliers and I don’t even make a claim about the ‘average power’.
That is a naive definition of ‘server’. “Something that you can access remotely and runs server software” is trivial enough that it adds nothing at all to our understanding of AIs to say it uses a server.
For comparison just last week I had a task requiring use of one of the servers I rent from some unknown server farm over the internet. The specific task involved automation of a process and required client side software (firefox, among other things). The software I installed and used was all the software that makes up a client. It also performed all the roles of a client. On the list I mentioned earlier that virtual machine is clearly “a combination of the two” and that fact is in no way a paradox. “Client” and “server” are just roles that a machine can take on and they are far from the most relevant descriptions of the machines that will run an early AI.
“Server” is a red herring.
It’s the servers in huge server farms where machine intelligence will be developed.
They will get the required power about 5-10 years before desktops do, and have more direct access to lots of training data.
Small servers in small businesses may be numerous—but they are irrelevant to this point—there seems to be no point in discussing them further.
Arguing about the definition of http://en.wikipedia.org/wiki/Computer_server would seem to make little difference to the fact that most powerful computer farms are servers. Anyhow, if you don’t like using the term “server” in this context, feel free to substitute “large computer farm” instead—as follows:
“machine intelligence is likely to start out as a large computer farm technology”
If nothing else we seem to agree that neither small servers nor iPhones are the likely birthplace of AI. That definitely rules out servers that ARE iPhones!
“Large computer farm” and, for that matter, “large server farm” have a whole different meaning from “server-side technology.” I’m going here from using both client- and server-side technology simultaneously for several automation tools that intrinsically need to take on both those roles, to seeing the term used to mean essentially ‘requires a whole bunch of computing hardware’. This jumps out to me as misleading.
I don’t think there is much doubt about the kind of hardware that the first machine intelligence is run on. But I would be surprised if I arrive at that conclusion for the same reasons that you do. I think it is highly improbable that the critical theoretical breakthroughs will arrive in a form that makes a mere order of magnitude or two difference in computing power the critical factor for success. But I do know from experience that when crafting AI algorithms the natural tendency is to expand to use all available computational resources.
Back in my postgrad days my professor got us a grant to develop some AI for factory scheduling using the VPAC supercomputer. I had a hell of a lot of fun implementing collaborative agent code. MPI2 with C++ bindings if I recall. But was it necessary? Not even remotely. I swear I could have written practically the same paper using an old 286 and half the run time. But while doing the research I used every clock cycle I could and chafed at the bit wishing I had more.
If someone gets the theoretical progress to make a worthwhile machine intelligence I have no doubt that they will throw every piece of computer hardware at it that they can afford!
Computing power is fairly important:
“more computer power makes solving the AGI design problem easier. Firstly, more powerful computers allow us to search larger spaces of programs looking for good algorithms. Secondly, the algorithms we need to find can be less efficient, thus we are looking for an element in a larger subspace.”
http://www.vetta.org/2009/12/tick-tock-tick-tock-bing/
Those with a server farm have maybe 5-10 years hardware advantage over the rest of us—and they probably have other advantages as well: better funding, like-minded colleagues, etc.
I somewhat agree with what you are saying here. Where we disagree slightly, in a mater of degree and not fundamental structure, is on the relative importance of the hardware vs those other advantages. I suspect the funding, like-minded colleagues and particularly the etc are more important factors than the hardware.
Thanks, it’s interesting, though I’m not very good at recognizing spoken English; I was unable to decipher the robots part in particular.
Nevertheless, I doubt that the R&D division of a single corporation can do all the work necessary for an AGI launch without open information from the scientific community. They can hide the details of their implementation, but they cannot hide the ideas they based their work upon. Going back to the Wright brothers: in 1910 there was already an industry of internal combustion engines, Henry Ford was already making money, and aerodynamics had made some progress. All in all, I can’t see a crucial difference.
The Ford Airplane Company did get in on aeroplanes—but in the 1920s. In 1910 there was no aeroplane business.
For the inventors of machine intelligence, I figure you have to look back to people like Alan Turing. What we are seeing now is more like the ramping up of an existing industrial process. Creating very smart agents is better seen as being comparable to breaking the sound barrier.
This would not be acceptable to me, since I hope to be one of those AIs.
The morals of FAI theory don’t mesh well at all with the morals of transhumanism. This is surprising, since the people talking about FAI are well aware of transhumanist ideas. It’s as if people compartmentalize them and think about only one or the other at a time.
Er, hypothetically would you be willing to wait a decade or so for ordinary humans to erect some safeguards if we could promise you all the medicine you needed to stay healthy? I mean, nothing against your transhumanist aspirations and all, but how am I supposed to distinguish between people who want to become AIs ’cause it’s intrinsically awesome and people who want to become AIs in order to take over the world and bend it to their own sinister, narrow ends?
The people who want to become AIs in order to take over the world and bend it to their own sinister, narrow ends will try to convince people that everyone else’s AIs are dangerous and must be destroyed.
I’m not sure whether you’re kidding.
As a joke, it’s funny.
As a serious rebuttal, I don’t think it works. A shield AI’s code could be made public in advance of its launch, and could verifiably NOT contain anything like the memories, personality, or secret agenda of the programmers. There’s nothing “narrow” about wanting the world to cooperate in enforcing a temporary ban on superintelligent AIs.
Such a desire is, as some other commenters have complained, a bit conservative—but in light of the unprecedented risks (both in terms of geographic region affected and in terms of hard-to-remove uncertainty), I’ll be happy to be a conservative on this issue.
Voted up for sheer balls. You have my backing sir.
Or just disagree with a specific transhumanist moral (or interpretation thereof). If you are growing “too powerful too quickly” the right thing for an FAI (or, for that matter, anyone else) to do is to stop you by any means necessary. A recursively self improving PhilGoetz with that sort of power and growth rate will be an unfriendly singularity. Cease your expansion or we will kill you before it is too late.
How do you infer that? Also, is CEV any better? I will be justly insulted if you prefer the average of all human utility functions to the PhilGoetz utility function.
You’re familiar with CEV so I’ll try to reply with the concepts from Eliezer’s CEV document.
PhilGoetz does not have a framework for a well specified abstract invariant self-modifying goal system. If Phil was “seeming to be growing too powerful too quickly” then quite likely the same old human problems are occurring and a whole lot more besides.
The problem isn’t with your values, CEV, the problem is that you aren’t a safe system for producing a recursively self improving singularity. Humans don’t even keep the same values when you give them power let alone when they are hacking their brains into unknown territory.
When talking about one individual, there is no C in CEV.
I use ‘extrapolated volition’ when talking about the outcome of the process upon an individual. “Coherent Extrapolated Volition” would be correct but redundant. When speaking of instantiations of CEV with various parameters (of individuals, species or groups) it is practical, technically correct and preferred to write CEV regardless of the count of individuals in the parameter. Partly because it should be clear that the CEV of an individual and the CEV of a group are talking about things very similar in kind. Partly because if people see “CEV” and google it they’ll find out what it means. Mostly because the ‘EV’ acronym is overloaded within the nearby namespace.
AVERAGE(3.1415) works in google docs. It returns 3.1415. If you are comparing a whole heap of aggregations of a feature, some of which only have one value, it is simpler to just use the same formula.
Seems reasonable.
I think I’d prefer the average of all human utility functions to any one individual’s utility function; don’t take it personally.
Is that Phil Goetz’s CEV vs. all humans’ CEV, or Phil Goetz’s current preferences or behaviour-function vs. the average of all humans’ current preferences or behaviour-functions? In the former scenario, I’d prefer the global CEV (if I were confident that it would work as stated), but in the latter, even without me remembering much about Phil and his views other than that he appears to be an intelligent educated Westerner who can be expected to be fairly reliably careful about potentially world-changing actions, I’d probably feel safer with him as world dictator than with a worldwide direct democracy automatically polling everyone in the world on what to do, considering the kinds of humans who currently make up a large majority of the population.
Voted up for distinguishing these things.
“I am obliged to confess I should sooner live in a society governed by the first two thousand names in the Boston telephone directory than in a society governed by the two thousand faculty members of Harvard University. ”—William F. Buckley, Jr.
Yes, I agree that William F. Buckley, Jr. probably disagrees with me.
Heh. I figured you’d heard the quote: I just thought of it when I read your comment.
I agree with Buckley, mainly because averaging would smooth out our evolved unconscious desire to take power for ourselves when we become leaders.
I can’t agree with that. I’ve got a personal bias against people with surnames starting with ‘A’!
Sorry Phil but now we’ve got a theoretical fight on our hands between my transhuman value set and yours. Not good for the rest of humanity. I’d rather our rulers had values that benefited everybody on average and not skewed towards your value set (or mine) at the expense of everybody else.
Phil: an AI who is seeking resources to further its own goals at the expense of everyone else is by definition an unfriendly AI.
Transhuman AI PhilGoetz is such a being.
Now consider this: I’d prefer the average of all human utility function over my maximized utility function even if it means I have less utility.
I don’t want humanity to die and I am prepared to die myself to prevent it from happening.
Which of the two utility functions would most of humanity prefer hmmmmm?
If you would prefer A over B, it’s very unclear to me what it means to say that giving you A instead of B reduces your utility.
It’s not unclear at all. Utility is the satisfaction of needs.
Ah. That’s not at all the understanding of “utility” I’ve seen used elsewhere on this site, so I appreciate the clarification, if not its tone.
So, OK. Given that understanding, “I’d prefer the average of all human utility function over my maximized utility function even if it means I have less utility.” means that xxd would prefer (on average, everyone’s needs are met) over (xxd’s needs are maximally met). And you’re asking whether I’d prefer that xxd’s preferences be implemented, or those of “an AI who is seeking resources to further its own goals at the expense of everyone else” which you’re calling “Transhuman AI PhilGoetz”… yes? (I will abbreviate that T-AI-PG hereafter)
The honest answer is I can’t make that determination until I have some idea what having everyone’s needs met actually looks like, and some idea of what T-AI-PGs goals look like. If T-AI-PGs goals happen to include making life awesomely wonderful for me and everyone I care about, and xxd’s understanding of “everyone’s needs” leaves me and everyone I care about worse off than that, then I’d prefer that T-AI-PG’s preferences be implemented.
That said, I suspect that you’re taking it for granted that T-AI-PGs goals don’t include that, and also that xxd’s understanding of “everyone’s needs” really and truly makes everything best for everyone, and probably consider it churlish and sophist of me to imply otherwise.
So, OK: sure, if I make those further assumptions, I’d much rather have xxd’s preferences implemented than T-AI-PG’s preferences. Of course.
Your version is exactly the same as Phil’s, just that you’ve enlarged it to include the utility of yourself and everyone you care about being maximized, rather than humanity as a whole having its utility maximized.
When we actually do get an FAI (if we do), it is going to be very interesting to see how this resolves, given that even among those who are thinking about it ahead of time we can’t agree on the goals defining what an FAI should actually shoot for.
I do not understand what your first sentence means.
As for your second sentence: stating what it is we value, even as individuals (let alone collectively), in a sufficiently clear and operationalizable form that it could actually be implemented, in a sufficiently consistent form that we would want it implemented, is an extremely difficult problem. I have yet to see anyone come close to solving it; in my experience the world divides neatly into people who don’t think about it at all, people who think they’ve solved it and are wrong, and people who know they haven’t solved it.
If some entity (an FAI or whatever) somehow successfully implemented a collective solution it would be far more than interesting, it would fundamentally and irrevocably change the world.
I infer from my reading of your tone that you disagree with me here; the impression I get is that you consider the fact that we haven’t agreed on a solution to demonstrate our inadequacies as problem solvers, even by human standards, but that you’re too polite to say so explicitly. Am I wrong?
We actually agree on the difficulty of the problem. I think it’s very difficult to state what it is that we want AND that if we did so we’d find that individual utility functions contradict each other.
Moreover, I’m saying that maximizing Phil Goetz’s utility function or yours and everybody you love (or even my own selfish desires and wants plus those of everyone I love) COULD in effect be an unfriendly AI because MANY others would have theirs minimized.
So I’m saying that I think a friendly AI has to have its goals defined as Choice A, in which the maximum number of people have their utility functions improved (rather than maximized), even if a minimal number of people have their utility functions worsened, as opposed to Choice B, in which a small number have their utility functions maximized while a large number of people have their utility functions decreased (or zeroed out).
As a side note: I find it amusing that it’s so difficult to even understand each other’s basic axioms, never mind agree on the details of what maximizing the utility function for all of us as a whole means.
To be clear: I don’t know what the details are of maximizing the utility function for all of humanity. I just think that a fair maximization of the utility function for everyone has an interesting corollary: In order to maximize the function for everyone, some will have their individual utility functions decreased unless we accept a much narrower definition of friendly meaning “friendly to me,” in which case as far as I’m concerned that no longer means friendly.
The logical tautology here is of course that those who consider “friendly to me” as being the only possible definition of friendly would consider an AI that maximized the average utility function of humanity and they themselves lost out, to be an UNfriendly AI.
Couple of things:
If you want to facilitate communication, I recommend that you stop using the word “friendly” in this context on this site. There’s a lot of talk on this site of “Friendly AI”, by which is meant something relatively specific. You are using “friendly” in the more general sense implied by the English word. This is likely to cause rather a lot of confusion.
You’re right that if strategy 1 optimizes for good stuff happening to everyone I care about and strategy 2 optimizes for good stuff happening to everyone whether I care about them or not, then strategy 1 will (if done sufficiently powerfully) result in people I don’t care about having good stuff taken away from them, and strategy 2 will result in everyone I care about getting less good stuff than strategy 1 will.
You seem to be saying that I therefore ought to prefer that strategy 2 be implemented, rather than strategy 1. Is that right?
You seem to be saying that you yourself prefer that strategy 2 be implemented, rather than strategy 1. Is that right?
Fair enough. I will read the wiki.
Yes
Not saying anything about your preferences.
Nope, I’m saying strategy 2 is better for humanity. Of course personally I’d prefer strategy 1, but I’m honest enough with myself to know that certain individuals would find their utility functions severely degraded if I had an all-powerful AI working for me, and if I don’t trust myself to be in charge then I don’t trust any other human unless it’s someone like Gandhi.
It’s not as clear as you think it is. I’m not familiar with any common definition of “utility” that unambiguously means “the satisfaction of needs”, nor was I able to locate one in a dictionary.
“Utility” is used hereabouts as a numerical value assigned to outcomes such that outcomes with higher utilities are always preferred to outcomes with lower utilities. See Wiki:Utility function.
Nor am I familiar with “sophist” used as an adjective.
Utility is generally meant to be “economic utility” in most discussions I take part in, notwithstanding the definition you’re espousing hereabouts.
I believe that the definition of utility you’re giving is far too open and could all too easily lead to smiley world.
It is very common to use nouns as adjectives where no distinct adjective already exists and thus saying someone is “sophist” is perfectly acceptable English usage.
Yeah, that doesn’t quite nail it down either. Note Wiktionary:utility (3):
It ambiguously allows both ‘needs’ and ‘wants’, as well as ambiguous ‘satisfaction experienced’.
The only consistent, formal definition of utility I’ve seen used in economics (or game theory) is the one I gave above. If it was clear someone was not using that definition, I might assume they were using it as more generic “preference satisfaction”, or John Stuart Mill’s difficult-to-formalize-coherently “pleasure minus pain”, or the colloquial vague “usefulness” (whence “utilitarian” is colloquially a synonym for “pragmatic”).
Do you have a source defining utility clearly and unambiguously as “the satisfaction of needs”?
No you’re right it doesn’t nail it down precisely (the satisfaction of needs or wants).
I do believe, however, that it more precisely nails it down than the wiki on here.
Or on second thoughts maybe not, because we again come back to conflicting utilities: a suicidal person might value being killed as higher utility than someone who is sitting on death row and doesn’t want to die.
And I was using the term utility from economics since it’s the only place I’ve heard where they use “utility function” so I naturally assumed that’s what you were talking about since even if we disagree around the edges the meanings still fit the context for the purposes of this discussion.
The question is whether the PhilGoetz utility function, or the average human utility function, are better. Assume both are implemented in AIs of equal power. What makes the average human utility function “friendlier”? It would have you outlaw homosexuality and sex before marriage, remove all environmental protection laws, make child abuse and wife abuse legal, take away legal rights from women, give wedgies to smart people, etc.
I don’t think you understand utility functions.
“The question is whether the PhilGoetz utility function, or the average human utility function, are better. ”
That is indeed the question. But I think you’ve framed it and stacked the deck here with your description of what you believe the average human utility function is, in order to attempt to take the moral high ground rather than arguing against my point, which is this:
How do you maximize the preferred utility function for everyone instead of just a small group?
Although I disagree with your heartbreak position I agree with this.
1) There’s no real reason to pull in unrelated threads just because you’re talking to the same person.
2) Most of us are pretty sure, based on that other thread, that you misunderstand wedrifid’s “heartbreak position”.
3) When referencing other posts, it’s usually good form to link to them (as above), to make it easier for others to follow.
It’s not clear to me that a “transhuman” AI would have the same properties as a “synthetic” AI. I’m assuming that a transhuman AI would be based on scanning in a human brain and then running a simulation of the brain while a synthetic AI would be more declaratively algorithmic. In that scenario, proving a self-modification would be an improvement for a transhuman AI would be much more difficult so I would treat it differently. Because of that, I’d expect a transhuman AI to be orders of magnitude slower to adapt and thus less dangerous than a synthetic AI. For that reason, I think it is reasonable to treat the two classes differently.
What morals are at odds?
You mean: rather like the Monopolies and Mergers commission?
We already have organisations like that. One question is whether they will be enough. So far, they have hampered—but evidently not yet completely crippled—the Microsoft machine intelligence—and now show signs of switching their attention to Google:
http://techcrunch.com/2010/02/23/eu-antitrust-google-microsoft/
I don’t think the lack of an earth-shattering ka-FOOM changes much of the logic of FAI. Smart enough to take over the world is enough to make human existence way better, or end it entirely.
It’s quite tricky to ensure that your superintelligent AI does anything like what you wanted it to. I don’t share the intuition that creating a “homeostasis” AI is any easier than an FAI. I think one move Eliezer is making in his “Creating Friendly AI” strategy is to minimize the goals you’re trying to give the machine; just CEV.
I think this makes apparent what a good CEV seeker needs anyway; some sense of restraint when CEV can’t be reliably extrapolated in one giant step. It’s less than certain that even a full FOOM AI could reliably extrapolate to some final most-preferred world state.
I’d like to see a program where humanity actually chooses its own future; we skip the extrapolation and just use CV repeatedly; let people live out their own extrapolation.
Does just CV work all right? I don’t know, but it might. Sure, Palestinians want to kill Israelis and vice versa; but they both want to NOT be killed way more than they want to kill, and most other folks don’t want to see either of them killed.
Or perhaps we need a much more cautious, “OK, let’s vote on improvements, but they can’t kill anybody and benefits have to be available to everyone...” policy for the central guide of AI.
CEV is a well thought out proposal (perhaps the only one—counterexamples?), but we need more ideas in the realm of AI motivation/ethics systems. Particularly, ways to take a practical AI with goals like “design neat products for GiantCo” or “obey orders from my commanding officer” and ensure that it doesn’t ruin everything if it starts to self-improve. Not everyone is going to want to give their AI CEV as its central goal, at least not until it’s clear it can/will self improve, at which point it’s probably too late.
While CEV is an admirably limited goal compared to the goal of immediately bringing about paradise, it still allows the AI to potentially intervene in billions of people’s lives. Even if the CEV is muddled enough that the AI wouldn’t actually change much for the typical person, the AI is still being asked to ‘check’ to see what it’s supposed to do to everyone. The AI has to have some instructions that give it the power to redistribute most of the Earth’s natural resources, because it’s possible that the CEV would clearly and immediately call for some major reforms. With that power comes the chance that the power could be used unwisely, which requires tremendously intricate, well-tested, and redundant safeguards.
By contrast, a homeostasis or shield AI would never contemplate affecting billions of people; it would only be ‘checking’ to see whether a few thousand AI researchers are getting too close. It would only need enough resources to, say, shut off the electricity to a lab now and then, or launch an EMP or thermite weapon. It would be given invariant instructions not to seize control of most of Earth’s natural resources. That means, at least for some levels of risk-tolerance, that it doesn’t need quite as many safeguards, and so it should be easier and faster to design.
Actually, a shield AI can’t be much less intrusive than an FAI, so it won’t require significantly simpler safeguards.
It will need a worldwide spying network to gather the required intel.
It will need a global enforcement network to establish its presence in uncooperative nations / terminate AI research.
Given restrictive rules for dealing with humans, its planning/problem-solving algorithms will tend to circumvent them, as these rules are a direct obstacle to its main goal.
When it finds that humans are the weak links in any automated system, we can expect it to try to fully eliminate its dependence on humans.
Etc.
So, I’ve read The Hidden Complexity of Wishes, and I think the dangers can be avoided. I don’t want to design a shield AI that minimizes the probability of unfriendly AIs launching—I want to design a shield AI that reduces the probability until either (a) the probability is some low number, like 0.1% per year, or (b) the shield AI has gained control of its quota of one of the thousands of specified resources. Then the shield AI stops.
A “worldwide spying network” could consist of three really good satellites and a decent hacking routine.
A “global enforcement network” need not be in constant effect; its components could be acquired and dismissed as and when needed.
If the AI “circumvents us,” then that’s great. That means we won’t much notice its actions. Likewise if it “eliminates its dependence on us.” Although if you mean that the AI might escape its box, then I would argue that certain instructions can be left unmodifiable; this AI would not have full ability to modify its source code. Hence the idea of a mini-FOOM rather than a full FOOM; with human vetos over certain modifications, it wouldn’t full-FOOM even if such things were possible.
You’ll find it surprisingly difficult to express what does “AI stops” mean, in terms of AI’s preference. AI always exerts some influence on the world, just by existing. By “AI stops”, you mean a particular kind of influence, but it’s very difficult to formalize what kind of influence, exactly, constitutes “stopping”.
I imagine that an AI would periodically evaluate the state of the world, generate values for many variables that describe that state, identify a set of actions that its programming recommends for those values of the variables, and then take those actions.
For the AI to stop doing something would mean that the state of the world in which the AI has reached its maximum acceptable resource usage generates variable values that lead to a set of actions that do not include any additional use of those resources by the AI.
For example, if the AI is controlling an amount of water that we would think is “too much water,” then the AI would not take actions that involve moving significant amounts of water or significantly affecting its quality. The AI would know that it is “controlling” the water because it would model the world for a few cycles after it took no action, and model the world after it took some action, and notice that the water was in different places in the two models. It would do this a few seconds out, a few minutes out, a few hours out, a few days out, a few months out, and a few years out, doing correspondingly blunter and cruder models each time the period increased to economize on processing power and screen out movements of water that were essentially due to chaos theory rather than proximate causation.
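To make that loop concrete, here is a toy sketch in Python; every name, number, and rule in it is a hypothetical placeholder rather than a claim about how a real shield AI would actually be written:

```python
# Toy sketch of the periodic evaluate/model/act loop described above.
# All names, quotas, and stand-in functions are illustrative placeholders.

RESOURCE_QUOTAS = {"water": 100.0, "electricity": 50.0}   # arbitrary units

def observe_world():
    """Stand-in for sensors: returns the current state variables."""
    return {"water": 120.0, "electricity": 30.0}

def controlled_resources(state):
    """Stand-in for the 'model with action vs. without action' comparison:
    how much of each resource the agent's actions currently move."""
    return {"water": 120.0, "electricity": 10.0}

def recommended_actions(state):
    """Stand-in for the planner: candidate actions and what they affect."""
    return [{"name": "divert_river", "affects": {"water"}},
            {"name": "cut_lab_power", "affects": {"electricity"}}]

def step():
    state = observe_world()
    usage = controlled_resources(state)
    actions = recommended_actions(state)
    # Homeostatic constraint: once the agent already controls its quota of
    # a resource, discard every candidate action affecting that resource.
    for resource, quota in RESOURCE_QUOTAS.items():
        if usage.get(resource, 0.0) >= quota:
            actions = [a for a in actions if resource not in a["affects"]]
    return actions

print([a["name"] for a in step()])   # water quota exceeded -> only 'cut_lab_power'
```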
Am I missing something? I realize you probably have a lot more experience specifying AI behavior than I do, but it’s difficult for me to understand the logic behind your insight that specifying “AI stops” is hard. Please explain it to me when you get a chance.
Vladimir, I understand that you’re a respected researcher, but if you keep voting down my comments without explaining why you disagree with me, I’m going to stop talking to you. It doesn’t matter how right you are if you never teach me anything.
If you would like me to stop talking to you, feel free to say so outright, and I will do so, without any hard feelings.
(I didn’t downvote the grandparent, and didn’t even see it downvoted. Your comment is still on my “to reply” list, so when it doesn’t feel like work I might eventually reply (the basic argument is related to “hidden complexity of wishes” and preference is thorough). Also note that I don’t receive a notification when you reply to your own comment, I only saw this because I’m subscribed to wedrifid’s comments.)
All right, I apologize for jumping to conclusions.
We can subscribe to comments? Is this via RSS? And can we do it in bulk? (I suppose that would mean creating many subscriptions and grouping them in the reader.)
What I would like is to be able to subscribe to a feed of “comments that have been upvoted by people who usually upvote the comments that I like and downvote the comments that I dislike”.
Via RSS, I’m subscribed to comments by about 20 people (grouped in a Google Reader folder), which is usually enough to point me to whatever interesting discussions are going on, but doesn’t require looking through tons of relatively low-signal comments in the global comments stream. It’s a good solution, you won’t be able to arrange a similar quality of comment selection automatically.
(I upvoted the grandparent. It seems to ask relevant questions and give clear reasons for asking them.)
The shield AI bothers me. One would need to be very careful to specify how a shield AI would function so that it would a) not try to generally halt human research development in its attempt to prevent the development of AIs b) allow humans to turn it off when we think we’ve got friendly AIs even as it doesn’t let FOOMing AIs turn it off. Specifying these issues might be quite difficult. Edit: Sorry,b is only an issue for a slightly different form of shield AI since you seem to want a shield AI which is actually not given a specific method of being turned off. I’m not sure that’s a good idea (especially if the shield AI goes drastically wrong).
Regarding encryption issues: Even in the circumstance where encryption turns out to be provably secure (say a proof that factoring is not in P in some very strong sense) that doesn’t mean that our implementations are secure. It isn’t infrequent for an implementation of an encryption scheme to turn out to have a serious problem, not because of anything in the nature of the encryption, but due to errors and oversights in the human programming. This should increase the probability estimate that an AI could take over highly secured systems.
The one thing that could make me deeply worried about FOOMing AIs would be a non-constructive proof that P=NP. If that occurs, most forms of encryption become potentially highly non-secure and most of the potential ways that an Option 3 “software fail” could occur become much less likely. (I think I may have mentioned this to you earlier).
I’m curious what other existential risks other people will think are likely for Option 1. I think I’ll wait until listing my own in order not to anchor the thinking in any specific direction (although given earlier comments I made I suspect that Mass Driver can guess the general focus most of the ones I would list would take.)
Of course! I don’t pretend that it’s easy; it’s just that it may require, say, 6 people-years of work instead of 400 people-years of work, and thus be a project that actually could be completed before unFriendly AIs start launching.
I mean, you could have an off-switch that expires after the first 2 years or so, maybe based on the decay of a radioactive element in a black box, with the quantity put in the black box put there before the shield AI is turned on and with the exact quantity unknown to all except a very small number of researchers (perhaps one) who does her calculations on pencil and paper and then shreds and eats them. That way you could get a sense of the AI’s actual goals (since it wouldn’t know when it was safe to ‘cheat’) during whatever little time is left before unfriendly AI launches that could take over the off-switch start becoming a serious threat, and, if necessary, abort.
I agree that there is a serious problem about what to do in possible futures where we have an AI that’s smart enough to be dangerous, but not powerful enough to implement something like CEV. Unfortunately I don’t think this analysis offers any help.
Of your list of ways to avoid a FOOM, option 1 isn’t really relevant (since if we’re all dead we don’t have to worry about how to program an AI). Option 2 is already ruled out with very high probability, because you don’t need any exotic physics to reach mind-boggling levels of hardware performance. For instance, performance estimates for nanotech rod-logic computers come in at around 10^9 op/sec per cubic micron, and electronic devices should beat that by several orders of magnitude. (For comparison, the human brain seems to turn out around 10^15 op/sec in 1500cc, or ~1 op/sec per cubic micron.) So a specific technology like microchip manufacturing might top out, but one way or another ordinary human efforts to improve computer performance will eventually carry us far beyond any plausible FOOM requirement.
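For transparency, here is the unit arithmetic behind those density figures, treating both the rod-logic number and the 10^15 op/sec brain estimate as assumed inputs:

```python
# Unit arithmetic for the density comparison above; the inputs are the
# assumed estimates from the comment, not measurements.

brain_ops_per_sec = 1e15
brain_volume_cc = 1500
um3_per_cc = (1e4) ** 3                 # 1 cm = 10^4 microns -> 10^12 um^3 per cc

brain_density = brain_ops_per_sec / (brain_volume_cc * um3_per_cc)
rod_logic_density = 1e9                 # op/sec per cubic micron (estimate)

print(f"brain:     ~{brain_density:.1f} op/sec per cubic micron")   # ~0.7
print(f"rod logic: ~{rod_logic_density:.0e} op/sec per cubic micron")
print(f"gap:       ~{rod_logic_density / brain_density:.1e}x")      # ~10^9
```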
Option 3 hinges on issues we won’t fully understand until we’re close to having a working AGI. But it’s hard to come up with a theory of intelligence that doesn’t boil down to a system of heuristic engines searching various abstract solution spaces, where at worst an exponential improvement in hardware yields a linear improvement in size of search space covered per unit time. In real applications you can usually get O(n) or even O(log n) solutions for the problems you actually care about, which implies that at a certain point a hard takeoff is inevitable.
But we have no way to know where that point is, and the complexity of CEV does make this an important issue. If an infrahuman AI can suddenly go FOOM and turn into an SI then something like CEV might be practical. But if the FOOM moment doesn’t come until the AI is well into transhuman territory we could spend years in a world of moderately-superhuman AIs that need a much less complex approach to Friendliness.
Unfortunately this just leads us back to all the problems that Eliezer was trying to dodge in proposing CEV. If you want to make an AI reliably Friendly you have to be able to describe Friendliness in a way that is unambiguous, complete, and not susceptible to gaming, which can’t be done with sentences of English text. You’d have to first understand how human language processing works, then build an AI implementation which handles unnatural categories and other fun complications in a predictable way, and then compose your instructions with the design of the ‘parser’ in mind.
Which is a lot less fun than the traditional ‘let’s talk about what orders to give the godlike AI’ debate, but it’s about the least rigorous approach that has any chance of working.
This is a brief summary of what I consider the software based non-fooming scenario.
Terminology
Impact—How much the agent can make the world how it wants (hit a small target in search space, etc.)
Knowledge—Correct programming.
General Outlook/philosophy
Do not assume that an agent knows everything; assume that an agent has to start off with no program to run. Try and figure out where and how it gets the information for the program, and whether it can be misled by sources of information.
General Suppositions
High impact requires that someone (either the creator or the agent) have a high knowledge of the world in order for the system to be appropriate. And the right knowledge; knowing trillions of digits of pi is generally not as useful as where the oil reserves are when trying to take over the world.
Usefulness of knowledge is not inherently obvious. It is also changeable. The knowledge of how to efficiently get blubber from a whale is less useful now we have oil.
Knowledge can be acquired through social means, derived from prior knowledge and experience or experimentally.
Moving beyond your current useful knowledge requires luck in picking the right things to analyse for statistical correlations.
Knowledge can rely on other knowledge.
Historical Explanations
Evolution can gather knowledge.
Brains can gather knowledge. It is monumentally wasteful if the individual dies and the knowledge is not passed on, as the next generation has to reinvent the wheel each time.
Lots of the increase in the impact of individuals that has happened through evolution has been due to the passing of knowledge between individuals, not improvements in base algorithms for deriving knowledge from sense data. This is especially true in the case of humans with language. High G/IQ may be based on the ability to get knowledge from other humans, thus being able to have higher impact.
Acquiring knowledge from other humans might not always be good as they lie and manipulate. Lots of rubbish out there. Hard to distinguish good from bad (requires knowledge), but still easier than reinventing the wheel. Equivalents of malware detectors needed for bad stuff. The malware detectors might falsely recognise good knowledge as bad programming.
Computational resources are only useful in so much as you have the correct knowledge to make use of them. If you can only find simple useful models you don’t need more resources.
Future
The fastest scenario is that an AI can quickly expand computational resources to enable it to make use of all of human knowledge. After that it is “slow”.
More likely, we are going to have problems with AIs not being able to inherently recognise things like post-modernist thinking as dead ends of human thought. An AI naturally lacks all the knowledge that evolution has given humans as a shared base for understanding and predicting each other, so we might expect it to have lots of problems in this regard.
Encryption that an AI couldn’t break is the easy part. Just don’t do something dumb like with WEP. The hard part is not the encryption but all the stuff that is supposed to keep things safely hidden behind the encryption. Think “airtight blastproof shield held in place with pop rivets, duct tape, and some guy from marketing plugging one of the holes with his finger”.
Strongly disagree. Consider for example the possibility that our encryption relies on the difficulty of factoring large numbers and the AI finds a way of doing so efficiently. Just because human mathematicians haven’t succeeded at something doesn’t mean a smart AI won’t. Moreover, as far as encryption-related claims go, we’ve generally been much too optimistic about the difficulty of breaking encryption. See for example Rivest’s famously incorrect estimate: in 1977, Rivest estimated that breaking RSA-129 would take around 10^15 years, but it was broken less than 20 years later.
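For a concrete, toy sense of how an algorithmic insight (rather than raw hardware) is what collapses factoring-based security assumptions, here is a minimal Pollard’s rho sketch; the modulus below is made up and is vastly smaller than anything RSA actually uses:

import math
import random

def pollard_rho(n):
    # Returns a nontrivial factor of a composite n.
    if n % 2 == 0:
        return 2
    while True:
        x = y = random.randrange(2, n)
        c = random.randrange(1, n)
        d = 1
        while d == 1:
            x = (x * x + c) % n
            y = (y * y + c) % n
            y = (y * y + c) % n
            d = math.gcd(abs(x - y), n)
        if d != n:
            return d

# A toy ~12-digit semiprime: trial division would need ~10^5 steps,
# rho finds a factor almost immediately.
print(pollard_rho(104729 * 999983))

The same lesson applies one level up: a genuinely new factoring idea, not more of the same compute, is the scenario that would break RSA-style schemes.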
I strongly disagree. Not because I disagree with the part about a super-intelligence factoring large numbers, since I actually considered that as I was writing. Rather, I assert that this does not warrant the conclusion that the encryption is the hard part. That means I am asserting that the probability of humans proving some suitable task unsolvable in Jupiter-Brain time is greater than the probability of successfully using that to prevent a super-intelligence from accessing a computer system. Not only does such a system contain millions of places for software and hardware errors, but more importantly it contains human parts. I included “some guy from marketing plugging one of the holes with his finger” for a reason.
Ah ok. Then we don’t disagree substantially. I just consider the two possibilities (problems with the encryption method, and error in implementation) to be roughly the same probability or close enough given the data we currently have that I can’t make a decent judgment on the matter, whereas you seem to think that the human error problem is substantially more likely.
Yes, it sounds like just a difference in degree.
This subject deserves a whole chapter of Harry Potter Fanfiction. The need for Constant Vigilance when guarding against an enemy that is resourceful, clever, more powerful and tireless. It would conclude with Mad Eye Moody getting killed. Constant Vigilance is futile when you are a human. The only option is to kill the enemy once and for all, to eliminate that dependence.
I don’t think MoR really needs a chapter on that.
I mean, canon Harry Potter does that already—Mad Eye (the real one) is captured by Dark forces before we ever meet him, tortured routinely, and 2 or 3 years later is killed by them.
(And of course, canon Mad Eye had no chance of actually killing Voldemort once and for all, so Constant Vigilance was all he could do.)
More examples: (1) people have a history of reusing one-time pads; (2) side-channel attacks. The latter is a big deal that doesn’t really fit the dichotomy.
Although the question of the effect of non-FOOMing AI is interesting, this particular article is full of sloppy thinking and cargo cult estimates. (Right now I’m not in the mood to break it down, but I’d guess many of the regulars are qualified to do that.)
1 seems unlikely and 2 and 3 seem silly to me. An associated problem of unknown scale is the wirehead problem. Some think that this won’t be a problem—but we don’t really know that yet. It probably would not slow down machine intelligence very much, until way past human level—but we don’t yet know for sure what its effects will be.
I’m curious why 3 seems silly.
If the complexity hierarchy does not exhibit major collapse (say P, NP, coNP, PSPACE, and EXP are all distinct, which at this point most theoretical computer scientists seem to believe), then many genuinely practical problems cannot be done much more efficiently than we can do them today. For example, this would imply that factoring integers probably cannot be done in polynomial time. It also implies that the traveling salesman problem cannot be solved efficiently, a problem which shows up in many practical contexts including circuit design. If that were the case, even if there are no Option 2 problems (in that really good hardware is actually possible), designing such hardware might become increasingly difficult at a rate faster than the hardware improves. I consider that situation to be unlikely, but from what I know of the discipline it is plausible (possibly someone involved in the industries in question can comment on the plausibility; I think we have a few such people on LW). Graph coloring would also be intrinsically hard, and graph coloring comes up in memory design and memory management issues which would be very relevant to an AI trying to go FOOM.
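A toy illustration of the brute-force scaling (the distance matrix below is made up, and, as discussed further down the thread, real instances often have exploitable structure):

import itertools
import math

def brute_force_tsp(dist):
    # Exact TSP by checking all (n-1)! tours starting and ending at city 0.
    n = len(dist)
    best = math.inf
    for perm in itertools.permutations(range(1, n)):
        tour = (0,) + perm + (0,)
        cost = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        best = min(best, cost)
    return best

dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
print(brute_force_tsp(dist))        # 18 on this toy instance
print(f"{math.factorial(29):.1e}")  # ~8.8e30 tours for just 30 cities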
Even if the complexity hierarchy collapses or exhibits partial collapse there are still going to be bounds on all these practical problems beyond which they cannot be optimized. They will be polynomial bounds and so won’t grow fast which will make things happy for our AI but absolute bounds will still exist.
It is possible that the entire hierarchy doesn’t collapse but that there’s some algorithm for solving some standard NP complete problems that is very efficient as long as the length of the input is less than 10^80 or something like that. In which case even without the complexity collapse, the AI would still be able to go FOOM. But this possibility seems very unlikely.
Similarly, it is possible that someone will develop hardware allowing small wormholes to aid computation, in which case the physical laws of the universe will allow heavy collapse (see Scott Aaronson’s remarks here), with everything up to PSPACE collapsing completely. But that’s essentially getting around problem 3 by making ridiculously optimistic hardware assumptions. It is also possible that quantum computing will become very practical and it turns out that BQP = NP, or so close as to not make a practical difference (similar to our hypothetical algorithm that works well with inputs less than 10^80, one could conceive of a quantum algorithm that did the same thing for all small inputs even with BQP turning out to be a proper subset of NP; as I understand it, at present we don’t actually know whether BQP is a subset of NP, but it is suspected that it is). But that a) assumes that quantum computing will be strongly practical and b) requires extremely strange and unlikely results about computational complexity.
The best argument I am aware of against an Option 3 failure is that if hardware takes off really well (say the hardware is possible and nanotech makes it fast to build), then the software constraints become not very relevant. But if FOOMing requires improvement on both fronts, this becomes a real concern.
To solve the NP-hard problems in hardware design and such, you don’t necessarily need a solver that works on all (“random”, “general position”) NP-hard problems. You can find and exploit regularities in the problems that actually arise. We humans seem to do just that: someone who plays chess well can’t easily convert their skill to other chess-like games.
Replying separately to my earlier reply, since the long time gap makes an edited remark unlikely to be noticed by you.
I just ran across, in a completely different context, this paper on the average difficulty of NP-hard problems. The paper is highly technical and I don’t think I follow all (or even most) of the details, but the upshot seems to be that, roughly speaking, the only way for most instances of fairly natural NP-complete problems to be as difficult as the worst case is if NP-complete problems are actually easy. This makes your objection all the more powerful, because it suggests that for the specific NP-complete problems an AI would need to solve, it would not just be able to take advantage of regularity in the specific instances it cares about, but would also be able to rely on the fact that most random instances simply aren’t that tough compared to the very worst case. This issue, combined with your remark, forces me to update my estimate of how much software issues will interfere with FOOMing. Taken together, these issues undermine the arguments I gave for software barriers to FOOM.
Yes, that seems to be a very good point. It is unlikely an AI is going to need to solve arbitrary instances of the traveling salesman or graph coloring.
It is important to realize that producing an agent capable of finding the optimum solution in a search space 1000 times as large is not the same thing as producing an agent capable of finding solutions that are 1000 times as good.
It sometimes seems to me that FOOM believers fail to take this distinction into account.
3 says software will eventually run into optimality limits, which will slow growth. That is right—but we can see that this is far off—far enough away to allow machine intelligences to zoom far past human ones in all domains worth mentioning.
How do we know this is far off? For some very useful processes we’re already close to optimal. For example, linear programming is already close to the theoretical optimum, as are the improved versions of the Euclidean algorithm, and even the most efficient of those are not much more efficient than Euclid’s original, which is around 2000 years old. And again, if it turns out that the complexity hierarchy strongly does not collapse, then many algorithms we have today will turn out to be close to the best possible. So what makes you so certain that we can see that reaching optimality limits is far off?
I was comparing with the human brain. That is far from optimal—due to its one-size-fits-all pattern, ancestral nutrient availability issues (now solved), and other design constraints.
Machine intelligence algorithms are currently well behind human levels in many areas. They will eventually wind up far ahead—and so currently there is a big gap.
Comparing to the human brain is primarily connected to failure option 2, not option 3. We’ve had many years now to make computer systems and general algorithms that don’t rely on human architecture. We know that machine intelligence is behind humans in many areas, but we also know that computers are well ahead of humans in other areas (I’m pretty sure that no human on the planet can factor 100-digit integers in a few seconds unaided). FOOMing would likely require not just an AI that is much better than humans at many of the tasks that humans are good at, but also an AI that is very good at tasks like factoring that computers are already much better at than humans. So pointing out that the human brain is very suboptimal doesn’t make this a slam-dunk case. So I still don’t see how you can label concerns about 3 as silly.
Cousin it’s point (gah, making the correct possessive there looks really annoying because it looks like one has typed “it’s” when one should have “its”) that the NP hard problems that an AI would need to deal with may be limited to instances which have high regularity seems like a much better critique.
Edit: Curious for reason for downvote.
It feels a little better if I write cousin_it’s. Mind you I feel ‘gah’ whenever I write ‘its’. It’s a broken hack in English grammar syntax.
If linear programming is so close to the optimum, why did we see such massive speedups in it and integer programming over the past few decades? (Or are you saying those speedups brought us almost to the optimum?)
There are a variety of things going on here. One is that those speedups helped. A major part of this is imprecision on my part. This is actually a good example of an interesting (and, from the perspective of what is discussed above, potentially dangerous) phenomenon. Most of the time when we discuss the efficiency of algorithms we are looking at big-O bounds. But those by nature have constants built into them. In the case of linear programming, it turned out that we could drastically improve the constants. This is related to what I discussed above, where even if one has P != NP, one could have efficient ways of solving all instances of 3-SAT that have fewer than, say, 10^80 terms. That sort of situation could be about as bad from a FOOM perspective. For the immediate purpose being discussed above, the Euclidean algorithm example is probably better, since in that case we actually know that there’s not much room for improvement in the constants.
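A quick sketch of the Euclidean-algorithm point: by Lamé’s theorem the worst case (consecutive Fibonacci numbers) still takes only a number of division steps proportional to the number of digits, so there is very little constant-factor slack left to squeeze out. The 100-digit cutoff below is arbitrary.

def gcd_steps(a, b):
    # Euclid's algorithm, counting division steps.
    steps = 0
    while b:
        a, b = b, a % b
        steps += 1
    return a, steps

# Build consecutive Fibonacci numbers with about 100 digits (the worst case).
fib = [1, 1]
while len(str(fib[-1])) < 100:
    fib.append(fib[-1] + fib[-2])
print(gcd_steps(fib[-1], fib[-2]))  # gcd 1, only a few hundred steps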
Thank you for saying that non-FOOM has nonzero probability and should be considered!
Another case I’d like to be considered more is “if we can’t/shouldn’t control the AIs, what can we do to still have influence over them?”
Thermite. Destroying or preventing them is the ONLY option in that situation. (Well, I suppose you could launch them out of our future light cone.)
I hope that was a joke because that doesn’t square with our current understanding of how physics works...
You are mistaken.
I’m pretty sure I’m not mistaken. At the risk of driving this sidetrack off a cliff...
Once an object (in this case, a potentially dangerous AI) is in our past light cone, the only way for its world line to stay outside of our future light cone forever (besides terminating it through thermite destruction as mentioned above) is for it to travel at the speed of light or faster. That was the physics nitpick I was making. In short, destroy it because you cannot send it far enough away fast enough to keep it from coming back and eating us.
Close, but the tricky part is that the universe can expand at greater than the speed of light. Nothing that can influence cause and effect (like photons) can travel faster than c, but the fabric of spacetime itself can expand faster than the speed of light. Looking at (models of) the first 10^-30 seconds highlights this to an extreme degree. Even now, some of the galaxies that are visible to us are receding from us by more than a light year per year. That means that the light they are currently emitting (if any) will never reach us.
To launch an AI out of our future light cone you must send it past a point at which the expansion of the universe makes that point further away from us at c. At that time it will be one of the points at the edge of our future light cone and beyond it the AI can never touch us.
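A back-of-the-envelope version of that boundary (H0 ≈ 70 km/s/Mpc is an assumed round value; the true event horizon differs somewhat because the expansion rate changes over time):

H0 = 70.0                 # km/s per megaparsec (assumed round value)
c = 299_792.458           # km/s
MPC_IN_LY = 3.262e6       # light years per megaparsec

hubble_radius_ly = (c / H0) * MPC_IN_LY
print(hubble_radius_ly / 1e9)   # ~14 billion light years: beyond roughly this
                                # distance, recession due to expansion exceeds c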
So you’re positing a technique that takes advantage of cosmic expansion to permanently get rid of an AI. Thermite—very practical. Launching the little AI box across the universe at near light-speed for a few billion years until expansion takes it beyond our horizon—not practical.
To bring this thread back onto the LW Highway...
It looks like you fell into a failure mode of defending what would otherwise have been a throwaway statement—probably to preserve the appearance of consistency, the desire not to be wrong in public, etc. (Do we have a list of these somewhere? I couldn’t find examples in the LW wiki.) A better response to “I hope that was a joke...” than “You are mistaken” would be “Yeah. It was hyperbole for effect.” or something along those lines.
A better initial comment from me would have been to make it into an actual question because I thought you might have had a genuine misunderstanding about light cones and world lines. Instead, it came off hostile which wasn’t what I intended.
I don’t think that wedrifid made those remarks to save face or the like, since wedrifid is the one who proposed both the thermite and the light cone option. The light cone option was clearly humorous, and then wedrifid explained how it would work (for some value of work). If I am reading this correctly there was no serious intent at all in that proposal, other than to emphasize how wedrifid sees destruction as the only viable response.
Thank you, Joshua. I was going to let myself have too much fun with my reply, so it is good that you beat me to it. I’ll allow myself to add two responses, however.
The relevant failure mode here is “other optimising”.
No, no, no. That would be wrong, inasmuch as it is accepting a false claim about physics! Direct contradiction is exactly what I want to convey. This is wrong on a far more basic level than the belief that we could control, or survive, an unfriendly GAI. There are even respected experts (who believe their expertise is relevant) who share that particular delusion—Robin Hanson, for example.
No; I said I’d like the case to be considered. What you are doing is NOT considering it.
Whoa, that is a lot of divs I had to count!
Considering all the other alternatives, it’s rather fortunate that we have thermite, an expanding cosmos, and special relativity at our disposal for influencing cause and effect. Without those we’d be screwed!
The thing with hardware limits is that you can keep on building more computers, even if they’re slow, and create massive parallel processors. We know that they can at the very least achieve the status of the human brain. Not to mention that they can optimize the process for building them, possibly with many different designs, including biologically inspired ones that can, with a large enough energy source, consume organic material (FYI, the earth is made of organic material—rocks) and grow at exponential rates. As the system grows exponentially larger, it would channel resources it couldn’t use itself toward other purposes (for example, it could make silicon-based computers as well, or maybe start producing vehicles to colonize other parts of the solar system). The problem with this system is that even if it’s friendly, it means the irreversible destruction of the earth unless we make a point to preserve the earth and all life on it, maybe the rest of the solar system as well.
Rocks are INorganic.
Most rocks are inorganic but a substantial fraction are organic. Limestone and dolomite are common examples. Still, your basic point is correct. Certainly the majority of rocks are inorganic which is more than enough to make your point pretty clear. The post you were responding to is unambiguously very wrong.
I’m sorry… what is Foom? I can’t find a definition anywhere.
http://wiki.lesswrong.com/wiki/FOOM
I think that the prospects for general artificial intelligence are getting worse.
This argument might be better-received if you posted it, or an excerpt, here, rather than just supplying a link.
I would think that the more “AI problems” that are used up, the more that will be found. Only when an AI of a certain level is achieved will we spot the next places for it to be used.
I’m unimpressed by your arguments. I’m not going to deal with the fact that you don’t grapple with the arguments for why to expect general AI but rather just focus on the claims you do make.
Is this really the case? The argument at least around these parts is about recursive self-improvement after one has an AI. It is hard to see why recursive self-improvement would be that useful for a not very intelligent AI. I’d be curious to see what evidence you have that this is the “usual story.”
Let’s take your narrative for granted. You then say:
This has been false for some time. Eurisko was made in the late 1970s and was able to modify its own heuristics.
This seems like your strongest argument: that the research funding for general AI will become small if we do a very good job at solving many different AI tasks. However, even that won’t dry up funds completely, just a lot. So that would be an argument for why we would expect general AI to take a while, not why we’d expect it not to occur. And without more details, it isn’t at all clear what timeline to actually expect, or how much this will influence when general AI occurs.
I’m not sure that’s fair. Eurisko could do that only within a very limited domain (a type of strategy game), and there hasn’t been anything similar since, as far as I know.
This AI would prevent anyone, including SIAI, from developing any sort of an AI.
I share this confusion. The only reasonable interpretation I can see is that Mass Driver prefers an AI that ensures nobody programs another AI, ever.
Otherwise you’d have to build the AI to tell Friendly AIs in development from Unfriendly ones, which appears to be tantamount to programming Friendliness in the first place.
I suppose there are situations in which we might choose to take that option. Most obviously if we do have the tech to build such an AI, don’t have the tech to build a full AI with CEV capabilities and we also know that some other fool is two months away from releasing a happy face maximizer. The options are then to kill the foolish AI developers or release the anti-AI and accept an eternity of mediocrity.
Letting the happy face AI run is arguably the right decision in this situation, if stopping the programmers is absolutely impossible for some reason (although I doubt very much that the “no AI AI” is a possible construction; compare this with a “do nothing AI”—how do you specify its goals, and have it optimize the world, in such a way that nothing happens?).
I don’t agree. I don’t want to die. I would prefer to live in a world that relied on non-AI technology. I hope you and Rolf do not get in my way when I do what needs to be done.
Just to summarize the argument that Vladimir is referring to: the smiley-face AI might be prepared to turn a tiny fraction of the universe (like, say, a galaxy) to the control of its best guess at a human-friendly AI, because the human-friendly AI would do the same for it in other, um, branches of the wavefunction or something.
This counterfactual trade is a positive sum game, because of astronomical waste, so the bad AI is like an ally to us.
This is a question of fact, not a fight between different preferences. I’m not certain either way, so I don’t argue that UFAI is definitely the right choice, but that the opposite is not obviously the right choice. You should give the arguments (which at least Wei Dai, Carl Schulman and I take seriously) some consideration, irrespective of how absurd the conclusion sounds. There seems to be deep similarity in the structure of disagreement between this idea and cryonics.
I am disagreeing on the question of fact. What we can do without an FAI is by far superior to any scraps we can expect the smiley-face maximiser to contribute due to exchanges. The greatest of the existential risks that not having an FAI entails is the threat of a uFAI; the anti-AI removes that. We do have some potential for survival based on other technologies within our grasp. SIAI would have to devote itself to solving other hard problems.
Wei mentioned a combinatorial explosion. He may have been applying it somewhat differently than I am, but I am claiming that an overwhelming number of the possible mind designs that Smiley is bargaining with are also bad for me. He is bargaining with a whole lot of Clippy’s brothers and sisters, and with a whole lot of GAIs that are released and care primarily about their own propagation. Even more importantly, the small proportion of FAIs that do exist are not friendly to the things I care about. Almost none of them will result in me personally being alive.
This all assumes that the bargaining does in fact go ahead. I’m not certain either way either and nor am I certain that in the specific case of Smiley one of his optimal trading partners will be an FAI which I happen to like.
All this means that I am comfortable with the assertion you quote. If you or Rolf did try to stop me from pressing that no-AI button then you would just be obstacles that needed to be eliminated, even if your motives are pure. My life and all that I hold dear is at stake!
I think that makes some sense. It’s not clear to me that building a smiley-face maximizer that trades with AIs in other possible worlds would be better than having a no-AI future.
There is another possibility to consider though. Both we and the smiley-face maximizer would be better off if we did allow it to be built, and then it gives our preferences some control (enough for us to be better off than the no-AI future). It’s not clear that this opportunity for trade can be realized, but we should spend some time thinking about it before ruling it out.
It seems like we really need a theory of games that tells us (human beings) how to play games with superintelligences. We can’t depend on our FAIs to play the games for us, because we have to decide now what to do, including the above example, and also what kind of FAI to build.
Sounds like Drescher’s bounded Newcomb. This perspective suddenly painted it FAI-complete.
Can you please elaborate? I looked up “FAI-complete”, and found this but I still don’t get your point.
See the DT list. (Copy of the post here.) FAI-complete problem = solving it means that FAI gets solved as well.
That FAI is good for you is a property of the term “FAI”. If it doesn’t create value for you, it’s not FAI, but something like Smileys and Paperclippers, potential trade partner but not your guy. Let’s keep it simple.
“Friendly to their Creator AI”, choose an acronym. Perhaps FAI. Across the multiverse most civilizations that engage in successful AI efforts will produce an AI that is not friendly to me. AIs that are actually FAIs (which include by definition my own survival) are negligible.
Releasing a Smiley will make me die and destroy everything I care about. I will kill anyone who stops me preventing that disaster. That is as simple as I can make it.
Formal preference is a more abstract concept than survival in particular, and even though all else equal, in usual situations, survival is preferable to non-survival, there could be situations even better than “survival”. It’s not out of the question “by definition” (you know better than to invoke this argument pattern).
Formal preference is one particular thing. You can’t specify additional details without changing the concept. If preference says that “survival” is a necessary component, that worlds without “survival” are equally worthless, then so be it. But it could say otherwise. You can’t study something and already know the answer, you can’t just assume to know that this property that intuitively appeals to you is unquestionably present. How do you know? I’d rather build on clear foundation, and remain in doubt about what I can’t yet see.
Negligible, non-negligible, that’s what the word means. It talks about specifically working for your preference, because of what AI values and not because it needs to do so for trade. FAI could be impossible, for example, that doesn’t change the concept. BabyEater’s AI could be an UFAI, or it could be a FAI, depending on how well it serves your preference. It could turn out to be a FAI, if the sympathy aspect of their preference is strong enough to dole you a fair part of the world, more than you own by pure game-theoretic control.
FAI doesn’t imply full control given to your preference (for example, here on Earth we have many people with at least somewhat different preferences, and all control likely won’t be given to any single person). The term distinguishes AIs that optimize for you because of their own preference (and thus generate powerful control in the mathematical universe for your values, to a much more significant extent than you can do yourself), from AIs that optimize for you because of control pressure (in other terms, trade opportunity) from another AI (which is the case for “UFAI”).
(I’m trying to factor the discussion into the more independent topics to not lose track of the structure of the argument.)
Please don’t derail a civilized course of discussion, this makes clear communication more expensive in effort. This particular point was about a convention for using a word, and not about that other point you started talking sarcastically about here.
Also, speculating on the consequences of a conclusion (like the implication from it being correct to not release the UFAI, to you therefore having to destroy everything that stands in the way of preventing that event, an implication with which I more or less agree, if you don’t forget to take into account the moral value of said murders) is not helpful in the course of arguing about which conclusion is the correct one.
I engaged with your point and even accepted it.
I reject your labeling attempt. My point is a fundamental disagreement with an important claim you are making and is in no way sarcastic. Your comments here are attempting to redirect emphasis away from the point by reframing my disagreement negatively, while completely ignoring my engagement with and acceptance of your point.
I also do not understand the “Let’s keep it simple” rhetoric. My misuse of the ‘FAI’ label was an oversimplification for the purposes of brevity, and I was willing to accept your more rigorous usage even though it requires more complexity.
I have previously discussed the benefits of the ‘kill test’ in considering moral choices when things really matter. This is one of the reasons Eliezer’s fan-fiction is so valuable. In particular I am influenced by Three Worlds Collide and The Sword of Good. I find that it is only that sort of stark consideration that can overcome certain biases arising from moral squeamishness that did not evolve to handle big decisions. The “ugh” reaction to things that “violate people’s rights” and to coercion biases us towards justifying courses of action so we don’t have to consider being ‘bad’. I may write a top-level post on the subject (but there are dozens above it on my list).
This conversation is not one that I will enjoy continuing. I do not believe I am likely to make you update and nor do I expect to elicit new arguments that have not been considered. If something new comes along or if a top level post is created to consider the issues then I would be interested to read them and would quite probably re-engage.
Okay, misunderstanding on both sides. From what I understood, there is no point in working on reaching agreement on this particular point of meta and rhetoric. (More substantial reply to the point we argue and attempt to reframe it for clarity are in the other two comments, which I assume you didn’t notice at the time of writing this reply.)
Could you restate that (together with what you see as the disagreement, and the way “kill test” applies to this argument)? From what I remember, it’s a reference to intuitive conclusion: you resolve the moral disagreement on the side of what you actually believe to be right. It’s not a universally valid path to figuring out what’s actually right, intuitions are sometimes wrong (although it might be the only thing to go on when you need to actually make that decision, but it’s still decision-making under uncertainty, a process generally unrelated to truth-seeking).
Ok. And yes, I hadn’t seen the other comments (either not yet written or hidden among the other subjects in my inbox).
Sadly, Vladimir, this failure to understand stakeholder theory is endemic in AI discussions. Friendly AI cannot possibly be defined as “if it doesn’t create value for you it’s not FAI”, because value is arbitrary. Some people want to die and others want to live, to take the stark example. Everyone being killed is thus value for some and not value for others, and vice versa.
What we end up with is having to define friendly as being “creating value for the largest possible number of human stakeholders even if some of them lose”.
For example, someone who derives value from ordering people around or having everyone else be their personal slaves such as Caligula or the ex-dictator Gaddafi doesn’t (didn’t....) see value in self-rule for the people and thus fought hard to maintain the status quo, murdering many people in the process.
In any scenario whereby you consider the wants of those who seek most of the world’s resources or domination over others, you’re going to end up with an impossible conundrum for any putative FAI.
So given that scenario, what is really in all of our best interests if some of us aren’t going to get what we want and there is only one Earth?
One answer I’ve seen is that the AI will create as many worlds as necessary in order to accommodate everyone’s desires in a reasonably satisfactory fashion. So, Gaddafi will get a world of his own, populated by all the people who (for some reason) enjoy being oppressed. If an insufficient number of such people exist, the FAI will create a sufficient number of non-sentient bots to fill out the population.
The AI can do all this because, as a direct consequence of its ability to make itself smarter exponentially, it will quickly acquire quasi-godlike powers, by, er, using some kind of nanotechnology or something.
By extrapolation, it seems likely that the cheapest implementation of the different-worlds-for-conflicting-points-of-view idea is some kind of virtual reality, if it proves too difficult to give each human its own material world.
Yes, and in the degenerate case, you’d have one world per human. But I doubt it would come to that, since a) we humans really aren’t as diverse as we think, and b) many of us crave the company of other humans. In any case, the FAI will be able to instantiate as many simulations as needed, because it has the aforementioned nano-magical powers.
Indeed. It’s likely that many of the simulations would be shared.
What I find interesting to speculate on then is whether we might be either forcibly scanned into the simulation or plugged into some kind of brain-in-a-vat scenario a la the matrix.
Perhaps the putative AI might make the calculation that most humans would ultimately be OK with one of those scenarios.
Meh… as far as I’m concerned, those are just implementation details. Once your AI gets a hold of those nano-magical quantum powers, it can pretty much do anything it wants, anyway.
I understand that you don’t want to die or lose the future, and I understand the ingrained thought that UFAI = total loss, but please try to look past that, consider that you may be wrong, see that being willing to ‘eliminate’ your allies over factual disagreements loses, and cooperate in the iterated epistemic prisoner’s dilemma with your epistemic peers. You seem to be pretty obviously coming at this question from a highly emotional position, and should try to deal with that before arguing the object level.
That it’s far superior is not obvious, both because it’s not obvious how well we could reasonably expect to do without FAI (How likely would we be to successfully construct a singleton locking in our values? How efficiently could we use resources? Would the anti-AI interfere with human intelligence enhancement or uploading, either of which seems like it would destroy huge amounts of value?), and because our notional utility function might see steeply diminishing marginal returns to resources before using the entire future light cone (see this discussion).
I am, or at least was, considering the facts, including what was supplied in the links. I was also assuming for the sake of the argument that the kind of agent that the incompetent AI developers created would recursively improve to one that cooperated without communication with other universes.
Discussing the effects and implications of decisions in counterfactuals is not something that is at all emotional for me. It fascinates me. On the other hand the natural conclusion to counterfactuals (which are inevitably discussing extreme situations) is something that does seem to inspire emotional judgments, which is something that overrides my fascination.
And I want to live in a world that has maximal benefits for the largest average group of stakeholders not just a group of elites like we have now. Unfortunately non-AI based governing systems are run by humans and history shows that fair systems are unstable and eventually become usurped by those who place their own interests ahead of the rest of the population. Will an AI system be better than that? I don’t know. Historically both benevolent dictatorships and republics are reasonable systems but the benevolent dictator eventually dies and republics ALWAYS transform into rule by the elite.
What I want is a long lived benevolent dictator whether it’s a transhuman or an AI but frankly I’d trust a benevolent AI ahead of an allegedly benevolent transhuman. Hell I’m not sure that I’d even trust myself to be a 100% fair benevolent transhuman dictator and I’m pretty reasonable. Power corrupts and all that.
You could have a “hiding superintelligence”. That’s the kind the NSA might have chained up in its basement.
Define ‘seize control.’ Wouldn’t such an AI be motivated to understate its effective resources, or create other nominally-independent AIs with identical objectives and less restraint, or otherwise circumvent that factor?
One of the many reasons that I will win my bet with Eliezer is that it is impossible for an AI to understand itself. If it could, it would be able to predict its own actions, and this is a logical contradiction, just as it is for us.
“Not being able, with 100% accuracy, to predict own future actions” is nowhere near the same thing as “Not being able, with at all useful accuracy, to predict own future actions.”
I agree, but without 100% accuracy it will not be able to FOOM.
Why? I don’t follow this logic at all.
That is just… trivially false.
And that is the worst reasoning I have encountered in at least a week. Not only is it trying to foist a nonsensical definition of ‘understand’; an AI could predict its own actions. AND even if it couldn’t, it still wouldn’t be a logical contradiction. It’d just be a fact.
An AI could not predict its own actions, because any intelligent agent is quite capable of implementing the algorithm: “Take the predictor’s predicted action. Do the opposite.”
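A minimal sketch of that contrarian rule (the action names are made up for illustration): any predictor that must announce its prediction to this agent is wrong by construction, which is the narrow sense of “contradiction” at stake here; it says nothing about agents that don’t implement the rule.

def contrarian(announced_prediction, actions=("cooperate", "defect")):
    # Do the opposite of whatever was predicted and announced.
    return actions[1] if announced_prediction == actions[0] else actions[0]

for prediction in ("cooperate", "defect"):
    print(prediction, "->", contrarian(prediction))   # every announcement is falsified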
In order to predict itself (with 100% accuracy), it would have to be able to emulate its own programming, and this would cause a never-ending loop. Thus this is impossible.
Ok. And why would your AI decide to do so? You seem to be showing that a sufficiently pathological AI won’t be able to predict its own actions. How this shows that other AIs won’t be able to predict their own actions to within some degree of certainty is unclear.
This isn’t pathological. For example, it is a logical contradiction for someone to predict my actions in advance (and tell me about it), because my “programming” will lead me to do something else, much like the above algorithm. This is a feature, not a bug. Being able to be predicted is a great weakness. Any well programmed AI will avoid this weakness, just as we do.
Being able to be predicted is absolutely vital for making credible threats and promises. And, along with being able to accurately predict, allows for cooperation with other rational agents.
There appears to be a lot of logic here that is happening implicitly because I’m not following you.
You wrote:
Now, this seems like a very narrow sort of AI that would go and then do something else against what was predicted.
You seem to be using “logical contradiction” in a non-standard fashion. Do you mean it won’t happen given how your mind operates? In that case, permit me to make a few predictions about your actions over the next 48 hours (that you could probably predict also): 1) You will sleep at some point in that time period. 2) You will eat at some point in that time period. I make both of those with probability around .98 each. If we extend to one month, I’m willing to make a similar-confidence prediction that you will make a phone call or check your email within that time. I’m pretty sure you are not going to go out of your way, as a result of these predictions, to try to go do something else.
You also seem to be missing the point about what an AI would actually need to improve. Say for example that the AI has a subroutine for factoring integers. If it comes up with a better algorithm for factoring integers, it can replace the subroutine with the new one. It doesn’t need to think deeply about how this will alter behavior.
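A toy sketch of that kind of swap (the function names, including the commented-out faster replacement, are made up for illustration): the system only has to check that the replacement computes the same function faster, not foresee its own future behaviour in detail.

def factor_trial_division(n):
    # Baseline subroutine: factor n by trial division.
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

factoring_subroutine = factor_trial_division
# ...later, after finding and testing a faster equivalent, just rebind:
# factoring_subroutine = factor_pollard_rho
print(factoring_subroutine(2**32 + 1))   # [641, 6700417]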
I agree with those predictions. However, my point would become clear if you attempted to translate your probability of 0.98 into a bet with me, with me betting $100 and you betting $5000. I would surely win the bet (with at least a probability of 0.98).
I am willing to bet, at 10,000 to 1 odds, that you will sleep sometime in the next 2 weeks. The pay out on this bet is not transferable to your heirs.
No it wouldn’t because that’s a very different situation. My probability estimate for you not eating food in a 48 hour period if you get paid $5000 when you succeed and must pay $100 if you fail is much lower. If I made the bet with some third party I’d be perfectly willing to do so as long as I had some reassurance that the third party isn’t intending to pay you a large portion of the resulting winnings if you win.
I don’t find predictability a weakness. If someone says to me, “Hey, Alicorn, I predict you’re going to eat that sandwich you’re holding,” I’m going to say, “Yes. You are exactly right. And I’m glad you are! If you were wrong, then I wouldn’t get to eat this delicious sandwich, which I want (that being why I made it and picked it up).”
Did you have some other, less general sort of predictability in mind when you made the claim that it’s a weakness?
It is only universal predictability that is a weakness.
Why? Predicting my actions doesn’t make them actions I don’t want to take. Predicting I’ll eat a sandwich if I want one doesn’t hurt me; and if others can predict that I’ll cooperate on the prisoner’s dilemma iff my opponent will cooperate iff I’ll cooperate, so much the better for all concerned.
Can you give an example of a case where being predictable would hurt someone who goes about choosing actions well in the first place? Note that, as with the PD thing above, actions are dependent on context; if the prediction changes the context, then that will already be factored into an accurate prediction.
Good question. Your intuition is correct as long as your actions are chosen “optimally” in the game-theoretic sense. This is one of the ideas behind Nash equilibria: your opponent can’t gain anything from knowing your strategy and vice versa. A caveat is that the Nash equilibria of many games require “mixed strategies” with unpredictable randomizing, so if the opponent can predict the output of your random device, you’re in trouble.
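A quick simulation of the standard example, matching pennies (the payoffs and round count here are arbitrary): at the mixed-strategy equilibrium, knowing the opponent’s strategy is worthless, and only predicting their individual coin flips would help.

import random

def matcher_payoff(p_heads_matcher, p_heads_opponent, rounds=100_000):
    # The matcher wins +1 when both coins agree, -1 otherwise.
    total = 0
    for _ in range(rounds):
        same = (random.random() < p_heads_matcher) == (random.random() < p_heads_opponent)
        total += 1 if same else -1
    return total / rounds

print(matcher_payoff(0.5, 0.5))   # ~0: the equilibrium strategy
print(matcher_payoff(0.9, 0.5))   # ~0: an unpredictable opponent can't be exploited
print(matcher_payoff(0.9, 0.9))   # ~0.64: a predictable opponent can be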
If you can accurately predict the action of a chess player faster than they can make it, then you have more time to think about your response. There are cases where this can make a difference—even if they happen to play perfectly.
Alicorn, your note about the PD implies that it is universally the case that there is some one action that will benefit you even if others predict it. There is no reason to think that this is the case; and if there is even one instance where doing what others predict you will do is harmful, then being universally predictable is a weakness.
Again, this is not a logical contradiction. You do not have a clear understanding of what the concept entails. It doesn’t mean ‘sometimes impractical’ or ‘often people adapt to avoid it’.
No, this really would be a logical contradiction if the agent being predicted does implement the stated algorithm (and won’t override it when something more important is at stake). It just has nothing to do with self-improvement, for which predicting abstract properties of specific algorithms is what matters; much like Rice’s theorem doesn’t mean we can’t prove that specific programs output pi (e.g.).
No, it is not a logical contradiction. The fact that someone can implement a stupid algorithm does not make the claim “it is a logical contradiction for someone to predict my actions in advance and tell me about it” true. Just because someone could implement a stupid algorithm for decision making, or a naive algorithm for prediction (one that doesn’t know when to shut up), doesn’t mean you can make that general claim. Not even close.
Your argument would probably apply if I were refuting a different but somewhat related assertion.
It does mean you can make a general claim analogous to Rice’s theorem / the undecidability of the halting problem — not that such a claim is incredibly interesting for our purposes.
Point taken; it doesn’t seem like we actually disagree about anything.
The cache of this conversation is buried somewhat in my brain but I think there is something to what you say here.
But an AI with that programming is predictable, and, much worse, manipulable! In order to get it to do anything, you need only inform it that you predicted that it will not do that thing*. It’s just a question of how long it takes people to realize that it has this behavior. It is far weaker than an AI that sometimes behaves as predicted and sometimes does not. Consider e.g. Alicorn’s sandwich example; if we imagine an AI that needed to eat (a silly idea but demonstrates the point), you don’t want it to refuse to do so simply because someone predicted it will (which anyone easily could).
*This raises the question of whether the AI will realize that in fact you are secretly predicting that it will do the opposite. But once you consider that then the AI has to keep track of probabilities of what people’s true (rather than just claimed) predictions are, I think it becomes clear that this is just a silly thing to be implementing in the first place. Especially because even if people didn’t go up to it and say “I bet you’re going to try to keep yourself alive”, they would still be implicitly predicting it by expecting it.
Yes, that as well. Such an AI would, it seems offhand, be playing a perpetual game of Poisoned Chalice Switcheroo to no real end.
I don’t see a logical contradiction here. And we have examples in nature of beings able to understand themselves very well: humans are a good example. People predict their own actions all the time. For example, I predict that after I finish typing this message I am going to hit comment and then get up and refill my glass of orange juice. Moreover, human understanding of ourselves has improved and has allowed us to optimize ourselves. For example, all the cognitive biases which we frequently discuss here are examples of humans understanding our own architecture and improving our processing. We also deliberately improve ourselves by playing games or doing specific mental exercises designed to improve specific mental skills. Soon we will more directly improve our cognitive structures by genetic engineering (we’ve already identified multiple examples of small genetic changes that can make rodents much smarter than they normally are (see this example or this one)). In general, claiming something is a logical contradiction when it occurs in reality is not a great idea.
See my response to wedrifid.
I agree that an AI (or any other intelligence) cannot predict its own choices (since predicting your own choices is the same as actually choosing, so it’s impossible to know what you’re going to do before you know what you’re going to do).
But the type of “understanding itself” needed to self-improve seems to be of a different type, it needs to understand the algorithms that lead to its decisions, but it doesn’t need to be able to copy them in real time.
Perfect self-understanding isn’t possible because a mind can’t completely include itself.
Perfect self-prediction isn’t possible because of the chances of new information, new thoughts, unreliable abilities, and sheer perversity.
Nonetheless, pretty good self-understanding is possible, and so is pretty good self-prediction. That’s all that people have, and it’s enough to have led to a considerable increase of ability to accomplish things.