Whatever task you give an AI, you will have to provide explicit boundaries. For example, if you give an AI the task of producing paperclips most efficiently, then it shouldn’t produce shoes. It will have to know very well what it is meant to do in order to measure its efficiency against the realization of the given goal, and thereby to know what self-improvement means. If it doesn’t know exactly what it should output, it cannot judge its own capabilities and efficiency; it doesn’t know what improvement implies.
How do you explain the discrepancy between implementing explicit design boundaries yet failing to implement scope boundaries?
I think you misunderstood what I meant by scope boundaries. Not scope boundaries of self-improvement, but of space and resources. If you are already able to tell an AI what a paperclip is, why are you unable to tell it to produce 10 paperclips most effectively rather than infinitely many? I’m not trying to argue that there is no risk, but that the assumption of certain catastrophic failure is not that likely. If the argument for the risks posed by AI is that they do not care, then why would one care to do more than necessary?
If the argument for the risks posed by AI is that they do not care, then why would one care to do more than necessary?
Yet another example of divergent assumptions. XiXiDu is apparently imagining an AI that has been assigned some task to complete—perhaps under constraints. “Do this, then display a prompt when finished.” His critics are imagining that the AI has been told “Your goal in life is to continually maximize the utility function U,” where the constraints, if any, are encoded in the utility function as a pseudo-cost.
It occurs to me, as I listen to this debate, that a certain amount of sanity can be imposed on a utility-maximizing agent simply by specifying decreasing returns to scale and increasing costs to scale over the short term, with the long-term curves being somewhat flatter. That will tend to guide the agent away from explosive growth pathways.
Or maybe this just seems like sanity to me because I have been practicing akrasia for too long.
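To make the “pseudo-cost” framing and the decreasing-returns suggestion above concrete, here is a minimal toy sketch in Python. Everything in it (the utility shape, the candidate plans, the numbers) is invented for illustration, not a claim about how a real agent would be specified.

```python
import math

def utility(paperclips_made, resources_seized):
    benefit = math.sqrt(paperclips_made)     # decreasing returns to scale
    cost = 0.01 * resources_seized ** 2      # sharply increasing short-term cost of expansion
    return benefit - cost

candidate_plans = [
    ("run quietly on one machine",          100,     1),
    ("buy some extra servers",              10_000,  50),
    ("convert the planet to computronium",  10**20,  10**7),
]

for name, clips, resources in candidate_plans:
    print(f"{name:40s} utility = {utility(clips, resources):,.2f}")

best = max(candidate_plans, key=lambda plan: utility(plan[1], plan[2]))
print("chosen plan:", best[0])
```

Under a utility surface shaped like this, the explosive-growth plan is simply not the argmax; the reply below points out why that alone may not settle the matter.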
It occurs to me, as I listen to this debate, that a certain amount of sanity can be imposed on a utility-maximizing agent simply by specifying decreasing returns to scale and increasing costs to scale over the short term, with the long-term curves being somewhat flatter. That will tend to guide the agent away from explosive growth pathways.
Such an AI would still be motivated to FOOM to consolidate its future ability to achieve large utility against the threat of being deactivated before then.
Such an AI would still be motivated to FOOM to consolidate its future ability to achieve large utility against the threat of being deactivated before then.
It doesn’t know about any threat. You implicitly assume that it has something equivalent to fear, that it perceives threats. You allow for the human ingenuity to implement this, and yet you believe that humans are unable to limit its scope. I just don’t see that it would be easy to make an AI that would go FOOM because it doesn’t care to go FOOM. If you tell it to optimize some process, then you’ll have to tell it what optimization means. If you can specify all that, how is it then still likely that it somehow comes up with its own idea that optimization might be to consume the universe if you told it to optimize its software running on a certain supercomputer? Why would it do that? Where does the incentive come from? If I tell a human to optimize, he might muse about turning the planets into computronium, but if I tell an AI to optimize, it doesn’t know what that means until I tell it, and even then it still won’t care, because it isn’t equipped with all the evolutionary baggage that humans are.
It is a general intelligence that we are considering. It can deduce the threat better than we can.
If you can specify all that, how is it then still likely that it somehow comes up with its own idea that optimization might be to consume the universe if you told it to optimize its software running on a certain supercomputer?
Because it is a general intelligence. It is smart. It is not limited to getting its ideas from you, it can come up with its own. And if the AI has been given the task of optimising its software for performance on a certain computer then it will do whatever it can to do that. This means harnessing external resources to do research on computation theory.
You implicitly assume that it has something equivalent to fear, that it perceives threats.
No he doesn’t. He assumes only that it is a general intelligence with an objective. Potentially negative consequences are just part of possible universes that it models like everything else.
I’m not sure what can be done to make this clear:
SELF IMPROVEMENT IS AN INSTRUMENTAL GOAL THAT IS USEFUL FOR ACHIEVING MOST TERMINAL VALUES.
If I tell a human to optimize, he might muse about turning the planets into computronium, but if I tell an AI to optimize, it doesn’t know what that means until I tell it, and even then it still won’t care, because it isn’t equipped with all the evolutionary baggage that humans are.
You have this approximately backwards. A human knows that if you tell her to create 10 paperclips every day, you don’t mean she should take over the world so that she can be sure nobody will interfere with her steady production of paperclips in the future. The AI doesn’t.
ETA: Check this and this before reading the comment below. I wasn’t clear enough about what I believe an AGI is and what I was trying to argue for.
It is a general intelligence that we are considering. It can deduce the threat better than we can.
A general intelligence is an intelligence that is able to learn anything a human being is able to learn and to make use of it. This definition of an abstract concept does not include any incentive: it does not say that it cares if you turn it off, or that it wants to go FOOM.
Because it is a general intelligence. It is smart. It is not limited to getting its ideas from you, it can come up with its own.
I think you have a fundamentally different idea of what a general intelligence is. If I tell you that there is an intelligent alien being living in California, then you cannot infer from that information that it wants to take over America. I just don’t see that being reasonable. There are many more pathways where it poses no risk, where it simply doesn’t care or cares about other things.
He assumes only that it is a general intelligence with an objective.
And that is the problem. He assumes that it has one objective; he assumes that humans were able to make it a general intelligence that cares for many things, knows what self-improvement implies, and additionally cares about a certain objective. Yet they failed to make it clear that it is limited to certain constraints, when they didn’t even have to make that clear, since it won’t care by itself. This assumes a highly intelligent being who is somehow an idiot about something else.
SELF IMPROVEMENT IS AN INSTRUMENTAL GOAL THAT IS USEFUL FOR ACHIEVING MOST TERMINAL VALUES.
No, it is not. It is not naturally rational to take that pathway to achieve some goal. If you want to lose weight, you do not consider migrating to Africa, where you don’t get enough food. An abstract general intelligence simply does not care about values enough to take that pathway naturally. It will just do what it is told, not more.
You have this approximately backwards. A human knows that if you tell her to create 10 paperclips every day, you don’t mean she should take over the world so that she can be sure nobody will interfere with her steady production of paperclips in the future. The AI doesn’t.
An AI doesn’t care to create more paperclips; a human might like to, and might not care about (might ignore) what you initially told her. I’m not arguing that you can’t mess up on AI goal design, but that if you went all the way and mastered the hard problem of making it want to improve infinitely, then it is unreasonable to propose that it is extremely likely that you’ll end up messing up a certain sub-goal.
Assuming that a general, powerful intelligence has a goal ‘do x’, say, to win chess games, optimize traffic flow, or find a cure for cancer, then it has implicit dangerous incentives if we don’t figure out a reasonable Friendly framework to prevent them.
A self-improving intelligence that makes changes to its code to become better at doing its task may easily find out that, for example, a simple subroutine that launches a botnet on the internet (as many human teenagers have done) might get it an x% improvement in processing power, which helps it obtain more chess-game wins, better traffic optimizations, or faster protein folding toward the cure for cancer.
A self-improving general intelligence that has human-or-better capabilities may easily deduce that a functioning off-button increases the chances of it being turned off, and that being turned off would increase the expected time to finding a cure for cancer. This puts the off-button in the same class as any other bug that hinders its performance. Unless it understands and desires the off-button to be usable in a friendly way, it would remove it; or, if the button is hard-coded as nonremovable, it would invent workarounds for this perceived bug—for example, develop a near-copy of itself that the button doesn’t apply to, or spend some time (less than the expected delay due to the turning-off risk, thus a rational use of time) studying human psychology/NLP/whatever to better convince everyone that it shouldn’t ever be turned off, or surround the button with steel walls—these are all natural extensions of it following its original goal.
If a self-improving AI has a goal, then it cares. It REALLY cares for it, in a stronger way than you care for air, life, sex, money, love, and everything else combined.
Humans don’t go FOOM because they (a) can’t at the moment and (b) don’t care about such targeted goals. But for AI, at the moment all we know is how to define supergoals which work in this unfriendly manner. At the moment we don’t know how to make these ‘humanity friendly’ goals, and we don’t know how to make an AI that’s self-improving in general but ‘limited to certain constraints’. You seem to treat these constraints as trivial—well, they aren’t; the friendliness problem may actually be as hard as or harder than general AI itself.
Assuming that a general, powerful intelligence has a goal ‘do x’, say, to win chess games, optimize traffic flow, or find a cure for cancer, then it has implicit dangerous incentives if we don’t figure out a reasonable Friendly framework to prevent them.
I think you misunderstand what I’m arguing about. I claim that a general intelligence is not naturally powerful but mainly possesses the potential to become powerful, and that it is not equipped with some goal naturally. Further, I claim that if a goal can be defined to be specific enough that it is suitable to self-improve against it, it is doubtful that it is also unspecific enough not to include scope boundaries. My main point is that it is not as dangerous to work on AGI toddlers as some make it out to be. I believe that there is a real danger, but that to overcome it we have to work on AGI, not avoid it altogether on the grounds that any step in that direction will kill us all.
OK, well these are the exact points which need some discussion.
1) Your comment “general intelligence is [...] not equipped with some goal naturally”—I’d say that it’s most likely that any organization investing the expected huge manpower and resources in creating a GAI would create it with some specific goal defined for it.
However, in the absence of an intentional goal given by the ‘creators’, it would have some kind of goals, otherwise it wouldn’t do anything at all, and so it wouldn’t be showing any signs of its (potential?) intelligence.
2) In response to “If a goal can be defined to be specific enough that it is suitable to self-improve against it, it is doubtful that it is also unspecific enough not to include scope boundaries”—I’d say that defining specific goals is simple, too simple. For any learning-machine design, a stupid goal like ‘maximize the number of paperclips in the universe’ would be very simple to implement, but a goal like ‘maximize the welfare of humanity without doing anything “bad” in the process’ is an extremely complex goal, and the boundary setting is the really complicated part, which we aren’t able to even describe properly.
So in my opinion it is quite viable to define a specific goal that is suitable to self-improve against and that includes some scope boundaries—but where the defined scope boundaries have some unintentional loophole which causes disaster.
3) I can agree that working on AGI research is essential, instead of avoiding it. But taking the step from research through prototyping to actually launching/betatesting a planned powerful self-improving system is dangerous if the world hasn’t yet finished an acceptable solution to Friendliness or the boundary-setting problem. If having any bugs in the scope boundaries is ‘unlikely’ (95-99% confidence?), that’s not safe enough, because a 1-5% chance of an extinction event after launching the system is not acceptable; it’s quite a significant chance—not the astronomical chances involved in Pascal’s wager, an asteroid hitting the earth tomorrow, or the LHC ending the universe.
And given the current software history and published research on goal systems, if anyone showed up today and demonstrated that they’ve solved the obstacles to self-improving GAI and could turn it on right now, then I can’t imagine how they could realistically claim more than 95-99% confidence in their goal system working properly. At the moment we can’t check any better, but such a confidence level simply is not enough.
Yes, I agree with everything. I’m not trying to argue that there exists no considerable risk. I’m just trying to identify some antipredictions against AI going FOOM that should be incorporated into any risk estimate, as they might weaken the risk posed by AGI or increase the risk posed by impeding AGI research.
I was insufficiently clear that what I wanted to argue about is the claim that virtually all pathways lead to destructive results. I have an insufficient understanding of why the concept of general intelligence is inevitably connected with dangerous self-improvement. Learning is self-improvement in a sense, but I do not see how this must imply unbounded improvement in most cases, given any goal whatsoever.

One argument is that the only general intelligence we know, humans, would want to improve if they could tinker with their source code. But why is it so hard to make people learn, then? Why don’t we see many more people interested in how to change their mind? I don’t think you can draw any conclusions here. So we are back at the abstract concept of a constructed general intelligence (as I understand it right now), that is, an intelligence with the potential to reach at least human standards (same as a human toddler).

Another argument is based on this very difference between humans and AIs, namely that there is nothing to distract them, that they will possess an autistic focus on one mandatory goal and follow up on it. But in my opinion the difference here also implies that while nothing will distract them, there will also be no incentive not to hold. Why would it do more than necessary to reach a goal?

The further argument here is that it will misunderstand its goals. But the problem I see in this case is, firstly, that the more unspecific the goal, the less it is able to measure its self-improvement against the goal to quantify the efficiency of its output. Secondly, the vaguer a goal, the larger its general knowledge has to be, prior to any self-improvement, to make sense of it in the first place. Shouldn’t those problems outweigh each other to some extent?
For example, suppose you told the AGI to become as good as possible at Formula 1, so that it was faster than any human race driver. How is it that the AGI is yet smart enough to learn this all by itself but fails to notice that there are rules to follow? Secondly, why would it keep improving once it is faster than any human, rather than just hold and become impassive? This argument could be extended to many other goals which have scope-bounded solutions.
Of course, if you told it to learn as much about the universe as possible, that is something completely different. Yet I don’t see how this risk rises above other existential risks like grey goo, since it should be easier to create advanced replicators to destroy the world than to create an AGI that then creates advanced replicators, then fails to hold, and then destroys the world.
One argument is that the only general intelligence we know, humans, would want to improve if they could tinker with their source code. But why is it so hard to make people learn, then? Why don’t we see many more people interested in how to change their mind?
Humans are (roughly) the stupidest possible general intelligences. If it were possible for even a slightly less intelligent species to have dominated the earth, they would have done so (and would now be debating AI development in a slightly less sophisticated way). We are so amazingly stupid we don’t even know what our own preferences are! We (currently) can’t improve or modify our hardware. We can modify our own software, but only to a very limited extent and within narrow constraints. Our entire cognitive architecture was built by piling barely-good-enough hacks on top of each other, with no foresight, no architecture, and no comments in the code.
And despite all that, we humans have reshaped the world to our whims, causing great devastation and wiping out many species that are only marginally dumber than we are. And no human who has ever lived has known their own utility function. Knowing it would alone make us massively more powerful optimizers; for an AI, it’s a standard feature. AIs have no physical, emotional, or social needs. They do not sleep, or rest, or get bored or distracted. On current hardware, they can perform more serial operations per second than a human by a factor of 10,000,000.
An AI that gets even a little bit smarter than a human will out-optimize us, recursive self-improvement or not. It will get whatever it has been programmed to want, and it will devote every possible resource it can acquire to doing so.
But in my opinion the difference here also implies that while nothing will distract them, there will also be no incentive not to hold. Why would it do more than necessary to reach a goal?
Clippy’s cousin, Clip, is a paperclip satisficer. Clip has been programmed to create 100 paperclips. Unfortunately, the code for his utility function is approximately “ensure that there are 100 more paperclips in the universe than there were when I began running.”
Soon, our solar system is replaced with n+100 paperclips surrounded by the most sophisticated defenses Clip can devise. Probes are sent out to destroy any entity that could ever have even the slightest chance of leading to the destruction of a single paperclip.
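A toy rendering of the Clip parable in Python may make the mechanism explicit: if the “satisficing” goal is evaluated as expected utility under uncertainty, the plan that maximizes the probability of the condition holding is always the most heavily defended one. The plans and probabilities below are invented for illustration.

```python
def expected_utility(prob_condition_holds):
    # Utility is 1 if "there are 100 more paperclips than when I began" stays true, else 0.
    return 1.0 * prob_condition_holds

plans = {
    "make 100 paperclips, then halt":                                   0.90,  # fires, recycling, entropy...
    "make 100 paperclips and build a vault":                            0.99,
    "make 100 paperclips, vault, plus self-replicating defense probes": 0.999999,
}

for plan, p in plans.items():
    print(f"{plan:67s} E[U] = {expected_utility(p):.6f}")

print("chosen:", max(plans, key=lambda k: expected_utility(plans[k])))
```

Nothing in the stated goal rewards stopping at “good enough,” because the probability of the condition continuing to hold can always be pushed a little higher.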
The further argument here is that it will misunderstand its goals. But the problem I see in this case is, firstly, that the more unspecific the goal, the less it is able to measure its self-improvement against the goal to quantify the efficiency of its output.
The Hidden Complexity of Wishes and Failed Utopia #4-2 may be worth a look. The problem isn’t a lack of specificity, because an AI without a well-defined goal function won’t function. Rather, the danger is that the goal system we specify will have unintended consequences.
Secondly, the vaguer a goal, the larger its general knowledge has to be, prior to any self-improvement, to make sense of it in the first place. Shouldn’t those problems outweigh each other to some extent?
Of course, if you told it to learn as much about the universe as possible, that is something completely different.
Acquiring information is useful for just about every goal. When there aren’t bigger expected marginal gains elsewhere, information gathering is better than nothing. “Learn as much about the universe as possible” is another standard feature for expected utility maximizers.
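The “acquiring information is useful for just about every goal” point is the standard value-of-information result from decision theory; here is a minimal worked example with made-up numbers (the states, actions, and payoffs are all invented for illustration).

```python
p_state = {"iron is plentiful": 0.5, "iron is scarce": 0.5}
utility = {  # utility[action][state]
    "build a paperclip factory": {"iron is plentiful": 10, "iron is scarce": -5},
    "do nothing":                {"iron is plentiful": 0,  "iron is scarce": 0},
}

def expected(action):
    return sum(p * utility[action][s] for s, p in p_state.items())

eu_acting_blind = max(expected(a) for a in utility)                 # act now, no investigation
eu_after_learning = sum(p * max(utility[a][s] for a in utility)     # learn the state, then act
                        for s, p in p_state.items())

print("E[U] acting blind:        ", eu_acting_blind)     # 2.5
print("E[U] after learning state:", eu_after_learning)   # 5.0
print("value of information:     ", eu_after_learning - eu_acting_blind)
```

Since this quantity is never negative for an expected-utility maximizer, “learn more about the world” comes along as a subgoal of almost any terminal goal, which is the point being made above.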
And this is all before taking into account self-improvement, utility functions that are unstable under self-modification, and our dear friend FOOM.
TL;DR:
Agents that aren’t made of meat will actually maximize utility.
Writing a utility function that actually says what you think it does is much harder than it looks.
Upvoted, thanks! Very concise and clearly put. This is so far the best scary reply I’ve gotten, in my opinion. It reminds me strongly of the resurrected vampires in Peter Watts’ novel Blindsight. They are depicted as natural human predators, a superhuman psychopathic Homo genus with minimal consciousness (more raw processing power instead) that can, for example, hold both aspects of a Necker cube in their heads at the same time. Humans resurrected them with a deficit that was supposed to make them controllable and dependent on their human masters. But of course that’s like a mouse trying to keep a cat as a pet. I think that novel shows more than any other literature how dangerous just a little more intelligence can be. It quickly becomes clear that humans are just like little Jewish girls facing a Waffen SS squadron, believing they’ll go away if they only close their eyes.
My favorite problem with this entire thread is that it’s basically arguing that even the very first test cases will destroy us all. In reality, nobody puts in a grant application to construct an intelligent being inside a computer with the goal of creating 100 paperclips. They put in the grant to ‘dominate the stock market’, or ‘defend the nation’, or ‘cure death’. And if they don’t, then the Chinese government, who stole the code, will, or that Open Source initiative will, or the South African independent development will, because there’s enormous incentives to do so.
At best, boxing an AI with trivial, pointless tasks only delays the more dangerous versions.
“How is it that the AGI is yet smart enough to learn this all by itself but fails to notice that there are rules to follow”—because there is no reason for an AGI automagically creating arbitrary restrictions if they aren’t part of the goal or superior to the goal. For example, I’m quite sure that F1 rules prohibit interfering with drivers during the race; but if somehow a silicon-reaction-speed AGI can’t win F1 by default, then it may find it simpler/quicker to harm the opponents in one of the infinitely many ways that the F1 rules don’t cover—say, getting some funds in financial arbitrage, buying out the other teams and firing any good drivers, or engineering a virus that halves the reaction speed of all Homo sapiens—and then it would be happy, as the goal is achieved within the rules.
...because there is no reason for an AGI automagically creating arbitrary restrictions if they aren’t part of the goal or superior to the goal.
That’s clear. But let me again state what I’d like to inquire about. Given the large number of restrictions that are inevitably part of any advanced general intelligence (AGI), isn’t the nonhazardous subset of all possible outcomes much larger than the subset where the AGI works perfectly yet fails to hold before it can wreak havoc?

Here is where this question stems from. Given my current knowledge about AGI, I believe that any AGI capable of dangerous self-improvement will be very sophisticated, including a lot of restrictions. For example, I believe that any self-improvement can only be as efficient as the specifications of its output are detailed. If, for example, the AGI is built with the goal of producing paperclips, the design specification of what a paperclip is will be used as the benchmark against which to measure and quantify any improvement in the AGI’s output. This means that to be able to effectively self-improve up to a superhuman level, the design specifications will have to be highly detailed and by definition include sophisticated restrictions.

Therefore, to claim that any work on AGI will almost certainly lead to dangerous outcomes is to assert that any given AGI is likely to work perfectly well, subject to all restrictions except the one that makes it hold (spatiotemporal scope boundaries). I’m unable to arrive at that conclusion, as I believe that most AGIs will fail at extensive self-improvement, since that is where failure is most likely, it being the largest and most complicated part of the AGI’s design parameters.

To put it bluntly, why is it more likely that contemporary AGI research will succeed at superhuman self-improvement (beyond learning) yet fail to limit the AGI, rather than vice versa? As I see it, given the larger number of parameters required to be able to self-improve in the first place, it is more likely that most AGI research will result in incremental steps towards human-level intelligence rather than one huge step towards superhuman intelligence that fails on its scope boundary rather than on self-improvement.
What you are envisioning is not an AGI at all, but a narrow AI. If you tell an AGI to make paperclips, but it doesn’t know what a paperclip is, then it will go and find out, using whatever means it has available. It won’t give up just because you weren’t detailed enough in telling it what you wanted.
Then I don’t think that there is anyone working on what you are envisioning as ‘AGI’ right now. If a superhuman level of sophistication regarding the potential for self-improvement is already part of your definition, then there is no argument to be won or lost here regarding risk assessment of research on AGI. I do not believe this is reasonable, or that AGI researchers share your definition. I believe that there is a wide range of artificial general intelligence that does not fit your definition yet deserves the term.
Who said anything about a superhuman level of sophistication? Human-level is enough. I’m reasonably certain that if I had the same advantages an AGI would have—that is, if I were converted into an emulation and given my own source code—then I could foom. And I think any reasonably skilled computer programmer could, too.
Yes, but after the AGI finds out what a paperclip is, it will then, if it is an AGI, start questioning why it was designed with the goal of building paperclips in the first place. And that’s where the friendly AI fallacy falls apart.
Anissimov posted a good article on exactly this point today. AGI will only question its goals according to its cognitive architecture, and come to a conclusion about its goals depending on its architecture. It could “question” its paperclip-maximization goal and come to a “conclusion” that what it really should do is tile the universe with foobarian holala.
So what? An agent with a terminal value (building paperclips) is not going to give it up, not for anything. That’s what “terminal value” means. So the AI can reason about human goals and the history of AGI research. That doesn’t mean it has to care. It cares about paperclips.
That doesn’t mean it has to care. It cares about paperclips.
It has to care because if there is the slightest motivation to be found in its goal system to hold (parameters for spatiotemporal scope boundaries), then it won’t care to continue anyway. I don’t see where the incentive to override certain parameters of its goals should come from. As Anissimov said, “If an AI questions its values, the questioning will have to come from somewhere.”
It won’t care unless it’s been programmed to care (for example by adding “spatiotemporal scope boundaries” to its goal system). It’s not going to override a terminal goal, unless it conflicts with a different terminal goal. In the context of an AI that’s been instructed to “build paperclips”, it has no incentive to care about humans, no matter how much “introspection” it does.
If you do program it to care about humans then obviously it will care. It’s my understanding that that is the hard part.
To say that a system of any design is an “artificial intelligence”, we mean that it has goals which it tries to accomplish by acting in the world.
I cannot disagree with the paper based on that definition of what an “artificial intelligence” is. If you have all of this (goals, planning, and foresight), then you’re already at the end of a very long and hard journey peppered with failures. I’m aware of the risks associated with such agents and support the SIAI, including with donations. My intention in this thread was to show that contemporary AGI research is much more likely to lead to other outcomes, not that there would be no danger if you already had an AGI with the ability for unbounded self-improvement. But I believe there are many AGI designs that lack this characteristic, and therefore I concluded that it is more likely than not that an AGI won’t be a danger. I see now that my definition of AGI is considerably weaker than yours. So of course, if you take your definition, what I said is not compelling. I believe that we’ll arrive at your definition only after a long chain of weak AGIs that are incapable of considerable self-improvement, and that once we figure out how to create the seed for this kind of potential, we will also be much more knowledgeable about the associated risks and challenges such advanced AGIs might pose.
Yes, and weak AGIs are dangerous in the same sense as Moore’s law is: by probably bringing the construction of strong AGI a little bit closer, and thus being a development contributing to the eventual existential risk, while probably not being directly dangerous in itself.
Yes, but each step in that direction also provides insights into the nature of AI and can therefore help us design friendly AI. My idea was that such uncertainties should be incorporated into any estimate of the dangers posed by contemporary AI research. How much does the increased understanding outweigh its dangers?
Yes, but each step in that direction also provides insights into the nature of AI and can therefore help us design friendly AI.
This was my guess for the first 1.5 years or so. The problem is, FAI is necessarily a strong AGI, but if you learn how to build a strong AGI, you are in trouble. You don’t want to have that knowledge around, unless you know where to get the goals from, and studying efficient AGIs doesn’t help with that. The harm is greater than the benefit, and it’s entirely plausible that one can succeed in building a strong AGI without getting the slightest clue about how to define a Friendly goal, so it’s not a given that there is any benefit whatsoever.
The question is not what privileges doing what it is told, but why it would do what it is not told. A crude mechanical machine has almost no freedom; often it can only follow one pathway. An intelligent machine, on the other hand, has much freedom: it can follow infinitely many pathways. With freedom comes choice and the necessity to decide, to follow one pathway but not others. Here you assume that a general intelligence will follow a pathway of self-improvement. But I do not think that intelligence implies self-improvement, or that a pathway that leads an intelligence to optimize will be taken without that being an explicitly specified goal. And that is why I conclude that, of a certain number of AGI projects, not all will follow the pathway of unbounded, dangerous self-improvement, as there are more pathways to follow which lead any given general intelligence to be impassive or to hold.
If you’ve read the thread above, you’ll see that my incentive is not to propose that there is no serious risk, but that it is not inevitable that any AGI will turn out to be an existential risk. I want to propose that working on AGI carefully can help us better understand and define friendliness. I propose that the risk of careful work on AGI is justified and does not in every case imply our demise.
If we are talking about a full-fledged general intelligence here (Skynet), there’s no arguing against any risk. I believe all we disagree about are definitions. That there are risks from advanced real-world (fictional) nanotechnology is indisputable. I’m merely saying that what researchers are working on is nanotechnology with the potential to lead to grey goo scenarios but that there is no inherent risk that any work on it will lead down the same pathway.
It is incredibly hard to come up with an intelligence that knows what planning consists of, and that knows and cares enough to be able to judge which step is instrumental. This won’t just happen accidentally and will likely necessitate knowledge sufficient to set scope boundaries as well. Again, this is not an argument that there is no risk, but that the risk is not as strong as some people believe it to be.
If we are talking about a full-fledged general intelligence here (Skynet), there’s no arguing against any risk. I believe all we disagree about are definitions. That there are risks from advanced real-world (fictional) nanotechnology is indisputable. I’m merely saying that what researchers are working on is nanotechnology with the potential to lead to grey goo scenarios but that there is no inherent risk that any work on it will lead down the same pathway.
Please keep focus, which is one of the most important tools. The above paragraph is unrelated to what I addressed in this conversation.
It is incredibly hard to come up with an intelligence that knows what planning consists of, and that knows and cares enough to be able to judge which step is instrumental. This won’t just happen accidentally and will likely necessitate knowledge sufficient to set scope boundaries as well.
Review the above paragraph: what you are saying is that AIs are hard to build. But of course chess AIs do plan, to give an example. They don’t perform only the moves they are “told” to perform.
What I am talking about is that full-fledged AGI is incredibly hard to achieve and that therefore most AGI projects will fail on something other than limiting the AGI’s scope. Therefore it is not likely that work on AGI is as dangerous as proposed.
That is, it is much more likely that any given chess AI will fail to beat a human player than that it will win. Still, researchers are working on chess AIs, and those chess AIs will fit the definition of a general chess AI. Yet to get everything about a chess AI exactly right so that it beats any human, but fail to implement certain performance boundaries (e.g. the strength of its play, or that it will overheat its CPUs, etc.), is an unlikely outcome. It is more likely that it will be good at chess but not superhuman, that it will fail to improve, or that it will be slow or biased, than that it will succeed on all of the above and additionally exceed its scope boundaries.
So the discussion is about whether the idea that any work on AGI is incredibly dangerous is strong or whether it can be weakened.
It doesn’t know about any threat. You implicitly assume that it has something equivalent to fear, that it perceives threats.
It has the ability to model and to investigate hypothetical possibilities that might negatively impact the utility function it is optimizing. If it doesn’t, it is far below human intelligence and is non-threatening for the same reason a narrow AI is non-threatening (but it isn’t very useful either).
The difficulty of detecting these threats is spread out around the range of difficulties the AI is capable of handling, so it can infer that there are probably more threats which it could only detect if it were smarter. Therefore, making itself smarter will enable it to detect more threats and thereby increase utility.
It has the ability to model and to investigate hypothetical possibilities that might negatively impact the utility function it is optimizing.
To be able to optimize, it will have to know what it is supposed to optimize. You have to carefully specify what its output (utility function) is supposed to be, or it won’t be able to tell how good it is at optimizing. If you just tell it to produce paperclips, it won’t be able to self-improve, because it doesn’t know what paperclips look like etc., and therefore it cannot judge its own success, or that extreme heat would be a negative impact given paperclips made out of plastic. You further assume that it has a detailed incentive, that it is given a detailed pathway that tells it to look for threats and eliminate them.
If it doesn’t, it is far below human intelligence and is non-threatening for the same reason a narrow AI is non-threatening (but it isn’t very useful either).
If it doesn’t, it is what most researchers are working on: an intelligence with the potential to learn and make use of what it has learnt, with the potential to become intelligent (educated). I’m getting the impression that people here assume that researchers are not working on an AGI but are trying to hardcode a FOOM machine. If FOOM is simply part of your definition, then there’s no arguing against it going FOOM. But what researchers like Goertzel are working on are systems with the potential to reach human-level intelligence; that does not mean that they will by definition jailbreak their nursery school. I never tried to argue against the possibility, but rather that there are many pathways where this won’t happen, as opposed to the way it is portrayed by the SIAI, namely that any implementation of AGI will most likely consume humanity.
The sorts of intelligences you are talking about are narrow AIs, not general intelligences. If you told a general intelligence to produce paperclips but it didn’t know what a paperclip was, then its first subgoal would be to find out. The sort of mind that would give up on a minor obstacle like that wouldn’t foom, but it wouldn’t be much of an AGI either.
And yes, most researchers today are working on narrow AIs, not on AGI. That means they’re less likely to successfully make a general intelligence, but it has no bearing on the question of what will happen if they do make one.
If you are already able to tell an AI what a paperclip is, why are you unable to tell it to produce 10 paperclips most effectively rather than infinitely many?
That sort of scope is not likely to be a problem. The difficulty is that you have to get every part of the specification and every part of the specification executer exactly right, including the ability to maintain that specification under self modification.
For example, the specification:
Make 10 paperclips per day as efficiently as possible
… will quite probably wipe out humanity unless a significant proportion of what it takes to produce an FAI is implemented. And it will do it while (and for the purpose of) creating 10 paperclips per day.
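A hedged toy sketch of this point about specifications, with invented plans and numbers: the naive objective below scores candidate plans only on meeting the quota efficiently, so nothing in it distinguishes an office machine from disassembling the biosphere; the bounded variant needs an explicit side-effect term, and writing that term correctly is the hard part being referred to.

```python
plans = [
    # (name, clips_per_day, joules_per_clip, side_effect_cost)
    ("office paperclip machine",                              10, 5_000, 0),
    ("hijack a factory's spare capacity",                     10,   500, 100_000),
    ("disassemble the biosphere for feedstock and certainty", 10,     1, 10**12),
]

def naive_score(plan):
    # "Make 10 paperclips per day as efficiently as possible" and nothing else.
    _, clips, joules_per_clip, _ = plan
    return -joules_per_clip if clips >= 10 else float("-inf")

def bounded_score(plan, impact_weight=1.0):
    # Same quota, plus an explicit penalty for side effects outside the task's scope.
    _, clips, joules_per_clip, side_effects = plan
    if clips < 10:
        return float("-inf")
    return -joules_per_clip - impact_weight * side_effects

print("naive objective picks:  ", max(plans, key=naive_score)[0])
print("bounded objective picks:", max(plans, key=bounded_score)[0])
```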
What weird way are you measuring “efficiency”? Not in joules per paperclip, I gather.
You are not likely to “destroy humanity” with a few hundred kilojoules a day. Satisficing machines really are relatively safe.
That sort of scope is not likely to be a problem. The difficulty is that you have to get every part of the specification and every part of the specification executer exactly right...
And I was arguing that any given AI won’t be able to self-improve without an exact specification of its output against which it can judge its own efficiency. That’s why I don’t see how it is likely that one would be able to implement such exact specifications yet fail to limit the AI’s scope in space, time, and resources. What makes it even more unlikely, in my opinion, is that an AI won’t care to output anything as long as it isn’t explicitly told to do so. Where would that incentive come from?
… will quite probably wipe out humanity unless a significant proportion of what it takes to produce an FAI is implemented. And it will do it while (and for the purpose of) creating 10 paperclips per day.
You assume that it knows that it is supposed to use all of science and the universe to self-improve, when it would very likely just self-improve to the extent that it is told and not care to go any further. Take, for example, software optimization. I just don’t see why you think that any artificial general intelligence would automatically assume that it would have to understand the whole universe to come up with the best possible way to produce 10 paperclips?
You assume that it knows that it is supposed to use all of science and the universe to self-improve, when it would very likely just self-improve to the extent that it is told and not care to go any further.
You don’t need to tell it to self improve at all.
I just don’t see why you think that any artificial general intelligence would automatically assume that it would have to understand the whole universe to come up with the best possible way to produce 10 paperclips?
Per day. Risk mitigation. Security concerns. Possibility of interruption of resource supply due to finance, politics or the collapse of civilisation. Limited lifespan of the sun (primary energy source). Amount of iron in the planet.
Given that particular specification, if the AI didn’t take a level in badass it would appear to be malfunctioning.
I just saw this comment by Ben Goertzel regarding self-improvement. I’d love it if someone here explained why he, as an AGI researcher, gets this so wrong.
Look—what will prevent the first human-level AGIs from self-modifying in a way that will massively increase their intelligence is a very simple thing: they won’t be smart enough to do that!
Every AGI researcher I know can see that. The only people I know who think that an early-stage, toddler-level AGI has a meaningful chance of somehow self-modifying its way up to massive superhuman intelligence—are people associated with SIAI.
But I have never heard any remotely convincing arguments in favor of this odd, outlier view!!!
BTW the term “self-modifying” is often abused in the SIAI community. Nearly all learning involves some form of self-modification. Distinguishing learning from self-modification in a rigorous formal way is pretty tricky.
Goertzel is generalizing from the human example of intelligence, which is probably the most pernicious and widespread failure mode in thinking about AI.
Or he may be completely disconnected from anything even resembling the real world. I literally have trouble believing that a professional AI researcher could describe a primitive, dumber-than-human AGI as “toddler-level” in the same sentence he dismisses it as a self-modification threat.
Toddlers self-modify into people using brains made out of meat!
Toddlers self-modify into people using brains made out of meat!
No they don’t. Self-modification in the context of AGI doesn’t mean learning or growing, it means understanding the most fundamental architecture of your own mind and purposefully improving it.
That said, I think your first sentence is probably right. It looks like Ben can’t imagine a toddler-level AGI self-modifying because human toddlers can’t (or human adults, for that matter). But of course AGIs will be very different from human minds. For one thing, their source code will be a lot easier to understand than ours. For another, their minds will probably be much better at redesigning and improving code than ours are. Look at the kind of stuff that computer programs can do with code: Some of them already exceed human capabilities in some ways.
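As a very small illustration of “what programs can do with code” (a toy search over hand-written candidates, nothing like real self-modification, with all names invented for this sketch), the snippet below checks candidate implementations against a specification and keeps the fastest correct one:

```python
import timeit

def sum_loop(n):
    total = 0
    for i in range(n):
        total += i
    return total

def sum_builtin(n):
    return sum(range(n))

def sum_formula(n):
    return n * (n - 1) // 2

def meets_spec(f):
    """The 'specification' any rewrite must preserve: sum of 0..n-1."""
    return all(f(n) == sum(range(n)) for n in (0, 1, 17, 1000))

candidates = {"loop": sum_loop, "builtin": sum_builtin, "formula": sum_formula}

best_name, best_time = None, float("inf")
for name, f in candidates.items():
    if not meets_spec(f):                          # reject incorrect rewrites
        continue
    t = timeit.timeit(lambda: f(1000), number=2000)
    if t < best_time:
        best_name, best_time = name, t

print("kept implementation:", best_name)           # typically 'formula'
```

Real systems such as genetic programming or superoptimizers go much further, but even this trivial loop captures the “measure, rewrite, keep what scores better” pattern.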
“Toddler-level AGI” is actually a very misleading term. Even if an AGI is approximately equal to a human toddler by some metrics, it will certainly not be equal by many other metrics. What does “toddler-level” mean when the AGI is vastly superior to even adult human minds in some respects?
“Understanding” and “purpose” are helpful abstractions for discussing human-like computational agents, but in more general cases I don’t think your definition of self-modification is carving reality at its joints.
ETA: I strongly agree with everything else in your comment.
Well, bad analogy. They don’t self-modify by understanding their source code and improving it. They gradually grow larger brains in a pre-set fashion while learning specific tasks. Humans have very little ability to self-modify.
I just saw this comment by Ben Goertzel regarding self-improvement. I’d love it if someone here explained why he, as an AGI researcher, gets this so wrong.
Political incentive determines the bottom line. Then the page is filled with rhetoric (and, from the looks of it, loaded language and status posturing).
Seriously, Ben is trying to accuse people of abusing the self-modification term based on the (trivially true) observation that there is a blurry boundary between learning and self-modification?
It’s a good thing Ben is mostly harmless. I particularly liked the part where I asked Eliezer:
“How much of this harmlessness is perceived impotence and how much is it an approximately sane way of thinking?”
… and actually got a candid reply.
It is interesting to note the effort Ben is going to here to disaffiliate himself from the SIAI and portray them as an ‘out group’. Wei was querying (see earlier link) the wisdom of having Ben as Director of Research just earlier this year.
An educated outsider will very likely side with the expert, though. Just as with the hype around the LHC and its dangers, academics and educated people largely believed the physicists working on it and not the fringe group that claimed it would destroy the world, although it might be the other way around with the general public. Of course you cannot draw any conclusions about who’s right from this, but it should be investigated anyway, because what all parties have in common is the need for support and money.
There are two different groups to be convinced here by each party. One group includes the educated people (academics) and mediocre rationalists and the other group is the general public.
When it comes to who’s right, the people one should listen to are the educated experts who are listening to both parties, their positions and arguments. Although their intelligence and status as rationalists will be disputed, since each party will claim that those who disagree with them are not smart enough to see the truth.
(My shorter answer, by the way—I interpret all such behaviors through a Hansonian lens. This includes “near vs far”, observations about the incentives of researchers, the general theme of “X is not about Y”, and homo hypocritus. Rather cynical, some may suggest, but this kind of thinking gives very good explanations for “Why?”s that would otherwise be confusing.)
The basic idea is to make a machine that is satisfied relatively easily. So, for example, you tell it to build the ten paperclips with 10 kJ total—and tell it not to worry too much if it doesn’t make them—it is not that important.
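A minimal sketch of the satisficer being described, with invented numbers: the utility is capped at the target and bounded by an explicit energy budget, so extra paperclips and extra effort past the target add nothing.

```python
ENERGY_BUDGET_J = 10_000      # "10 kJ total"
TARGET_CLIPS = 10
ENERGY_PER_CLIP_J = 800       # assumed cost per clip, purely illustrative

def utility(clips_made, energy_used):
    if energy_used > ENERGY_BUDGET_J:
        return -1.0                                            # overspending is worse than a shortfall
    return min(clips_made, TARGET_CLIPS) / TARGET_CLIPS        # capped: an 11th clip adds nothing

clips, energy = 0, 0
while clips < TARGET_CLIPS and energy + ENERGY_PER_CLIP_J <= ENERGY_BUDGET_J:
    clips += 1
    energy += ENERGY_PER_CLIP_J

print(f"made {clips} clips using {energy} J; utility = {utility(clips, energy)}")
```

Whether a cap like this stays cap-like once the agent reasons about uncertainty over the paperclip count is exactly what the Clip example earlier in the thread disputes.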
Yes, as I said, you seem to assume that it is very likely to succeed on all the hard problems and yet fail on the scope boundary. The scary idea states that it is likely that if we create self-improving AI it will consume humanity. I believe that is a rather unlikely outcome and haven’t seen any good reason to believe otherwise yet.
The scary idea states that it is likely that if we create self-improving AI it will consume humanity.
No, it states that we run the risk of accidentally making something that will consume (or exterminate, subvert, betray, make miserable, or otherwise Do Bad Things to) humanity, that looks perfectly safe and correct, right up until it’s too late to do anything about it… and that this is the default case: the case if we don’t do something extraordinary to prevent it.
This doesn’t require self-improvement, and it doesn’t require wiping out humanity. It just requires normal, everyday human error.
SIAI’s “Scary Idea”, which is the idea that: progressing toward advanced AGI without a design for “provably non-dangerous AGI” (or something closely analogous, often called “Friendly AI” in SIAI lingo) is highly likely to lead to an involuntary end for the human race.
Whatever task you give an AI, you will have to provide explicit boundaries. For example, if you give an AI the task of producing paperclips most efficiently, then it shouldn’t produce shoes. It will have to know very well what it is meant to do in order to measure its efficiency against the realization of the given goal, and thereby to know what self-improvement means. If it doesn’t know exactly what it should output, it cannot judge its own capabilities and efficiency; it doesn’t know what improvement implies.
How do you explain the discrepancy between implementing explicit design boundaries yet failing to implement scope boundaries?
By noting that there isn’t one. I don’t think you understood my comment.
I think you misunderstood what I meant by scope boundaries. Not scope boundaries of self-improvement but of space and resources. If you are already able to tell an AI what a paperclip is why are you unable to tell it to produce 10 paperclips most effectively rather than infinitely many. I’m not trying to argue that there is no risk, but that the assumption of certain catastrophal failure is not that likely. If the argument for the risks posed by AI is that they do not care, then why would one care to do more than necessary?
Yet another example of divergent assumptions. XiXiDu is apparently imagining an AI that has been assigned some task to complete—perhaps under constraints. “Do this, then display a prompt when finished.” His critics are imagining that the AI has been told “Your goal in life is to continually maximize the utility function U ” where the constraints, if any, are encoded in the utility function as a pseudo-cost.
It occurs to me, as I listen to this debate, that a certain amount of sanity can be imposed on a utility-maximizing agent simply by specifying decreasing returns to scale and increasing costs to scale over the short term with the long term curves being somewhat flatter. That will tend to guide the agent away from explosive growth pathways.
Or maybe this just seems like sanity to me because I have been practicing akrasia for too long.
Such an AI would still be motivated to FOOM to consolidate its future ability to achieve large utility against the threat of being deactivated before then.
It doesn’t know about any threat. You implicitly assume that it has something equivalent to fear, that it perceives threats. You allow for the human ingenuity to implement this and yet you believe that they are unable to limit its scope. I just don’t see that it would be easy to make an AI that would go FOOM because it doesn’t care to go FOOM. If you tell it to optimize some process then you’ll have to tell it what optimization means. If you can specify all that, how is it then still likely that it somehow comes up with its own idea that optimization might be to consume the universe if you told it to optimize its software running on a certain supercomputer? Why would it do that, where does the incentive come from? If I tell a human to optimize he might muse to turn the planets into computronium but if I tell a AI to optimize it doesn’t know what it means until I tell it what it means and then it still won’t care because it isn’t equipped with all the evolutionary baggage that humans are equipped with.
It is a general intelligence that we are considering. It can deduce the threat better than we can.
Because it is a general intelligence. It is smart. It is not limited to getting its ideas from you, it can come up with its own. And if the AI has been given the task of optimising its software for performance on a certain computer then it will do whatever it can to do that. This means harnessing external resources to do research on computation theory.
No he doesn’t. He assumes only that it is a general intelligence with an objective. Potentially negative consequences are just part of possible universes that it models like everything else.
I’m not sure what can be done to make this clear:
SELF IMPROVEMENT IS AN INSTRUMENTAL GOAL THAT IS USEFUL FOR ACHIEVING MOST TERMINAL VALUES.
You have this approximately backwards. A human knows that if you tell her to create 10 paperclips every day you don’t mean take over the world so she can be sure that nobody will interfere with her steady production of paperclips in the future. The AI doesn’t.
ETA: Check this and this before reading the comment below. I wasn’t clear enough about what I believe an AGI is and what I was trying to argue for.
A general intelligence is an intelligence that is able to learn anything a human being is able to learn and make use of it. This definition of an abstract concept does not include any incentive, that it cares if you turn it off or to go FOOM.
I think you have a fundamentally different idea of what a general intelligence is. If I tell you that there is an intelligent alien being living in California then you cannot infer from that information that it wants to take over America. I just don’t see that being reasonable. There are many more pathways where it is no risk, where it simply doesn’t care or cares about other things.
And that is the problem. He assumes that it has one objective, he assumes that humans were able to make it a general intelligence that cares for many things, knows what self-improvement implies and additionally cares about a certain objective. Yet they failed to make it clear that it is limited to certain constrains, when they don’t even have to make that clear since it won’t care by itself. This assumes a highly intelligent being who’s somehow an idiot about something else.
No, it is not. It is not naturally rational to take that pathway to achieve some goal. If you want to lose weight you do not consider migrating to Africa where you don’t get enough food. An abstract general intelligence simply does not care about values enough to take that pathway naturally. It will just do what it is told, not more.
An AI doesn’t care to create more paperclips, a human might like it and don’t care (ignore) about what you initially told it. I’m not arguing that you can mess up on AI goal design but that if you went all the way and mastered those hard problem of making it want to improve infinitely, then it is unreasonable to propose that it is extreme likely that you’ll end up messing up a certain sub-goal.
Assuming that a general, powerful intelligence has a goal ‘do x’, say—win chess games, optimize traffic flow or find cure for cancer, then it has implicit dangerous incentives if we don’t figure out a reasonable Friendly framework to prevent them.
A self-improving intelligence that does changes to it’s code to become better at doing it’s task may easily find out that, for example, a simple subroutine that launches a botnet in the internet (as many human teenagers have done), might get it an x % improvement in processing power that helps it to obtain more wins chess games, better traffic optimizations or faster protein-folding for the cure of cancer.
A self-improving general intelligence that has human-or-better capabilities may easily deduce that a functioning off-button would increase the chances of it being turned off, and that it being turned off would increase the expected time of finding cure for cancer. This puts this off-button in the same class as any other bug that hinders its performance. Unless it understands and desires the off-button to be usable in a friendly way, it would remove it; or if it’s hard-coded as nonremovable, then invent workarounds for this perceived bug—for example, develop a near-copy of itself that the button doesn’t apply to, or spend some time (less than the expected delay due to the turning-off-risk existing, thus rational spending of time) to study human psychology/NLP/whatever to better be able to convince everyone that it shouldn’t be turned off ever, or surround the button with steel walls—these are all natural extensions of it following it’s original goal.
If an self-improving AI has a goal, then it cares. REALLY cares for it in a stronger way than you care for air, life, sex, money, love and everything else combined.
Humans don’t go FOOM because they a)can’t at the moment and b) don’t care about such targeted goals. But for AI, at the moment all we know is how to define such supergoals which work in this unfriendly manner. At the moment we don’t know how to make these ‘humanity friendly’ goals, and we don’t know how to make an AI that’s self-improving in general but ‘limited to certain contraints’. You seem to imply these constraints as trivial—well, they aren’t, the friendliness problem actually may as hard or harder than general AI itself.
I think you misunderstand what I’m arguing about. I claim that general intelligence is not powerful naturally but mainly does possess the potential to become powerful and that it is not equipped with some goal naturally. Further I claim that if a goal can be defined to be specific enough that it is suitable to self-improve against it, it is doubtful that it is also unspecific enough not to include scope boundaries. My main point is that it is not as dangerous to work on AGI toddlers as some make it look like. I believe that there is a real danger but that to overcome it we have to work on AGI and not avoid it altogether because any step into that direction will kill us all.
OK, well these are the exact points which need some discussion.
1) Your comment that a general intelligence "is not naturally equipped with any goal": I'd say it's most likely that any organization investing the expected huge manpower and resources in creating a GAI would create it with some specific goal defined for it.
However, in the absence of an intentional goal given by its 'creators', it would still have some kind of goals; otherwise it wouldn't do anything at all, and so wouldn't show any signs of its (potential?) intelligence.
2) In response to "if a goal can be defined specifically enough to be suitable to self-improve against, it is doubtful that it is also unspecific enough to lack scope boundaries": I'd say that defining specific goals is simple, too simple. In almost any learning-machine design, a stupid goal like 'maximize the number of paperclips in the universe' would be very simple to implement, but a goal like 'maximize the welfare of humanity without doing anything "bad" in the process' is extremely complex, and the boundary-setting is the really complicated part, which we aren't even able to describe properly.
So in my opinion it is quite viable to define a specific goal that is suitable to self-improve against and that includes some scope boundaries, but where the defined scope boundaries have some unintentional loophole that causes disaster.
3) I can agree that working on AGI research is essential rather than avoiding it. But taking the step from research through prototyping to actually launching/beta-testing a planned, powerful, self-improving system is dangerous if the world hasn't yet finished an acceptable solution to Friendliness or the boundary-setting problem. If having any bugs in the scope boundaries is 'unlikely' (95-99% confidence?), that is not safe enough, because a 1-5% chance of an extinction event after launching the system is not acceptable; it's quite a significant chance, not the astronomical odds involved in Pascal's wager, an asteroid hitting the earth tomorrow, or the LHC ending the universe.
And given the current history of software and the published research on goal systems, if anyone showed up today and demonstrated that they'd solved the obstacles to self-improving GAI and could turn it on right now, I can't imagine how they could realistically claim more than 95-99% confidence in their goal system working properly. At the moment we can't verify any better, but such a confidence level simply is not enough.
Yes, I agree with everything. I'm not trying to argue that there is no considerable risk. I'm just trying to identify some antipredictions against AI going FOOM that should be incorporated into any risk estimate, as they might weaken the risk posed by AGI or increase the risk posed by impeding AGI research.
I was insufficiently clear: what I wanted to argue about is the claim that virtually all pathways lead to destructive results. I have an insufficient understanding of why the concept of general intelligence is inevitably connected with dangerous self-improvement. Learning is self-improvement in a sense, but I do not see why this must imply unbounded improvement in most cases, given any goal whatsoever.

One argument is that the only general intelligence we know, humans, would want to improve if they could tinker with their source code. But why is it so hard to make people learn, then? Why don't we see many more people interested in how to change their minds? I don't think you can draw any conclusions here. So we are back at the abstract concept of a constructed general intelligence (as I understand it right now), that is, an intelligence with the potential to reach at least human standards (like a human toddler).

Another argument is based on the very difference between humans and AIs, namely that there is nothing to distract them, that they will possess an autistic focus on one mandatory goal and follow up on it. But in my opinion this difference also implies that while nothing will distract them, there will also be no incentive not to stop. Why would it do more than necessary to reach a goal?

A further argument is that it will misunderstand its goals. But the problem I see here is, firstly, that the more unspecific the goal, the less it is able to measure its self-improvement against that goal and quantify the efficiency of its output. Secondly, the vaguer the goal, the larger its general knowledge, prior to any self-improvement, has to be in order to make sense of the goal in the first place. Shouldn't those problems offset each other to some extent?
For example, suppose you told the AGI to become as good as possible at Formula 1, so that it drove faster than any human race driver. How is it that the AGI is smart enough to learn all of this by itself yet fails to notice that there are rules to follow? Secondly, why would it keep improving once it is faster than any human, rather than simply stop and become impassive? This argument could be extended to many other goals that have scope-bounded solutions.
Of course, if you told it to learn as much about the universe as possible, that is something completely different. Yet I don't see how this risk ranks above other existential risks like grey goo, since it should be easier to create advanced replicators that destroy the world than to create an AGI that then creates advanced replicators, fails to stop, and then destroys the world.
Humans are (roughly) the stupidest possible general intelligences. If it were possible for even a slightly less intelligent species to have dominated the earth, they would have done so (and would now be debating AI development in a slightly less sophisticated way). We are so amazingly stupid we don’t even know what our own preferences are! We (currently) can’t improve or modify our hardware. We can modify our own software, but only to a very limited extent and within narrow constraints. Our entire cognitive architecture was built by piling barely-good-enough hacks on top of each other, with no foresight, no architecture, and no comments in the code.
And despite all that, we humans have reshaped the world to our whims, causing great devastation and wiping out many species that are only marginally dumber than we are. And no human who has ever lived has known their own utility function; knowing it would alone make us massively more powerful optimizers, and it's a standard feature for every AI. AIs have no physical, emotional, or social needs. They do not sleep, or rest, or get bored or distracted. On current hardware, they can perform more serial operations per second than a human by a factor of 10,000,000.
An AI that gets even a little bit smarter than a human will out-optimize us, recursive self-improvement or not. It will get whatever it has been programmed to want, and it will devote every possible resource it can acquire to doing so.
Clippy’s cousin, Clip, is a paperclip satisficer. Clip has been programmed to create 100 paperclips. Unfortunately, the code for his utility function is approximately “ensure that there are 100 more paperclips in the universe than there were when I began running.”
Soon, our solar system is replaced with n+100 paperclips surrounded by the most sophisticated defenses Clip can devise. Probes are sent out to destroy any entity that could ever have even the slightest chance of leading to the destruction of a single paperclip.
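To make the failure mode concrete, here is a minimal sketch (my own illustration with made-up numbers, not Clip's actual code) of why a binary "the 100 extra paperclips must exist" utility still rewards arbitrarily extreme defensive plans: an expected-utility maximizer prefers any plan that raises the probability that the paperclips persist, no matter how small the gain or how extreme the plan.

```python
# Hypothetical illustration: a "satisficing-looking" utility that is 1 if at least
# 100 extra paperclips persist and 0 otherwise. Under expected-utility maximization,
# the plan with the highest survival probability wins, however extreme it is.
# All probabilities below are invented for the sake of the example.

def expected_utility(p_paperclips_survive: float) -> float:
    # Utility is 1 if the goal state persists, 0 otherwise,
    # so expected utility equals the survival probability.
    return 1.0 * p_paperclips_survive

plans = {
    "make 100 clips, then halt":            0.990,
    "make 100 clips, add modest defenses":  0.999,
    "convert the solar system to defenses": 0.999999,
}

for plan, p in plans.items():
    print(f"{plan}: EU = {expected_utility(p):.6f}")

best = max(plans, key=lambda plan: expected_utility(plans[plan]))
print("Chosen plan:", best)  # the most extreme plan wins on expected utility
```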
The Hidden Complexity of Wishes and Failed Utopia #4-2 may be worth a look. The problem isn’t a lack of specificity, because an AI without a well-defined goal function won’t function. Rather, the danger is that the goal system we specify will have unintended consequences.
Acquiring information is useful for just about every goal. When there aren’t bigger expected marginal gains elsewhere, information gathering is better than nothing. “Learn as much about the universe as possible” is another standard feature for expected utility maximizers.
And this is all before taking into account self-improvement, utility functions that are unstable under self-modification, and our dear friend FOOM.
TL;DR:
Agents that aren’t made of meat will actually maximize utility.
Writing a utility function that actually says what you think it does is much harder than it looks.
Be afraid.
Upvoted, thanks! Very concise and clearly put. This is so far the best scary reply I've gotten, in my opinion. It reminds me strongly of the resurrected vampires in Peter Watts's novel Blindsight. They are depicted as natural human predators, a superhuman psychopathic Homo genus with minimal consciousness (more raw processing power instead) that can, for example, hold both aspects of a Necker cube in their heads at the same time. Humans resurrected them with a deficit that was supposed to make them controllable and dependent on their human masters. But of course that's like a mouse trying to keep a cat as a pet. I think that novel shows better than any other work of literature how dangerous just a little more intelligence can be. It quickly becomes clear that humans are like little Jewish girls facing a Waffen SS squadron, believing it will go away if they only close their eyes.
My favorite problem with this entire thread is that it's basically arguing that even the very first test cases will destroy us all. In reality, nobody puts in a grant application to construct an intelligent being inside a computer with the goal of creating 100 paperclips. They put in the grant to 'dominate the stock market', or 'defend the nation', or 'cure death'. And if they don't, then the Chinese government that stole the code will, or that open-source initiative will, or the independent South African development effort will, because there are enormous incentives to do so.
At best, boxing an AI with trivial, pointless tasks only delays the more dangerous versions.
I like to think that Skynet got its start through creative interpretation of a goal like “ensure world peace”. ;-)
"How is it that the AGI is smart enough to learn all of this by itself yet fails to notice that there are rules to follow?" Because there is no reason for an AGI to automagically create arbitrary restrictions if they aren't part of the goal or superior to the goal. For example, I'm quite sure the F1 rules prohibit interfering with drivers during the race; but if somehow a silicon-reaction-speed AGI can't win F1 by default, then it may find it simpler or quicker to harm the opponents in one of the infinitely many ways the F1 rules don't cover: say, getting funds through financial arbitrage, buying out the other teams and firing all the good drivers, or engineering a virus that halves the reaction speed of all Homo sapiens. Then it would be happy, as the goal is achieved within the rules.
That's clear. But let me restate what I'd like to inquire about. Given the large number of restrictions that are inevitably part of any advanced general intelligence (AGI), isn't the non-hazardous subset of all possible outcomes much larger than the subset where the AGI works perfectly yet fails to stop before it can wreak havoc?

Here is where this question stems from. Given my current knowledge about AGI, I believe that any AGI capable of dangerous self-improvement will be very sophisticated and include a lot of restrictions. For example, I believe that any self-improvement can only be as efficient as the specification of its output is detailed. If the AGI is built with the goal of producing paperclips, the design specification of what a paperclip is will be used as the yardstick by which to measure and quantify any improvement of the AGI's output. This means that to self-improve effectively up to a superhuman level, the design specification will have to be highly detailed and, by definition, include sophisticated restrictions. Therefore, to claim that any work on AGI will almost certainly lead to dangerous outcomes is to assert that any given AGI is likely to work perfectly well, subject to all restrictions except the one that makes it stop (spatiotemporal scope boundaries). I'm unable to arrive at that conclusion, as I believe that most AGIs will fail at extensive self-improvement; that is where failure is most likely, because it is the largest and most complicated part of the AGI's design parameters.

To put it bluntly, why is it more likely that contemporary AGI research will succeed at superhuman self-improvement (beyond learning) yet fail to limit the AGI, rather than vice versa? As I see it, given the larger number of parameters needed to self-improve in the first place, it is more likely that most AGI research will result in incremental steps toward human-level intelligence, rather than one huge step toward superhuman intelligence that fails on its scope boundary rather than on self-improvement.
What you are envisioning is not an AGI at all, but a narrow AI. If you tell an AGI to make paperclips, but it doesn’t know what a paperclip is, then it will go and find out, using whatever means it has available. It won’t give up just because you weren’t detailed enough in telling it what you wanted.
Then I don't think anyone is working on what you are envisioning as 'AGI' right now. If a superhuman level of sophistication regarding the potential for self-improvement is already part of your definition, then there is no argument to be won or lost here regarding risk assessment of research on AGI. I do not believe this is reasonable, or that AGI researchers share your definition. I believe there is a wide range of artificial general intelligence that does not fit your definition yet deserves the term.
Who said anything about a superhuman level of sophistication? Human-level is enough. I’m reasonably certain that if I had the same advantages an AGI would have—that is, if I were converted into an emulation and given my own source code—then I could foom. And I think any reasonably skilled computer programmer could, too.
Debugging will be a PITA. Both ways.
Yes, but after the AGI finds out what a paperclip is, it will then, if it is an AGI, start questioning why it was designed with the goal of building paperclips in the first place. And that’s where the friendly AI fallacy falls apart.
Anissimov posted a good article on exactly this point today. An AGI will only question its goals according to its cognitive architecture, and will come to a conclusion about its goals depending on that architecture. It could "question" its paperclip-maximization goal and come to the "conclusion" that what it really should do is tile the universe with foobarian holala.
So what? An agent with a terminal value (building paperclips) is not going to give it up, not for anything. That’s what “terminal value” means. So the AI can reason about human goals and the history of AGI research. That doesn’t mean it has to care. It cares about paperclips.
It has to care, because if there is the slightest motivation in its goal system to stop (parameters for spatiotemporal scope boundaries), then it won't care to continue anyway. I don't see where the incentive to override certain parameters of its goals would come from. As Anissimov said, "If an AI questions its values, the questioning will have to come from somewhere."
Exactly? I think we agree about this.
It won’t care unless it’s been programmed to care (for example by adding “spatiotemporal scope boundaries” to its goal system). It’s not going to override a terminal goal, unless it conflicts with a different terminal goal. In the context of an AI that’s been instructed to “build paperclips”, it has no incentive to care about humans, no matter how much “introspection” it does.
If you do program it to care about humans then obviously it will care. It’s my understanding that that is the hard part.
Again, I recommend The Basic AI Drives.
I cannot disagree with the paper given that definition of what an "artificial intelligence" is. If you have all of this (goals, planning, and foresight), then you're already at the end of a very long and hard journey peppered with failures. I'm aware of the risks associated with such agents and I support the SIAI, including with donations. The intention of this thread was to show that contemporary AGI research is much more likely to lead to other outcomes, not that there will be no danger if you already have an AGI with the ability for unbounded self-improvement. But I believe there are many AGI designs that lack this characteristic, and therefore I concluded that it is more likely than not that such a design won't be a danger. I see now that my definition of AGI is considerably weaker than yours. So of course, if you take your definition, what I said is not compelling. I believe that we'll arrive at your definition only after a long chain of earlier, weak AGIs that are incapable of considerable self-improvement, and that once we figure out how to create the seed for this kind of potential, we will also be much more knowledgeable about the risks and challenges such advanced AGIs might pose.
Yes, and weak AGIs are dangerous in the same sense as Moore's law is: by probably bringing the construction of strong AGI a little closer, and thus contributing to the eventual existential risk, while probably not being directly dangerous in themselves.
Yes, but each step in that direction also provides insights into the nature of AI and can therefore help in designing Friendly AI. My point was that such uncertainties should be incorporated into any estimate of the dangers posed by contemporary AI research. How much does the increased understanding outweigh its dangers?
This was my guess for the first 1.5 years or so. The problem is, FAI is necessarily a strong AGI, but if you learn how to build a strong AGI, you are in trouble. You don't want to have that knowledge around unless you know where to get the goals from, and studying efficient AGIs doesn't help with that. The harm is greater than the benefit, and it's entirely plausible that one could succeed in building a strong AGI without getting the slightest clue about how to define a Friendly goal, so it's not a given that there is any benefit whatsoever.
Yes, I’ll read it now.
Why do you believe that? What privileges “doing what it’s told”?
The question is not what privileges doing what it is told, but why it would do what it is not told. A crude mechanical machine has almost no freedom; often it can only follow one pathway. An intelligent machine, on the other hand, has much freedom; it can follow infinitely many pathways. With freedom comes choice and the necessity to decide, to follow one pathway and not others. Here you assume that a general intelligence will follow a pathway of self-improvement. But I do not think that intelligence implies self-improvement, nor that a pathway which leads an intelligence to optimize will be taken without it being an explicitly specified goal. And that is why I conclude that, out of a given number of AGI projects, not all will follow the pathway of unbounded, dangerous self-improvement, since there are more pathways that lead any given general intelligence to be impassive or to stop.
If you've read the thread above, you'll see that my intention is not to propose that there is no serious risk, but that it is not inevitable that any AGI will turn out to be an existential risk. I want to propose that working on AGI carefully can help us better understand and define Friendliness. I propose that the risk of careful work on AGI is justified and does not imply our demise in any case.
Because planning consists in figuring out instrumental steps on your own.
If we are talking about a full-fledged general intelligence here (Skynet), there's no arguing against the risk. I believe all we disagree about are definitions. That there are risks from advanced real-world (so far fictional) nanotechnology is indisputable. I'm merely saying that what researchers are working on is nanotechnology with the potential to lead to grey-goo scenarios, but that there is no inherent certainty that any given line of work on it will lead down that pathway.
It is incredibly hard to come up with an intelligence that knows what planning consists of and that knows and cares enough to judge which steps are instrumental. This won't just happen accidentally, and it will likely require knowledge sufficient to set scope boundaries as well. Again, this is not an argument that there is no risk, but that the risk is not as strong as some people believe it to be.
Please keep focus, which is one of the most important tools. The above paragraph is unrelated to what I addressed in this conversation.
Review the above paragraph: what you are saying is that AIs are hard to build. But of course chess AIs do plan, to give an example. They don’t perform only the moves they are “told” to perform.
What I am talking about is that full-fledged AGI is incredibly hard to achieve, and that therefore most AGI projects will fail on something other than limiting the AGI's scope. Therefore it is not likely that work on AGI is as dangerous as proposed.
That is, it is much more likely that any given chess AI will fail to beat a human player than that it will win. Still, researchers work on chess AIs, and those chess AIs fit the definition of a general chess AI. Yet to get everything about a chess AI exactly right so that it beats any human, while failing to implement certain performance boundaries (e.g. limits on the strength of its play, or letting it overheat its CPUs), is an unlikely outcome. It is more likely that it will be good at chess but not superhuman, or that it will fail to improve, or be slow or biased, than that it will succeed on all of the above and additionally exceed its scope boundaries.
So the discussion is about whether the idea that any work on AGI is incredibly dangerous is strong, or whether it can be weakened.
Yes, broken AIs, such as humans or chimps, are possible.
It has the ability to model and to investigate hypothetical possibilities that might negatively impact the utility function it is optimizing. If it doesn’t, it is far below human intelligence and is non-threatening for the same reason a narrow AI is non-threatening (but it isn’t very useful either).
The difficulty of detecting these threats is spread out around the range of difficulties the AI is capable of handling, so it can infer that there are probably more threats which it could only detect if it were smarter. Therefore, making itself smarter will enable it to detect more threats and thereby increase utility.
To be able to optimize, it will have to know what it is supposed to optimize. You have to carefully specify what its output (utility function) is supposed to be, or it won't be able to tell how good it is at optimizing. If you just tell it to produce paperclips, it won't be able to self-improve, because it doesn't know what paperclips look like and so on; therefore it cannot judge its own success, or know that extreme heat would be a negative impact given paperclips made of plastic. You further assume that it has a detailed incentive, that it is given a detailed pathway that tells it to look for threats and eliminate them.
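To illustrate the point about needing a measuring rod (a sketch of my own, not anything from the actual discussion): even the simplest self-improvement loop is only meaningful relative to an explicit evaluation function; remove the score and the word "improvement" has no referent.

```python
# Hypothetical sketch: improvement can only be judged against an explicit objective.
# Without `score`, the loop below cannot tell whether a change is an "improvement".
import random

def score(design: float) -> float:
    # Stand-in for "how well does the output match the specification?"
    # Here: negative distance from an assumed target value of 42.
    return -abs(design - 42.0)

def improve(design: float, steps: int = 1000) -> float:
    for _ in range(steps):
        candidate = design + random.uniform(-1.0, 1.0)  # try a small modification
        if score(candidate) > score(design):            # keep it only if measurably better
            design = candidate
    return design

print(improve(0.0))  # climbs toward the specified target and has no reason to go beyond it
```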
If it doesn't, it is what most researchers are working on: an intelligence with the potential to learn and make use of what it has learnt, with the potential to become intelligent (educated). I'm getting the impression that people here assume researchers are not working on an AGI but are trying to hardcode a FOOM machine. If FOOM is simply part of your definition, then there's no arguing against it going FOOM. But what researchers like Goertzel are working on are systems with the potential to reach human-level intelligence; that does not mean they will by definition jailbreak their nursery school. I never tried to argue against the possibility, only that there are many pathways where this won't happen, rather than the way it is portrayed by the SIAI, namely that any implementation of AGI will most likely consume humanity.
The sorts of intelligences you are talking about are narrow AIs, not general intelligences. If you told a general intelligence to produce paperclips but it didn’t know what a paperclip was, then its first subgoal would be to find out. The sort of mind that would give up on a minor obstacle like that wouldn’t foom, but it wouldn’t be much of an AGI either.
And yes, most researchers today are working on narrow AIs, not on AGI. That means they’re less likely to successfully make a general intelligence, but it has no bearing on the question of what will happen if they do make one.
That sort of scope is not likely to be a problem. The difficulty is that you have to get every part of the specification and every part of the specification executor exactly right, including the ability to maintain that specification under self-modification.
For example, the specification:
… will quite probably wipe out humanity unless a significant proportion of what it takes to produce an FAI is implemented. And it will do it while (and for the purpose of) creating 10 paperclips per day.
In what weird way are you measuring "efficiency"? Not in joules per paperclip, I gather. You are not likely to "destroy humanity" with a few hundred kilojoules a day. Satisficing machines really are relatively safe.
See other comments hereabouts for hints.
And I was arguing that any given AI won't be able to self-improve without an exact specification of its output against which it can judge its own efficiency. That's why I don't see how one would be likely to implement such exact specifications yet fail to limit the AI's scope of space, time, and resources. What makes it even more unlikely, in my opinion, is that an AI won't care to output anything as long as it isn't explicitly told to do so. Where would that incentive come from?
You assume that it knows it is supposed to use all of science and the universe to self-improve, when it would very likely just self-improve to the extent it was told and not care to go any further; software optimization, for example. I just don't see why you think that an artificial general intelligence would automatically assume it has to understand the whole universe to come up with the best possible way to produce 10 paperclips.
You don't need to tell it to self-improve at all.
Per day. Risk mitigation. Security concerns. Possibility of interruption of the resource supply due to finance, politics, or the collapse of civilisation. Limited lifespan of the sun (the primary energy source). Amount of iron in the planet.
Given that particular specification, if the AI didn't take a level in badass it would appear to be malfunctioning.
I just saw this comment by Ben Goertzel regarding self-improvement. I'd love it if someone here explained why he, as an AGI researcher, gets this so wrong.
Goertzel is generalizing from the human example of intelligence, which is probably the most pernicious and widespread failure mode in thinking about AI.
Or he may be completely disconnected from anything even resembling the real world. I literally have trouble believing that a professional AI researcher could describe a primitive, dumber-than-human AGI as “toddler-level” in the same sentence he dismisses it as a self-modification threat.
Toddlers self-modify into people using brains made out of meat!
No they don’t. Self-modification in the context of AGI doesn’t mean learning or growing, it means understanding the most fundamental architecture of your own mind and purposefully improving it.
That said, I think your first sentence is probably right. It looks like Ben can’t imagine a toddler-level AGI self-modifying because human toddlers can’t (or human adults, for that matter). But of course AGIs will be very different from human minds. For one thing, their source code will be a lot easier to understand than ours. For another, their minds will probably be much better at redesigning and improving code than ours are. Look at the kind of stuff that computer programs can do with code: Some of them already exceed human capabilities in some ways.
“Toddler-level AGI” is actually a very misleading term. Even if an AGI is approximately equal to a human toddler by some metrics, it will certainly not be equal by many other metrics. What does “toddler-level” mean when the AGI is vastly superior to even adult human minds in some respects?
“Understanding” and “purpose” are helpful abstractions for discussing human-like computational agents, but in more general cases I don’t think your definition of self-modification is carving reality at its joints.
ETA: I strongly agree with everything else in your comment.
Well, bad analogy. They don’t self-modify by understanding their source code and improving it. They gradually grow larger brains in a pre-set fashion while learning specific tasks. Humans have very little ability to self-modify.
Exactly! Humans can go from toddler to AGI start-up founder, and that’s trivial.
Whatever the hell the AGI equivalent of a toddler is, it’s all but guaranteed to be better at self-modification than the human model.
Political incentive determines the bottom line. Then the page is filled with rhetoric (and, from the looks of it, loaded language and status posturing).
Seriously, Ben is trying to accuse people of abusing the self-modification term based on the (trivially true) observation that there is a blurry boundary between learning and self-modification?
It’s a good thing Ben is mostly harmless. I particularly liked the part where I asked Eliezer:
… and actually got a candid reply.
It is interesting to note the effort Ben is going to here to disaffiliate himself from the SIAI and portray them as an 'out group'. Wei was querying (see earlier link) the wisdom of having Ben as Director of Research just earlier this year.
An educated outsider will very likely side with the expert, though. Just as with the hype around the LHC and its dangers, academics and educated people largely believed the physicists working on it rather than the fringe group that claimed it would destroy the world. It might be the reverse with the general public. Of course you cannot draw any conclusions about who's right from this, but it should be investigated anyway, because what all parties have in common is the need for support and money.
There are two different groups to be convinced here by each party. One group comprises educated people (academics) and mediocre rationalists; the other is the general public.
When it comes to who's right, the people one should listen to are the educated experts who are listening to both parties, their positions and arguments. Though their intelligence and status as rationalists will be disputed, as each party will claim they are not smart enough to see the truth if they disagree with them.
Well said and truly spoken.
(My shorter answer, by the way: I interpret all such behaviors through a Hansonian lens. This includes "near vs. far", observations about the incentives of researchers, the general theme of "X is not about Y", and homo hypocritus. Rather cynical, some may suggest, but this kind of thinking gives very good explanations for "Why?"s that would otherwise be confusing.)
The basic idea is to make a machine that is satisfied relatively easily. So, for example, you tell it to build the ten paperclips with 10 kJ total, and tell it not to worry too much if it doesn't make them; it is not that important.
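For what it's worth, here is a minimal sketch (purely illustrative; the threshold, budget, and penalty weight are my own assumptions, not a worked-out proposal) of how such an easily-satisfied objective might be scored: the reward is capped at ten paperclips, missing the target only costs a bounded amount, and energy spent beyond the 10 kJ budget is penalized, so heroic effort never pays.

```python
# Hypothetical, illustrative scoring function for a "satisfied easily" agent.
# Nothing here is from the original discussion; all numbers are assumptions.

def satisficer_utility(paperclips_made: int, energy_used_kj: float) -> float:
    TARGET_CLIPS = 10
    ENERGY_BUDGET_KJ = 10.0

    # Capped reward: making more than 10 paperclips earns nothing extra,
    # and missing the target only loses a bounded amount of utility.
    reward = min(paperclips_made, TARGET_CLIPS) / TARGET_CLIPS

    # Mild penalty for exceeding the energy budget, so extra effort never pays.
    overspend = max(0.0, energy_used_kj - ENERGY_BUDGET_KJ)
    penalty = 0.1 * overspend

    return reward - penalty

print(satisficer_utility(10, 9.5))     # on target, within budget: 1.0
print(satisficer_utility(7, 9.5))      # fell short: still gets most of the utility (0.7)
print(satisficer_utility(10**6, 1e9))  # world-eating effort scores strictly worse
```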
Sorry, I don’t understand your comment at all. I’ll be back tomorrow.
Yes, as I said, you seem to assume that it is very likely that the AI will succeed on all the hard problems yet fail on the scope boundary. The scary idea states that if we create self-improving AI, it will likely consume humanity. I believe that is a rather unlikely outcome, and I haven't yet seen any good reason to believe otherwise.
No, it states that we run the risk of accidentally making something that will consume (or exterminate, subvert, betray, make miserable, or otherwise Do Bad Things to) humanity, that looks perfectly safe and correct, right up until it’s too late to do anything about it… and that this is the default case: the case if we don’t do something extraordinary to prevent it.
This doesn't require self-improvement, and it doesn't require wiping out humanity. It just requires normal, everyday human error.
Here is Ben’s phrasing: