Overall, very sensible. I’ll ignore minor quibbles (a ‘strong AI’ and a ‘thinking machine’ seem significantly different to me, since the former implies recursion but the latter doesn’t) and focus on the main points of disagreement.
The related question I care more about, though, is: In practice, which goals are likely to be allied with which kinds and levels of intelligence, in reality? What goals will very, very smart minds, existing in the actual universe rather than the domains of abstract mathematics and philosophy, be most likely to aim for?
Goertzel goes on to question how likely Omohundro’s basic AI drives are to be instantiated. Might an AI that doesn’t care for value-preservation outcompete an AI that does?
Overall this seems very worth thinking about, but I think Goertzel draws the wrong conclusions. If we have a ‘race-to-the-bottom’ of competition between AGI, that suggests evolutionary pressures to me, and evolutionary pressures seem to be the motivation for expecting the AI drives in the first place. Yes, an AGI that doesn’t have any sort of continuity impulses might be able to create a more powerful successor than an AGI that does have continuity impulses. But that’s the start of the race, not the end of the race—any AGI that doesn’t value continuity will edit itself out of existence pretty quickly, whereas those that do won’t.
The nightmare scenario, of course, is an AGI that improves rapidly in the fastest direction possible, and then gets stuck somewhere unpleasant for humans.
And since I used the phrase “nightmare scenario,” a major disagreement between Goertzel and Bostrom is over the role of uncertainty when it comes to danger. Much later, Goertzel brings up the proactionary principle and precautionary principle.
Bostrom’s emotional argument, matching the precautionary approach, seems to be “things might go well, they might go poorly; because there’s the possibility it could go poorly, we must worry until we find a way to shut off that possibility.”
Goertzel’s emotional argument, matching the proactionary approach, seems to be “things might go well, they might go poorly, but why conclude that they will go poorly? We don’t know enough.” See, as an example, this quote:
Maybe AGIs that are sufficiently more advanced than humans will find some alternative playground that we humans can’t detect, and go there and leave us alone. We just can’t know, any more than ants can predict the odds that a human civilization, when moving onto a new continent, will destroy the ant colonies present there.
Earlier, Goertzel correctly observes that we’re not going to make a random mind, we’re going to make a mind in a specific way. But the Bostromian counterargument is that because we don’t know where that specific way leads us, we don’t have a guarantee that it’s different from making a random mind! It would be nice if we knew where safe destinations were, and how to create pathways to funnel intelligences towards those destinations.
Which also seems relevant here:
Many of Bostrom’s hints are not especially subtle; e.g. the title of Chapter 8 is “Is the default outcome doom?” The answer given in the chapter is basically “maybe – we can’t rule it out; and here are some various ways doom might happen.” But the chapter isn’t titled “Is doom a plausible outcome?”, even though this is basically what the chapter argues.
I view the Bostromian approach as saying “safety comes from principles; if we don’t follow those principles, disaster will result. We don’t know what principles will actually lead to safety.” Goertzel seems to respond with “yes, not following proper principles could lead to disaster, but we might end up accidentally following them as easily as we might end up accidentally violating them.” Which is on as solid a logical foundation as Bostrom’s position that things like the orthogonality thesis are true “in principle,” and which seems more plausible or attractive seems to be almost more a question of personal psychology or reasoning style than of evidence or argumentation.
There are massive unknowns here, but it doesn’t seem sensible to simply assume that, for all these non-superintelligence threats, defenses will outpace offenses. It feels to me like Bostrom – in his choices of what to pay attention to, and his various phrasings throughout the book – downplays the risks of other advanced technologies and over-emphasizes the potential risks of AGI. Actually there are massive unknowns all around, and the hypothesis that advanced AGI may save humanity from risks posed by bad people making dangerous use of other technologies is much more plausible than Bostrom makes it seem.
This is, I think, a fairly common position—a decision on whether to risk the world on AGI should be made knowing that there are other background risks that the AGI might materially diminish. (Supposing one estimates that a particular AGI project is a 3 in a thousand chance of existential collapse, one still has work to do in determining whether or not that’s a lower or higher risk than not doing that particular AGI project.)
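The comparison described above can be sketched as a toy expected-risk calculation. All the numbers below are illustrative placeholders (only the 3-in-1000 project risk comes from the comment; the background risk and mitigation fraction are assumptions for the sake of the sketch):

```python
# Toy comparison of total existential risk with and without a hypothetical
# AGI project. All probabilities are illustrative assumptions, not estimates
# from the discussion. We use the small-probability additive approximation.

p_agi_catastrophe = 0.003  # assumed chance the AGI project itself causes collapse
p_background_risk = 0.010  # assumed background existential risk over the same period
risk_reduction = 0.5       # assumed fraction of background risk the AGI would mitigate

# Total risk if the project goes ahead: the project's own risk plus the
# background risk that the AGI fails to mitigate.
risk_with_project = p_agi_catastrophe + p_background_risk * (1 - risk_reduction)

# Total risk if the project does not go ahead: the full background risk.
risk_without_project = p_background_risk

print(f"with project:    {risk_with_project:.3f}")
print(f"without project: {risk_without_project:.3f}")
```

Under these particular assumptions the project lowers total risk; with a smaller background risk or a weaker mitigation effect, the inequality flips, which is exactly the “work to do” the comment points at.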
I don’t see any reason yet to think Bostrom’s ability to estimate probabilities in this area is any better than Goertzel’s, or vice versa; I think that the more AI safety research we do, the easier it is to pull the trigger on an AGI project, and the sooner we can do so. I agree with Goertzel that it’s not obvious that AI research slowdown is desirable, let alone possible, but it is obvious to me that AI safety research speedup is desirable.
I think Goertzel overstates the benefit of open AI development, but agree with him that Bostrom and Yudkowsky overstate the benefit of closed AI development.
I haven’t read about open-ended intelligence yet. My suspicion, from Goertzel’s description of it, is that I’ll find it less satisfying than the reward-based view. My personal model of intelligence is much more inspired by control theory. The following statement, for example, strikes me as somewhat bizarre:
But I differ from them in suspecting that these advances will also bring us beyond the whole paradigm of optimization.
I don’t see how you get rid of optimization without also getting rid of preferences, or choosing a very narrow definition of ‘optimization.’
I think that there’s something of a communication barrier between the Goertzelian approach of “development” and the Yudkowskyian approach of “value preservation.” On the surface, the two of those appear to contradict each other—a child who preserves their values will never become an adult—but I think the synthesis of the two is the correct approach—value preservation is what it looks like when a child matures into an adult, rather than into a tumor. If value is fragile, most processes of change are not the sort of maturation that we want, but are instead the sort of degeneration that we don’t want; and it’s important to learn the difference between them and make sure that we can engineer that difference.
Biology has already (mostly) done that work for us, and so makes it look easy—which the Bostromian camp thinks is a dangerous illusion.
Thank you for taking the time to write that up. I strongly disagree, as you probably know, but it provided a valuable perspective into understanding the difference in viewpoint.
No two rationalists can agree to disagree… but pragmatists sometimes must.
Did we meet at AAAI when it was in Austin, or am I thinking of another Mark? (I do remember our discussion here on LW, I’m just curious if we also talked outside of LW.)
No, I’m afraid you’re confusing me with someone else. I haven’t had the chance yet to see the fair city of Austin or attend AAAI, although I would like to. My current day job isn’t in the AI field, so it would sadly be an unjustifiable expense.
To elaborate on the prior point, I have for some time engaged with not just yourself, but other MIRI-affiliated researchers as well as Nate and Luke before him. MIRI, FHI, and now FLI have been frustrating to me as their PR engagements have set the narrative and in some cases taken money that otherwise would have gone towards creating the technology that will finally allow us to end pain and suffering in the world. But instead funds and researcher attention are going into basic maths and philosophy that have questionable relevance to the technologies at hand.
However the precautionary vs proactionary description sheds a different light. If you think precautionary approaches are defensible, in spite of overwhelming evidence of their ineffectiveness, then I don’t think this is a debate worth having.
in some cases taken money that otherwise would have gone towards creating the technology that will finally allow us to end pain and suffering in the world.
If one looks at AI systems as including machine learning development, I think the estimate is something like a thousand times as many resources are spent on development as on safety research. I don’t think taking all of the safety money and putting it into ‘full speed ahead!’ would make much difference in time to AGI creation, but I do think transferring funds in the reverse direction may make a big difference for what that pain and suffering is replaced with.
I’ll go back to proactively building AI.
So, in my day job I do build AI systems, but not the AGI variety. I don’t have the interest in mathematical logic necessary to do the sort of work MIRI does. I’m just glad that they are doing it, and hopeful that it turns out to make a difference.
If one looks at AI systems as including machine learning development, I think the estimate is something like a thousand times as many resources are spent on development as on safety research.
Because everyone is working on machine learning, but machine learning is not AGI. AI is the engineering techniques for making programs that act intelligently. AGI is the process for taking those components and actually constructing something useful. It is the difference between computer science and a computer scientist. Machine learning is very useful for doing inference. But AGI is so much more than that, and there are very few resources being spent on AGI issues.
By the way, you should consider joining ##hplusroadmap on Freenode IRC. There’s a community of pragmatic engineers there working on a variety of transhumanist projects, and your AI experience would be valued. Say hi to maaku or kanzure when you join.
Vaniver, 4 years on and I wonder if your opinion on this issue has evolved in the time elapsed? I respect you for your clear and level-headed thinking on this issue. My own thinking has changed somewhat, and I have a new appreciation for the value of AI safety work. However this is for reasons that I think are atypical for the LW or MIRI orthodox community. I wonder if your proximity to the Berkeley AI safety crowd and your ongoing work in narrow AI has caused your opinion to change since 2016?
Vaniver, 4 years on and I wonder if your opinion on this issue has evolved in the time elapsed?
My opinion definitely has more details than it did 4 years ago, but I don’t see anything in the grandparent (or great-grandparent) comment that I disagree with. I will confess to not keeping up with things Goertzel has published in the meantime, but I’d be happy to take a look at something more recent if there’s anything you recommend. I hear Deutsch is working on a paper that addresses whether or not Solomonoff Induction resolves the problem of induction, but to the best of my knowledge it’s not out yet.
However this is for reasons that I think are atypical for the LW or MIRI orthodox community.
I’d be interested in hearing about those reasons; one of the things that has happened is talking to many more people about their intuitions and models both for and against risk (or for or against risk being shaped particular ways).
I wasn’t actually asking about your views on Goertzel per se. In fact I don’t even know if he has published anything more recent, or what his current views are. Sorry for the confusion there.
I was wondering about your views on the topic as a whole, including the prior probability of a “nightmare scenario” arising from developing a not-provably-Friendly AI before solving the control problem, or the proactionary vs precautionary principle as applied here, etc. You are one of the few people I’ve met online or in person (we met at a CFAR-for-ML workshop some years back, if you recall) who is able to comprehend and articulate reasonable steelmen of both Bostrom’s and Goertzel’s views. In your comment above you seemed generally on the fence in terms of the hard evidence. Given that I’m puzzling through a few large updates to my own mental model on this subject, anything that has caused you to update in the time since would be highly relevant to me. So I thought I’d ask.
> However this is for reasons that I think are atypical for the LW or MIRI orthodox community.
I’d be interested in hearing about those reasons
Okay. I’m concerned there’s a large inferential gap. Let’s see if I can compactly cross it, and let me know if any steps don’t make sense. My apologies for the length.
First, I only ever came to care about AGI because of the idea of the Singularity. I personally want to live a hundred billion years, to explore the universe and experience the splintering of humanity into uncountable different diverse cultures and ways of life. (And I want to do so without some nanny-AGI enforcing some frozen extrapolated ideal of the human ethics we hold today.) To personally experience that requires longevity escape velocity, and to achieve that in the few decades remaining of my current lifetime requires something like a Vernor Vinge-style Singularity.
I also want to end all violent conflict, cure all diseases, create abundance so everyone can live their full life potential, and stop Death from snatching my friends and loved ones. But I find it more honest and less virtue-signaling to focus on my selfish reason, which is that I read too much sci-fi as a kid and want to see it happen for myself.
So far, so good. I expect that’s not controversial or even unusual around here. But the point is that my interest in AGI is largely instrumental. I need the Singularity, and the Singularity is started by the development of true artificial general intelligence, in the standard view.
…
Second, I’m actually quite concerned that if any AGI were to “FOOM” (even and perhaps especially a so-called “Friendly” AI), then we would be stuck in what is, by my standards, a less than optimal future where a superintelligence infringes on our post-human freedom to self-modify and create the unconstrained, diverse shards of humanity I mentioned earlier. Wishing for a nanny-AGI to solve our problems is like wishing to live in a police state, just one where the police are trustworthy and moral. But it’s still a police state. I need a frontier to be happy.
It’s on the above second point that I anticipate disagreement. That my notion of Friendliness is off, that negative utility outcomes are definitionally impossible when guided by a so-called Friendly AI, etc. Because I don’t want this to go too long, I will merely point out that there is a difference between individual utility functions and (extrapolated, coherent) societal utility functions. Maybe, just maybe, it’s not possible for everyone to achieve maximal happiness, and some must suffer for the good of the many. As a chronic iconoclast, I fear being stomped by the boot of progress. In any case, if you object on this point then please don’t get stuck here. Just presume it and move on; it is important but not a lynchpin of my position.
…
So as the reasoning goes, I need superintelligent tool AI. And Friendly AI, which is necessarily agent-y, is actually an anti-goal.
So the first question on my quest: is it possible to create tool AGI without the world ending, as a bunch of smart people on LW seem to think would happen? I dove deep into this and came to the conclusion: “Yes, it is quite possible to build AGI that does not destroy the world without it being provably Friendly. There are outlines of adequate safety measures that, once fully fleshed out, could be employed to safeguard so-called tool/oracle AI that is used to jumpstart a Singularity but still leaves humans, or our transhuman descendants, at the top of the metaphorical food chain.”
Again, I’m sorry that I’m skipping justification of this point, but this is a necro comment to a years-old discussion thread, not a full post or the sequence of posts that would be required. When I later decided that LW’s largely non-evidential approach to philosophy was what had obscured reality here, I decided to leave and go about building this AI rather than discussing it further.
It was not long after that I belatedly discovered the obvious fact that the arguments I made against the possibility of a “FOOM” moving fast enough to cause existential risk also argued against the utility of AGI for jumpstarting a real Singularity, of the world-altering Vernor Vinge type, which I had decided was my life’s purpose.
“Oops…”
…is the sound we make when we realize we’ve wasted years of our lives on an important-sounding problem that turned out to be actually irrelevant to our cause. Oh well. Back to working on the problem of medical nanotechnology directly.
But upon leaving LW and pronouncing the Sequences to be info hazards, I had set a 4-year timer to remind myself to come back and re-evaluate that decision. My inner jury is still out on that point, but in reviewing some posts related to AI safety it occurred to me that solving the control problem also solves most of the mundane problems that I expect AGI projects to encounter.
One of my core objections to the “nightmare scenario” of UFAI is that AGI approaches which are likely to be tried in practice (as opposed to abstract models like AIXI) are far more likely to get stuck early, far far before they reach anything near take-over-the-world levels of power. Probably before they even reach the “figure out how to brew coffee” level. Probably before they even know what coffee is. Debugging an AI in such a stuck state would require manual intervention, which is both a timeline extender and a strong safety property. Doing something non-trivial with the first AGI is likely to take years of iterated development, with plenty of opportunity to introspect and alter course.
However, a side effect of solving the control problem is that it necessarily involves being able to reason about the effects of self-modification on future behavior, which lets the AI avoid getting stuck at all!
If true, this is both good news and bad news.
The good is that a Vingean Singularity is back on the table! We can solve the world’s problems and usher in an age of abundance and post-human states of being in one generation with the power of AI.
The bad is that there is a weird sort of uncanny-valley-like situation where AI today is basically safe, but once a partial solution is found to the tiling problem, and perhaps a few other aspects of the AI safety problem, it does become possible to write a UFAI that can “FOOM” with unpredictable consequences.
So I still think the AI x-risk crowd has seriously overblown the issue today. Deepmind’s creations are not going to take over the world and turn us all into paperclips. But, ironically, if MIRI is at least partially successful in their research, then that work could be applied to make a real Clippy-like entity with all the scary consequences.
That said, I don’t expect this to seriously alter my prediction that tool/oracle AI is achievable. So UFAI + partial control solution could be deployed with appropriate boxing safeguards to get us the Singularity with humans at the helm. But I’m still in the midst of a deep cache purge to update my own feelings of likelihood here.
But yeah, I doubt many at MIRI are working on the control problem explicitly because it is necessary to create the scary kind of UFAI (albeit also the kind that can assist humans to hastily solve their mass of problems!).
If you have time for another, I’d be interested in your response to Goertzel’s critique of Superintelligence:
http://jetpress.org/v25.2/goertzel.htm
Overall, very sensible. I’ll ignore minor quibbles (a ‘strong AI’ and a ‘thinking machine’ seem significantly different to me, since the former implies recursion but the latter doesn’t) and focus on the main points of disagreement.
Goertzel goes on to question how likely Omohundro’s basic AI drives are to be instantiated. Might an AI that doesn’t care for value-preservation outcompete an AI that does?
Overall this seems very worth thinking about, but I think Goertzel draws the wrong conclusions. If we have a ‘race-to-the-bottom’ of competition between AGI, that suggests evolutionary pressures to me, and evolutionary pressures seem to be the motivation for expecting the AI drives in the first place. Yes, an AGI that doesn’t have any sort of continuity impulses might be able to create a more powerful successor than an AGI that does have continuity impulses. But that’s the start of the race, not the end of the race—any AGI that doesn’t value continuity will edit itself out of existence pretty quickly, whereas those that do won’t.
The nightmare scenario, of course, is an AGI that improves rapidly in the fastest direction possible, and then gets stuck somewhere unpleasant for humans.
And since I used the phrase “nightmare scenario,” a major disagreement between Goertzel and Bostrom is over the role of uncertainty when it comes to danger. Much later, Goertzel brings up the proactionary principle and precautionary principle.
Bostrom’s emotional argument, matching the precautionary approach, seems to be “things might go well, they might go poorly, because there’s the possibility it could go poorly we must worry until we find a way to shut off that possibility.”
Goertzel’s emotional argument, matching the proactionary approach, seems to be “things might go well, they might go poorly, but why conclude that they will go poorly? We don’t know enough.” See, as an example, this quote:
Earlier, Goertzel correctly observes that we’re not going to make a random mind, we’re going to make a mind in a specific way. But the Bostromian counterargument is that because we don’t know where that specific way leads us, we don’t have a guarantee that it’s different from making a random mind! It would be nice if we knew where safe destinations were, and how to create pathways to funnel intelligences towards those destinations.
Which also seems relevant here:
I view the Bostromian approach as saying “safety comes from principles; if we don’t follow those principles, disaster will result. We don’t know what principles will actually lead to safety.” Goertzel seems to respond with “yes, not following proper principles could lead to disaster, but we might end up accidentally following them as easily as we might end up accidentally violating them.” Which is on as solid a logical foundation as Bostrom’s position that things like the orthogonality thesis are true “in principle,” and which seems more plausible or attractive seems to be almost more a question of personal psychology or reasoning style than it is evidence or argumentation.
This is, I think, a fairly common position—a decision on whether to risk the world on AGI should be made knowing that there are other background risks that the AGI might materially diminish. (Supposing one estimates that a particular AGI project is a 3 in a thousand chance of existential collapse, one still has work to do in determining whether or not that’s a lower or higher risk than not doing that particular AGI project.)
I don’t see any reason yet to think Bostrom’s ability to estimate probabilities in this area are any better than Goertzel’s, or vice versa; I think that the more AI safety research we do, the easier it is to pull the trigger on an AGI project, and the sooner we can do so. I agree with Goertzel that it’s not obvious that AI research slowdown is desirable, let alone possible, but it is obvious to me that AI safety research speedup is desirable.
I think Goertzel overstates the benefit of open AI development, but agree with him that Bostrom and Yudkowsky overstate the benefit of closed AI development.
I haven’t read about open-ended intelligence yet. My suspicion, from Goertzel’s description of it, is that I’ll find it less satisfying than the reward-based view. My personal model of intelligence is much more inspired by control theory. The following statement, for example, strikes me as somewhat bizarre:
I don’t see how you get rid of optimization without also getting rid of preferences, or choosing a very narrow definition of ‘optimization.’
I think that there’s something of a communication barrier between the Goertzelian approach of “development” and the Yudkowskyian approach of “value preservation.” On the surface, the two of those appear to contradict each other—a child who preserves their values will never become an adult—but I think the synthesis of the two is the correct approach—value preservation is what it looks like when a child matures into an adult, rather than into a tumor. If value is fragile, most processes of change are not the sort of maturation that we want, but are instead the sort of degeneration that we don’t want; and it’s important to learn the difference between them and make sure that we can engineer that difference.
Biology has already (mostly) done that work for us, and so makes it look easy—which the Bostromian camp thinks is a dangerous illusion.
Thank you for taking the time to write that up. I strongly disagree, as you probably know, but it provided a valuable perspective into understanding the difference in viewpoint.
No two rationalists can agree to disagree… but pragmatists sometimes must.
You’re welcome!
Did we meet at AAAI when it was in Austin, or am I thinking of another Mark? (I do remember our discussion here on LW, I’m just curious if we also talked outside of LW.)
No I’m afraid you’re confusing me with someone else. I haven’t had the chance yet to see the fair city of Austin or attend AAAI, although I would like to. My current day job isn’t in the AI field so it would sadly be an unjustifiable expense.
To elaborate on the prior point, I have for some time engaged with not just yourself, but other MIRI-affiliated researchers as well as Nate and Luke before him. MIRI, FHI, and now FLI have been frustrating to me as their PR engagements have set the narrative and in some cases taken money that otherwise would have gone towards creating the technology that will finally allow us to end pain and suffering in the world. But instead funds and researcher attention are going into basic maths and philosophy that have questionable relevance to the technologies at hand.
However the precautionary vs proactionary description sheds a different light. If you think precautionary approaches are defensible, in spite of overwhelming evidence of their ineffectiveness, then I don’t think this is a debate worth having.
I’ll go back to proactively building AI.
If one looks as AI systems as including machine learning development, I think the estimate is something like a thousand times as many resources are spent on development as on safety research. I don’t think taking all of the safety money and putting it into ‘full speed ahead!’ would make much difference in time to AGI creation, but I do think transferring funds in the reverse direction may make a big difference for what that pain and suffering is replaced with.
So, in my day job I do build AI systems, but not the AGI variety. I don’t have the interest in mathematical logic necessary to do the sort of work MIRI does. I’m just glad that they are doing it, and hopeful that it turns out to make a difference.
Because everyone is working on machine learning, but machine learning is not AGI. AI is the engineering techniques for making programs that act intelligently. AGI is the process for taking those components and actually constructing something useful. It is the difference between computer science and a computer scientist. Machine learning is very useful for doing inference. But AGI is so much more than that, and there are very few resources being spent on AGI issues.
By the way, you should consider joining ##hplusroadmap on Freenode IRC. There’s a community of pragmatic engineers there working on a variety of transhumanist projects, and you AI experience would be valued. Say hi to maaku or kanzure when you join.
Vaniver, 4 years on and I wonder if your opinion on this issue has evolved in the time elapsed? I respect you for your clear and level-headed thinking on this issue. My own thinking has changed somewhat, and I have a new appreciation for the value of AI safety work. However this is for reasons that I think are atypical for the LW or MIRI orthodox community. I wonder if your proximity to the Berkeley AI safety crowd and your ongoing work in narrow AI has caused your opinion to change since 2016?
Thanks!
My opinion definitely has more details than it did 4 years ago, but I don’t see anything in the grandparent (or great-grandparent) comment that I disagree with. I will confess to not keeping up with things Goertzel has published in the meantime, but I’d be happy to take a look at something more recent if there’s anything you recommend. I hear Deutsch is working on a paper that addresses whether or not Solomonoff Induction resolves the problem of induction, but to the best of my knowledge it’s not out yet.
I’d be interested in hearing about those reasons; one of the things that has happened is talking to many more people about their intuitions and models both for and against risk (or for or against risk being shaped particular ways).
I wasn’t actually asking about your views on Goertzel per se. In fact I don’t even know if he has published anything more recent, or what his current view are. Sorry for the confusion there.
I was wondering about your views on the topic as a whole, including the prior probability of a “nightmare scenario” arising from developing a not-provably-Friendly AI before solving the control problem, or the proactionary vs precautionary principle as applied here, etc. You are one of the few people I’ve met online or in person (we met at a CFAR-for-ML workshop some years back, if you recall) that is able to comprehend and articulate reasonable steelmans of both Bostrom and Goertzel’s views. In your comment above you seemed generally on the fence in terms of the hard evidence. Given that I’m puzzling though a few large updates to my own mental model on this subject, anything that has caused you to update in the time since would be highly relevant to me. So I thought I’d ask.
Okay. I’m concerned there’s a large inferential gap. Let’s see if I can compactly cross it, and let me know if any steps don’t make sense. My apologies for the length.
First, I only ever came to care about AGI because of the idea of the Singularity. I personally want to live a hundred billion years, to explore the universe and experience the splintering of humanity into uncountable different diverse cultures and ways of life. (And I want to do so without some nanny-AGI enforcing some frozen extrapolated ideal human ethics we exist today.) To personally experience that requires longevity escape velocity, and to achieve that in the few decades remaining of my current lifetime requires something like a Vernor Vinge-style Singularity.
I also want to end all violent conflict, cure all diseases, create abundance so everyone can live their full life potential, and stop Death from snatching my friends and loved ones. But I find it more honest and less virtue signaling to focus on my selfish reasons, which is that I read to much sci-fi as a kid and want to see it happen for myself.
So far, so good. I expect that’s not controversial or even unusual around here. But the point is that my interest in AGI is largely instrumental. I need the Singularity, and the Singularity is started by the development of true artificial general intelligence, in the standard view.
…
Second, I’m actually quite concerned that if any AGI were to “FOOM” (even, and perhaps especially, a so-called “Friendly” AI), then we would be stuck in what is, by my standards, a less than optimal future: one where a superintelligence infringes on our post-human freedom to self-modify into the unconstrained, diverse shards of humanity I mentioned earlier. Wishing for a nanny-AGI to solve our problems is like wishing to live in a police state, just one where the police are trustworthy and moral. But it’s still a police state. I need a frontier to be happy.
It’s on the above second point that I anticipate disagreement. That my notion of Friendliness is off, that negative utility outcomes are definitionally impossible when guided by a so-called Friendly AI, etc. Because I don’t want this to go too long, I will merely point out that there is a difference between individual utility functions and (extrapolated, coherent) societal utility functions. Maybe, just maybe, it’s not possible for everyone to achieve maximal happiness, and some must suffer for the good of the many. As a chronic iconoclast, I fear being stomped by the boot of progress. In any case, if you object on this point then please don’t get stuck here. Just presume it and move on; it is important but not a linchpin of my position.
…
So as the reasoning goes, I need superintelligent tool AI. And Friendly AI, which is necessarily agent-y, is actually an anti-goal.
So the first question on my quest was: is it possible to create tool AGI without the world ending, as a bunch of smart people on LW seem to think would happen? I dove deep into this and came to the conclusion: “Yes, it is quite possible to build AGI that does not destroy the world without it being provably Friendly. There are outlines of adequate safety measures that, once fully fleshed out, could be employed to safeguard so-called tool/oracle AI that is used to jumpstart a Singularity while still leaving humans, or our transhuman descendants, at the top of the metaphorical food chain.”
Again, I’m sorry that I’m skipping justification of this point, but this is a necro comment on a years-old discussion thread, not a full post or the sequence of posts that would be required. When I later decided that LW’s largely non-evidential approach to philosophy was what had obscured reality here, I decided to leave and go about building this AI rather than discussing it further.
It was not long after when I belatedly discovered the obvious fact that the arguments I made against the possibility of a “FOOM” moving fast enough to cause existential risk also argued against the utility of AGI for jumpstarting a real Singularity, of the world-altering Vernor Vinge type, which I had decided was my life’s purpose.
“Oops…”
…is the sound we make when we realize we’ve wasted years of our lives on an important-sounding problem that turned out to be irrelevant to our cause. Oh well. Back to working on the problem of medical nanotechnology directly.
But upon leaving LW and pronouncing the Sequences to be info hazards, I had set a 4-year timer to remind myself to come back and re-evaluate that decision. My inner jury is still out on that point, but in reviewing some posts related to AI safety it occurred to me that solving the control problem also solves most of the mundane problems that I expect AGI projects to encounter.
One of my core objections to the “nightmare scenario” of UFAI is that the AGI approaches likely to be tried in practice (as opposed to abstract models like AIXI) are far more likely to get stuck early, long before they reach anything near take-over-the-world levels of power. Probably before they even reach the “figure out how to brew coffee” level. Probably before they even know what coffee is. Debugging an AI in such a stuck state would require manual intervention, which is both a timeline extender and a strong safety property. Doing something non-trivial with the first AGI is likely to take years of iterated development, with plenty of opportunity to introspect and alter course.
However, a side effect of solving the control problem is that it necessarily involves being able to reason about the effects of self-modification on future behavior… which lets the AI avoid getting stuck at all!
If true, this is both good news and bad news.
The good news is that a Vingean Singularity is back on the table! We can solve the world’s problems and usher in an age of abundance and post-human states of being in one generation with the power of AI.
The bad news is that there is a weird uncanny-valley-like situation: AI today is basically safe, but once a partial solution is found to the tiling problem, and perhaps a few other aspects of the AI safety problem, it becomes possible to write a UFAI that can “FOOM” with unpredictable consequences.
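(For concreteness, by “the tiling problem” I mean the Löbian obstacle to an agent trusting its own successors. This is my own gloss on the standard framing, nothing novel; the key fact is Löb’s theorem:)

```latex
% Löb's theorem, for any theory $T$ extending Peano Arithmetic,
% any sentence $\varphi$, and $\Box_T$ read as "provable in $T$":
\text{if } T \vdash \Box_T \varphi \rightarrow \varphi, \text{ then } T \vdash \varphi.
```

So an agent reasoning in T cannot adopt the blanket rule “if my successor (also reasoning in T) proves φ, then φ holds”: instantiating φ with a contradiction would let T prove that contradiction. A partial solution to the tiling problem is precisely a principled way for an agent to trust successors of roughly its own proof strength, and that kind of self-trust is what would let an AI self-modify without getting stuck.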
So I still think the AI x-risk crowd has seriously overblown the issue today. DeepMind’s creations are not going to take over the world and turn us all into paperclips. But, ironically, if MIRI is at least partially successful in its research, then that work could be applied to make a real Clippy-like entity, with all the scary consequences.
That said, I don’t expect this to seriously alter my prediction that tool/oracle AI is achievable. So a UFAI plus a partial control solution could be deployed with appropriate boxing safeguards to get us the Singularity with humans at the helm. But I’m still in the midst of a deep cache purge to update my own feelings of likelihood here.
But yeah, I doubt many at MIRI are working on the control problem because they see it as a prerequisite for creating the scary kind of UFAI (albeit also the kind that could assist humans in hastily solving our mass of problems!).