I wasn’t actually asking about your views on Goertzel per se. In fact, I don’t even know if he has published anything more recent, or what his current views are. Sorry for the confusion there.
I was wondering about your views on the topic as a whole, including the prior probability of a “nightmare scenario” arising from developing a not-provably-Friendly AI before solving the control problem, the proactionary vs. precautionary principle as applied here, etc. You are one of the few people I’ve met online or in person (we met at a CFAR-for-ML workshop some years back, if you recall) who can comprehend and articulate reasonable steelmen of both Bostrom’s and Goertzel’s views. In your comment above you seemed generally on the fence in terms of the hard evidence. Given that I’m puzzling through a few large updates to my own mental model on this subject, anything that has caused you to update in the time since would be highly relevant to me. So I thought I’d ask.
> However this is for reasons that I think are atypical for the LW or MIRI orthodox community.
I’d be interested in hearing about those reasons.
Okay. I’m concerned there’s a large inferential gap. Let’s see if I can compactly cross it, and let me know if any steps don’t make sense. My apologies for the length.
First, I only ever came to care about AGI because of the idea of the Singularity. I personally want to live a hundred billion years, to explore the universe and experience the splintering of humanity into uncountable diverse cultures and ways of life. (And I want to do so without some nanny-AGI enforcing a frozen, extrapolated ideal of the human ethics that exist today.) To personally experience that requires longevity escape velocity, and to achieve that in the few decades remaining of my current lifetime requires something like a Vernor Vinge-style Singularity.
I also want to end all violent conflict, cure all diseases, create abundance so everyone can live up to their full potential, and stop Death from snatching my friends and loved ones. But I find it more honest and less virtue-signaling to focus on my selfish reason, which is that I read too much sci-fi as a kid and want to see it happen for myself.
So far, so good. I expect that’s not controversial or even unusual around here. But the point is that my interest in AGI is largely instrumental. I need the Singularity, and the Singularity is started by the development of true artificial general intelligence, in the standard view.
…
Second, I’m actually quite concerned that if any AGI were to “FOOM” (even, and perhaps especially, a so-called “Friendly” AI), then we would be stuck in what is, by my standards, a less-than-optimal future: one where a superintelligence infringes on our post-human freedom to self-modify and to create the unconstrained, diverse shards of humanity I mentioned earlier. Wishing for a nanny-AGI to solve our problems is like wishing to live in a police state, just one where the police are trustworthy and moral. But it’s still a police state. I need a frontier to be happy.
It’s on this second point that I anticipate disagreement: that my notion of Friendliness is off, that negative-utility outcomes are definitionally impossible under the guidance of a genuinely Friendly AI, and so on. Because I don’t want this to run too long, I will merely point out that there is a difference between individual utility functions and (extrapolated, coherent) societal utility functions. Maybe, just maybe, it’s not possible for everyone to achieve maximal happiness, and some must suffer for the good of the many. As a chronic iconoclast, I fear being stomped by the boot of progress. In any case, if you object on this point, please don’t get stuck here. Just presume it and move on; it is important, but not a linchpin of my position.
…
So as the reasoning goes, I need superintelligent tool AI. And Friendly AI, which is necessarily agent-y, is actually an anti-goal.
So the first question on my quest: is it possible to create tool AGI without the world ending, as a bunch of smart people on LW seem to think would happen? I dove deep into this and came to the conclusion: “No, the world does not have to end. It is quite possible to build AGI that does not destroy the world without it being provably Friendly. There are outlines of adequate safety measures that, once fully fleshed out, could be employed to safeguard so-called tool/oracle AI that is used to jumpstart a Singularity while still leaving humans, or our transhuman descendants, at the top of the metaphorical food chain.”
Again, I’m sorry that I’m skipping the justification of this point, but this is a necro comment on a years-old discussion thread, not a full post or the sequence of posts that would be required. When I later decided that LW’s largely non-evidential approach to philosophy was what had obscured reality here, I decided to leave and go about building this AI rather than discussing it further.
It was not long after that I belatedly discovered the obvious: the arguments I had made against the possibility of a “FOOM” moving fast enough to cause existential risk also argued against the utility of AGI for jumpstarting a real Singularity, of the world-altering Vernor Vinge type, which I had decided was my life’s purpose.
“Oops…”
…is the sound we make when we realize we’ve wasted years of our lives on an important-sounding problem that turned out to be irrelevant to our cause. Oh well. Back to working on the problem of medical nanotechnology directly.
But upon leaving LW and pronouncing the Sequences to be info hazards, I set a 4-year timer to remind myself to come back and re-evaluate that decision. My inner jury is still out on that point, but in reviewing some posts related to AI safety, it occurred to me that solving the control problem also solves most of the mundane problems that I expect AGI projects to encounter.
One of my core objections to the “nightmare scenario” of UFAI is that the AGI approaches likely to be tried in practice (as opposed to abstract models like AIXI) are far more likely to get stuck early, far, far before they reach anything near take-over-the-world levels of power. Probably before they even reach the “figure out how to brew coffee” level. Probably before they even know what coffee is. Debugging an AI in such a stuck state would require manual intervention, which is both a timeline extender and a strong safety property. Doing anything non-trivial with the first AGI is likely to take years of iterated development, with plenty of opportunity to introspect and alter course.
However, a side effect of solving the control problem is that it necessarily involves being able to reason about the effects of self-modification on future behavior… which lets the AI avoid getting stuck at all!
If true, this is both good news and bad news.
The good news is that a Vingean Singularity is back on the table! We can solve the world’s problems and usher in an age of abundance and post-human states of being within one generation, with the power of AI.
The bad news is that there is a weird, uncanny-valley-like situation where AI today is basically safe, but once a partial solution is found to the tiling problem, and perhaps a few other pieces of the AI safety problem, it becomes possible to write a UFAI that can “FOOM” with unpredictable consequences.
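For anyone who hasn’t run into the term, here is my rough gloss of the tiling problem (after Yudkowsky and Herreshoff’s tiling-agents work; the notation below is my own simplification, not theirs). A parent agent reasoning in a formal theory $T_0$ wants to license a successor, or a self-modification, reasoning in theory $T_1$ by proving something like

$$T_0 \vdash \forall a\,\big(\Box_{T_1}\ulcorner \mathrm{Safe}(a)\urcorner \rightarrow \mathrm{Safe}(a)\big),$$

i.e. “anything the successor proves safe really is safe.” But if $T_1 = T_0$, Löb’s theorem says $T_0$ can only prove $\Box_{T_0}\ulcorner\varphi\urcorner \rightarrow \varphi$ for sentences $\varphi$ it already proves outright, so the agent cannot blanket-trust a successor that uses its own proof system; the naive workaround is a descending tower of ever-weaker theories that eventually runs out of trust. A “partial solution” in my sense is anything that lets an agent cross that gap and sign off on its own self-modifications.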
So I still think the AI x-risk crowd has seriously overblown the issue today. DeepMind’s creations are not going to take over the world and turn us all into paperclips. But, ironically, if MIRI is at least partially successful in its research, that work could be applied to make a real Clippy-like entity, with all the scary consequences.
That said, I don’t expect this to seriously alter my prediction that tool/oracle AI is achievable. So a UFAI plus a partial control solution could be deployed, with appropriate boxing safeguards, to get us the Singularity with humans at the helm. But I’m still in the midst of a deep cache purge to update my own feelings of likelihood here.
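To make “appropriate boxing safeguards” slightly less hand-wavy, here is a toy sketch of the shape of deployment I have in mind (Python, every name hypothetical; it does nothing against a superintelligence that talks its way past the gatekeeper, it just illustrates the discipline: answer queries only, under a hard budget, behind a human gate, with a full audit log):

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class BoxedOracle:
    """A tool AI that only answers queries; it never acts on the world itself."""
    answer_fn: Callable[[str], str]           # stand-in for the actual tool AI
    query_budget: int = 100                   # hard cap on queries per session
    log: List[Tuple[str, str]] = field(default_factory=list)

    def ask(self, question: str) -> str:
        if self.query_budget <= 0:
            raise RuntimeError("query budget exhausted; human review required")
        self.query_budget -= 1
        answer = self.answer_fn(question)
        self.log.append((question, answer))   # keep a full audit trail
        return answer


def human_gate(answer: str) -> bool:
    """A human, not the oracle, decides whether an answer leaves the box."""
    print(f"Oracle proposes:\n{answer}\n")
    return input("Release this answer? [y/N] ").strip().lower() == "y"


if __name__ == "__main__":
    oracle = BoxedOracle(answer_fn=lambda q: f"(placeholder answer to: {q})")
    reply = oracle.ask("Outline a research plan for longevity escape velocity.")
    print(reply if human_gate(reply) else "Answer withheld.")
```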
But yeah, I doubt many at MIRI are working on the control problem explicitly because it is a necessary ingredient for creating the scary kind of UFAI (albeit also the kind that could assist humans in hastily solving their mass of problems!).