Yeah, Archipelago is one of the designs that can probably be made acceptable to almost everyone. It would need to be specified in a lot more detail though, down to how physics works (so people don’t build dangerous AI all over again). The whole point is to let human values survive the emergence of stronger-than-human intelligence.
For an entity that has built and runs the simulation (aka God), it should be trivial to enforce limits on power/complexity in its simulation (the Tower of Babel case :-D)
The whole point is to let human values survive
I don’t see how that exercise helps. If someone controls your reality, you’re powerless.
Well, baseline humans in a world with superhuman intelligences will be powerless almost by definition. So I guess you can only be satisfied by stopping all superintelligences or upgrading all humans. The first seems unrealistic. The second might work and I’ve thought about it for a long time, but this post is exploring a different scenario where we build a friendly superintelligence to keep others at bay.
Sure, but “human values survive” only because the FAI maintains them—and that returns us to square one of the “how to make FAI have appropriate values” problem.
The post proposed to build an arbitrary general AI with a goal of making all conscious experiences in reality match {unmodified human brains + this coarse-grained VR utopia designed by us}. This plan wastes tons of potential value and requires tons of research, but it seems much simpler than solving FAI. For example, it skips figuring out how all human preferences should be extracted, extrapolated, and mapped to true physics. (It does still require solving consciousness though, and many other things.)
Mostly I intended the plan to serve as a lower bound for the outcome of an intelligence explosion that’s better than “everyone dies” but less vague than CEV, because I haven’t seen many such lower bounds before. Of course I’d welcome any better plan.
So, that “arbitrary general AI” is not an agent? It’s going to be a tool AI? I’m not quite sure how you envisage it being smart enough to do all that you want it to do (e.g. deal with an angsty teenager: “I want the world to BURN!”) and yet have no agency of its own and no system of values.
lower bound for the outcome of an intelligence explosion
Lower bound in which sense? A point where the intelligence explosion will stop on its own? Or one which the humans will be able to enforce? Or what?
The idea is that if the problem of consciousness is solved (which is admittedly a tall order), “make all consciousness in the universe reflect this particular VR utopia with these particular human brains and evolve it faithfully from there” becomes a formalizable goal, akin to paperclips, which you can hand to an unfriendly agent AI. You don’t need to solve all the other philosophical problems usually required for FAI. Note that solving the problem of consciousness is a key requirement: you can’t just say “simulate these uploaded brains in this utopia forever and never mind what consciousness means”, because that could open the door to huge suffering happening elsewhere (e.g. due to the AI simulating many scenarios). You really need the “all consciousness in the universe” part.
Lower bound means that before writing this post, I didn’t know any halfway specific plan for navigating the intelligence explosion that didn’t kill everyone. Now I know that we can likely achieve something as good as this, though it isn’t very good. It’s a lower bound on what’s achievable.
Those are not potshots—at a meta level what’s happening is that your picture of this particular piece of the world doesn’t quite match my picture, and I’m trying to figure out where exactly the mismatch is and whether it’s mostly a terms/definitions problem or something substantive. That involves pointing at pieces which stick out or which look like holes and asking you questions about them. The point is not to destroy the structure, but to make it coherent in my mind.
That said… :-)
which you can hand to an unfriendly agent AI
Isn’t a major point of the Sequences that you can NOT hand anything to a UFAI because it will always find ways to fuck you over? Once you have a UFAI up and running, it’s done, your goose is cooked.
But my point was different: before you get to your formalizable goal, you need to have that VR utopia up and running. Something will have to run it which will include things like preventing some humans from creating virtual hells, waging war on neighbours, etc. etc. That something will have to be an AI. You’re implicitly claiming that this AI will not be an agent (is that so?) and will therefore be harmless. I am expressing doubt that you can have an AI with sufficient capabilities and have it be harmless at the same time.
As to the intelligence explosion, are you saying that its result will be the non-agent AI that handles the VR utopia? Most scenarios, I think, assume that the explosion itself will be uncontrolled: you create a seed and that seed recursively self-improves to become a god. Under these assumptions there is no lower bound.
And if there’s not going to be recursive self-improvement, it’s no longer an explosion—the growth is likely to be slow, finicky, and incremental.
Isn’t a major point of the Sequences that you can NOT hand anything to a UFAI because it will always find ways to fuck you over? Once you have a UFAI up and running, it’s done, your goose is cooked.
The Sequences don’t say an AI will always fail to optimize a formal goal. The problem is more a mismatch between the formal goal and what humans want. My idea tries to make that mismatch small, by making the goal say directly which conscious experiences should exist in the universe (isomorphic to a given set of unmodified human brains experiencing a given VR setup, sometimes creating new brains according to given rules, all of which was defined without AI involvement). Then we’re okay with recursive self-improvement and all sorts of destruction in the pursuit of that goal. It can eat the whole universe if it likes.
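To make the shape of that goal a bit more concrete, here’s a toy Python sketch (my own illustration, nothing load-bearing, and every name in it is a hypothetical placeholder). The goal reduces to an equality check between two sets of experiences; all the genuinely hard parts, namely a solved theory of consciousness and the human-authored VR spec, hide behind unimplemented stubs.

```python
# Toy sketch only, assuming the problem of consciousness is solved.
# All names are hypothetical placeholders invented for illustration.

from dataclasses import dataclass


@dataclass(frozen=True)
class Experience:
    """Abstract token standing in for one conscious experience-moment."""
    content: str  # stand-in for whatever a real theory of consciousness would use


def conscious_experiences(world_history) -> frozenset:
    """The hard part: map any physical world-history to the set of conscious
    experiences it contains. Requires a solved theory of consciousness."""
    raise NotImplementedError


def reference_experiences(vr_rules, initial_brains) -> frozenset:
    """The experiences of the unmodified human brains living in the hand-coded,
    coarse-grained VR utopia, evolved faithfully under its fixed rules
    (including the rules for creating new brains). Defined by humans, no AI involved."""
    raise NotImplementedError


def goal_satisfied(world_history, vr_rules, initial_brains) -> bool:
    # The "all consciousness in the universe" clause is the equality itself:
    # no extra experiences simulated elsewhere, nothing from the utopia left out.
    return conscious_experiences(world_history) == reference_experiences(
        vr_rules, initial_brains
    )
```

Written this way, `goal_satisfied` is about as formal as “maximize paperclips”: everything value-laden sits in the two human-defined inputs, not in anything the AI gets to interpret.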
Something will have to run it which will include things like preventing some humans from creating virtual hells, waging war on neighbours, etc. etc. That something will have to be an AI.
My idea was to make humans set all the rules, while defining the VR utopia, before giving it to the UFAI. It’d be like writing a video game. It seems possible to write a video game that doesn’t let people create hells (including virtual hells, because the VR can be very coarse-grained). Similarly for the problem of pain: just give people some control buttons that other people can’t take away. I think hand-coding a toy universe that feels livable long term and has no sharp tools is well within mankind’s ability.
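To make that less hand-wavy, here’s a minimal sketch of what “control buttons that other people can’t take away” could look like at the engine level (again my own illustration with made-up names; it assumes the VR engine mediates every action, so the invariant is enforced by code rather than by social convention):

```python
# Minimal sketch, assuming the VR engine mediates all actions. Names are hypothetical.

class Player:
    def __init__(self, name: str):
        self.name = name
        # Personal settings the engine guards: e.g. nobody else can switch pain on.
        self._settings = {"pain_enabled": False, "accepts_visitors": True}

    def set_setting(self, requester: "Player", key: str, value) -> bool:
        """Engine-enforced invariant: only the owner may change their own settings."""
        if requester is not self:
            return False  # refused; other players simply have no write access
        self._settings[key] = value
        return True

    def get_setting(self, key: str):
        return self._settings[key]


# A troll trying to flip someone else's pain switch is refused by the engine itself;
# the owner can still change their own settings whenever they like.
alice, troll = Player("alice"), Player("troll")
assert not alice.set_setting(troll, "pain_enabled", True)
assert alice.set_setting(alice, "accepts_visitors", False)
assert alice.get_setting("pain_enabled") is False
```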
My idea tries to make that mismatch small, by making the goal say directly which conscious experiences should exist
You think you can formalize a goal which specifies which conscious experiences should exist? It looks to me to be equivalent to formalizing the human value system. And being isomorphic to a “set of unmodified human brains” just gives you the whole of humanity as it is: some people’s fantasies involve rainbows and unicorns, and some—pain and domination. There are people who do want hells, virtual or not—so you will either have them in your utopia or you will have to filter such desires out, and that involves a value system to decide what’s acceptable in the utopia and what’s not.
My idea was to make humans set all the rules, while defining the VR utopia, before even starting the AI.
That’s called politics and is equivalent to setting the rules for real-life societies on real-life Earth. I don’t see why you would expect it to go noticeably better this time around—you’re still deciding on rules for reality, just with a detour through VR. And how would that work in practice? A UN committee or something? How will disagreements be resolved?
just give people some control buttons that other people can’t take away
To take a trivial example, consider internet harassment. Everyone has a control button that online trolls cannot take away: the off switch on your computer (or even the little X in the top corner of your window). You think it works that well?
You think you can formalize a goal which specifies which conscious experiences should exist? It looks to me to be equivalent to formalizing the human value system.
The hope is that encoding the idea of consciousness will be strictly easier than encoding everything that humans value, including the idea of consciousness (and pleasure, pain, love, population ethics, etc). It’s an assumption of the post.
That’s called politics and is equivalent to setting the rules for real-life societies on real-life Earth.
Correct. My idea doesn’t aim to solve all human problems forever. It aims to solve the problem that right now we’re sitting on a powder keg, with many ways for smarter-than-human intelligences to emerge, most of which kill everyone. Once we’ve resolved that danger, we can take our time to solve things like politics, internet harassment, or reconciling people’s fantasies.
I agree that defining the VR is itself a political problem, though. Maybe we should do it with a UN committee! It’s a human-scale decision, and even if we get it wrong and a bunch of people suffer, that might still be preferable to killing everyone.
Once we’ve resolved that danger, we can take our time to solve things
I don’t know—I think that once you hand off the formalized goal to the UFAI, you’re stuck: you snapshotted the desired state and you can’t change anything any more. If you can change things, well, that UFAI will make sure things will get changed in the direction it wants.
I think it should be possible to define a game that gives people tools to peacefully resolve disagreements, without giving them tools for intelligence explosion. The two don’t seem obviously connected.
So then, basically, the core of your idea is to move all humans to a controlled reality (first VR, then physical) where an intelligence explosion is impossible? It’s not really supposed to solve any problems, just prevent the expected self-destruction?
Yeah. At quite high cost, too. Like I said, it’s intended as a lower bound of what’s achievable, and I wouldn’t have posted it if any better lower bound was known.
Like, “Please, create a new higher bar that we can expect a truly super-intelligent being to be able to exceed.”?
A bar like “whew, now we can achieve an outcome at least this good, instead of killing everyone. Let’s think how we can do better.”