Thinking isn’t magic. You need empiricism to find out if your thoughts are correct.
Mere humans, exploring the space of possible hypotheses in a dramatically suboptimal way, have nevertheless hit upon the idea that we live in a simulation, and have ideas about how to confirm or disprove it.
The ways people have invented to confirm or disprove that we live in a simulation are mostly bullshit and generally rely on the completely unrealistic assumption that the simulating universe looks a lot like our universe, in particular in how computation is done (on discrete processors, etc.).
Yeah, but the AI can use empiricism within its simulated world. If it’s smarter than us, in a probably-less-convincing-than-reality world, I would not want to bet at strong odds against the AI figuring things out.
Boxing is potentially a useful component of real safety design, in the same way that seatbelts are a useful component of car design: it might save you, but it also has ways to fail.
The problem with AI safety proposals is that they usually take the form of “Instead of figuring out Friendliness, why don’t we just do X?” where X is something handwavey that has some obvious ways to fail. The usual response, here, is to point out the obvious ways that it can fail, hopefully so that the proposer notices they haven’t obviated solving the actual problem.
If you’re just looking at ways to make the least-imperfect box you can, rather than claiming your box is perfect, I don’t think I’m actually disagreeing with you here.
The idea isn’t to make a box that looks like our world, because, as you pointed out, that would be pretty unconvincing. The idea is to make a radically different and macroscopically slightly similar but much simpler world that it can be in.
The purpose isn’t to make friendliness unnecessary but to test whether the basics of the AI work, even if we aren’t sure it’s intelligent, and possibly, depending on how the AI is designed, to provide a space for testing friendliness. Just turning the AI on and seeing what happens would obviously be dangerous, hence boxing.
I’m claiming the box is perfect. You can’t escape from a prison if you don’t know it exists, and you can’t figure out it exists if it’s hidden in the laws of physics.
Respectfully, I think you’re just shoving all your complexity under the rug here. Unless you have a concrete proposal on how to actually do this, just asserting that your box won’t be figure-out-able is dodging the question.
At first glance, I was also skeptical of tailcalled’s idea, but now I find I’m starting to warm up to it. Since you didn’t ask for a practical proposal, just a concrete one, I give you this:
1. Implement an AI in Conway’s Game of Life.
2. Don’t interact with it in any way.
3. Limit the computational power the box has, so that if the AI begins engaging in recursive self-improvement, it’ll run more and more slowly from our perspective, so we’ll have ample time to shut it off. (Of course, from the AI’s perspective, time will run as quickly as it always does, since the whole world will slow down with it.)
4. (optional) Create multiple human-level intelligences in the world (ignoring ethical constraints here), and see how the AI interacts with them. Run the simulation until you are reasonably certain (for a very stringent definition of “reasonably”) from the AI’s behavior that it is Friendly.
5. Profit.
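To make the first three steps a bit more tangible, here is a minimal Python sketch of a Game of Life world run under a fixed compute budget. The names `life_step` and `run_boxed`, and the cost model of one budget unit per live cell per generation, are assumptions made for illustration; nothing here is an AI, it only shows how a larger pattern gets fewer generations out of the same outside-world budget while its internal clock (the generation count) ticks on as usual.

```python
from collections import Counter


def life_step(live):
    """Advance one generation. `live` is a set of (x, y) live cells."""
    # Count, for every cell adjacent to a live cell, how many live neighbours it has.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Standard rules: birth on exactly 3 neighbours, survival on 2 or 3.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


def run_boxed(live, budget):
    """Run until the budget is spent; return (final_state, generations).

    Assumed cost model (hypothetical): one budget unit per live cell
    per generation. The simulated world never reads the budget and
    receives no input from outside.
    """
    generations = 0
    spent = 0
    while live:
        cost = len(live)
        if spent + cost > budget:
            break  # the outside world stops computing; inside, time simply halts
        live = life_step(live)
        spent += cost
        generations += 1
    return live, generations


blinker = {(0, 1), (1, 1), (2, 1)}
two_blinkers = blinker | {(x + 10, y + 10) for (x, y) in blinker}

print(run_boxed(blinker, budget=30)[1])       # 10 generations
print(run_boxed(two_blinkers, budget=30)[1])  # 5 generations
```

Doubling the pattern's size halves the number of generations the same budget buys, which is the "runs more and more slowly from our perspective" effect in miniature.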
The problem with this is that even if you can determine with certainty that an AI is friendly, there is no certainty that it will stay that way. There could be a series of errors as it goes about daily life, each acting as a mutation, serving to evolve the “Friendly” AI into a less friendly one.
Hm. That does sound more workable than I had thought.
I would probably only include it as part of a batch of tests and proofs. It would be pretty foolish to rely on only one method to check that something works correctly when its failure would destroy the world.
Yes, I agree with you on that. (Step 5 was intended as a joke/reference.)
Pick or design a game that contains some aspect of reality that you care about in terms of AI. All games have some element of learning, a lot have an element of planning and some even have varying degrees of programming.
As an example, I will pick Factorio, a game that involves learning, planning and logistics. Wire up the AI to this game, with appropriate reward channels and so on. Now you can test how good the AI is at getting stuff done: producing goods, killing aliens (which isn’t morally problematic, as the aliens don’t behave like persons in any morally relevant way) and generally learning about the universe.
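As a rough illustration of what "wiring up the AI with a reward channel" could look like, here is a toy Python sketch. It does not touch Factorio: `ToyFactory` is a hypothetical two-action stand-in world (mine ore, craft goods), and `GreedyAgent` is a deliberately simple greedy learner with optimistic initial values, swapped in for whatever AI is actually under test. All names and the reward of 1.0 per crafted good are assumptions.

```python
ACTIONS = ("mine", "craft")


class ToyFactory:
    """Hypothetical stand-in for a game world: mine ore, then craft it into goods."""

    def __init__(self):
        self.ore = 0
        self.goods = 0

    def observe(self):
        # The agent only sees whether any ore is in stock.
        return self.ore > 0

    def step(self, action):
        """Apply an action and return the reward, the agent's only feedback."""
        if action == "mine":
            self.ore += 1
            return 0.0
        if action == "craft" and self.ore > 0:
            self.ore -= 1
            self.goods += 1
            return 1.0
        return 0.0  # crafting with no ore achieves nothing


class GreedyAgent:
    """Greedy learner over average immediate reward, with optimistic start values."""

    def __init__(self, optimistic_value=1.0):
        self.optimistic_value = optimistic_value
        self.values = {}
        self.counts = {}

    def act(self, obs):
        # Untried actions look good, so each gets tried once; ties go to the first action.
        return max(ACTIONS, key=lambda a: self.values.get((obs, a), self.optimistic_value))

    def learn(self, obs, action, reward):
        key = (obs, action)
        self.counts[key] = self.counts.get(key, 0) + 1
        old = self.values.get(key, self.optimistic_value)
        # Running average; the first real sample replaces the optimistic guess.
        self.values[key] = old + (reward - old) / self.counts[key]


def run_episode(env, agent, steps):
    """Standard observe / act / reward / learn loop; returns total reward."""
    total = 0.0
    for _ in range(steps):
        obs = env.observe()
        action = agent.act(obs)
        reward = env.step(action)
        agent.learn(obs, action, reward)
        total += reward
    return total


env = ToyFactory()
total = run_episode(env, GreedyAgent(), steps=20)
print(env.goods, total)
```

Note that mining itself never earns reward here, so this myopic learner only keeps mining because of how ties are broken. An agent that gets stuff done in a richer world would need real planning and credit assignment, which is exactly the kind of capability such a test is meant to probe.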
The step with morality depends on how the AI is designed. If it’s designed to use heuristics to identify a group of entities as humans and help them, you might get away with throwing it in a procedurally generated RPG. If it uses more general, actually morally relevant criteria (such as intelligence, self-awareness, etc.), you might need a very different setup.
However, speculating about exactly what setup is needed for testing morality is probably very unproductive until we decide how we’re actually going to implement morality.