What situation do you envision in which the UFAI would expect to gain utility by building an FAI?
The situation I described: cooperation between FAI and UFAI. Two unrelated AIs are never truly antagonistic, so they have something to gain from cooperation.
And it seems a little strange to accuse me of offering solutions before the problem is fully explored, when I was responding to a proposal for a solution (using UFAI to build FAI).
The same problem on both counts: confident assertions about a confusing issue. It happened twice in a row because the discussion shares a common confusing topic, so it’s not very surprising.
Unless you are actually saying that the way to get a UFAI to build an FAI is to build the FAI ourselves, locate the UFAI in a different universe, and have some sort of rift between the universes with contrived rules about what sort of interaction it allows, I still do not understand the situation you are talking about.
Two unrelated AIs are never truly antagonistic, so they have something to gain from cooperation.
An AI that wants to tile the solar system with molecular smiley faces and an AI that wants to tile the solar system with paperclips are going to have conflicts. Either of them would have conflicts with an FAI that wants to use the resources of the solar system to create a rich life experience for humanity. Maybe these AIs are not what you call “unrelated”, but if so, I doubt the UFAI and the FAI we want it to build can be considered unrelated.
The same problem on both counts: confident assertions about a confusing issue.
Are you asking me to have less confidence in the difficulty of us outsmarting things that are smarter than us?
Between the two options “UFAI doesn’t do anything, and so we terminate it / never build it” and “UFAI builds/explains FAI, and gets (simplifying) 1/100th of the universe”, the second is preferable both for us and for the UFAI, so if these are the only options, it’ll take the deal.
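(To make the ordering explicit, here is a toy sketch; the payoff numbers are invented purely for illustration, nothing hinges on them beyond the ordering, and it bakes in the assumption that these really are the only two options.)

```python
# Toy payoff comparison for the two options above.
# The numbers are made up; only the ordering matters.
payoffs = {
    # option: (utility to humans, utility to the UFAI)
    "do nothing / get terminated": (0.0, 0.0),
    "help build FAI, keep 1/100 of the universe": (0.99, 0.01),
}

refuse = payoffs["do nothing / get terminated"]
deal = payoffs["help build FAI, keep 1/100 of the universe"]

# If the deal is strictly better for both parties, and these are the only
# two options on the table, a rational UFAI takes the deal.
assert deal[0] > refuse[0]  # better for us
assert deal[1] > refuse[1]  # better for the UFAI
print("Both parties prefer the deal to termination.")
```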
Yes, I’m asking you to have less confidence in any conclusion you are drawing here. “Outsmarting” isn’t exactly a tangible event.
First of all, the “won’t build it” option does not make any sense. It is not like the UFAI is going to do anything before it exists.
So then, going back to the point that you claim doesn’t address your counterexample: supposing we actually have the UFAI in a situation where we can terminate it at will, and it can’t persuade anyone to free it, how are we supposed to verify that the thing it is helping us build is actually an FAI (and not another UFAI with the same values) if we do not know how to build an FAI on our own? If we can’t tell the difference, isn’t that exactly what the UFAI, given its values, should do?
That’s why I never proposed that we ask the UFAI to create us an FAI; I proposed that we ask the UFAI to help us create an FAI. Doing something like proving a mathematical theorem for us, or creating something that fulfills a template we’ve created ourselves, leaves no room for betrayal, barring any “third options” like manipulating the EM waves produced by its CPU to send arbitrary messages over the Internet.
Suppose the UFAI figures out a fake theorem that it would like us to believe, because it would lead us down a path of creating an AI it likes. If we were to ask it to prove this fake theorem, it would give back to us something that looks very much like a proof, so that we would miss the point where it goes wrong. Maybe we require a machine-verifiable proof, but it takes advantage of a flaw in our automatic verifiers, or in the way we interpret the results. So how does it get us to ask about this fake theorem? It might manipulate its proof of a theorem we do ask about to inspire us to ask that question. It might respond to our request for a proof with, “Well, that is not quite right, but if you make these modifications...”. Keep in mind, this is a hostile intelligence that is way beyond us. It will take any opportunity to subtly manipulate us that it gets. And these are only the ideas that I, a mere human, could come up with.
I am not sure what sort of template you mean, but I suspect that it will have the same problem. Basically, room for the UFAI to use its superior intelligence to help us is room for betrayal.
Step back and check what you are arguing. The discussion is whether a scenario where the UFAI is helpful is at all plausible. Of course all sorts of stuff can go wrong. Of course it isn’t a good idea. Arguments saying “but this could go wrong too” don’t advance the discussion a bit, as that’s already understood.
I am not talking about some slim chance of something happening to go wrong. I am talking about a hostile superintelligence systematically arranging for us to make dangerous critical mistakes. I don’t know exactly what the UFAI will do any more than I can predict which move Kasparov will make in a chess game, but I can predict, with greater confidence than I can predict that Kasparov would win a chess game against a mere grandmaster, that if we build and use a UFAI, we will be the bug and the UFAI will be the windshield. Splat!
If your standard here is whether something is “at all plausible”, then the discussion is trivial; nothing has probability 0. However, none of the proposals for using UFAI discussed here are plans that I expect positive utility from. (Which is putting it mildly: I expect them all to result in the annihilation of humanity with high probability, and to produce good results only through the conjunction of many slim chances of things happening to go right.)
You should speak for yourself about what is already understood and what you think is not a good idea. Warrigal seems to think using a UFAI can be a good idea.
When you figure out how to arrange the usage of UFAI that is a good idea, the whole contraption becomes a kind of FAI.
Good, that is a critical insight. I will explore some of the implications of considering what kind of FAI it is.
By “FAI”, we refer to an AI that we can predict with high confidence will be friendly, based on our deep understanding of how it works. (Deep understanding is not actually part of the definition, but without it we are not likely to achieve high, well-calibrated confidence.)
So a kind of FAI that is built out of a UFAI subject to some constraints (or other means of making it friendly) would require us to understand the constraints and the UFAI well enough to predict that the UFAI would not be able to break the constraints, which it would of course try to do. (One might imagine constraints that cannot be broken by any level of intelligence, but I find this unlikely, as a major vulnerability is the possibility of it giving us misinformation regarding issues we don’t understand.)
Which is easier: to understand the UFAI that we built without understanding it, together with its constraints, well enough to predict that the system will be “friendly” (and to actually produce a system, including the UFAI, for which that prediction would be correct), or to directly build an FAI that is designed from the ground up to be predictably friendly? And which system is more likely to wipe us out if we are wrong, rather than simply fail to do anything interesting?
I prefer the solution where friendliness is not a patch, but is a critical part of the intelligence itself.
You might be able to recognize the right solution (e.g. a theory) when you see it, while being unable to generate it yourself (as fast). If you are sensitive enough to the UFAI’s attempts to confuse you into doing the wrong thing, just going along with the deal may be its best option.
First of all, the “won’t build it” option does not make any sense. It is not like the UFAI is going to do anything before it exists.
You decide whether to build something new before it exists, based on its properties. An AI’s decisions are such properties.
If you are sensitive enough to the UFAI’s attempts to confuse you into doing the wrong thing, just going along with the deal may be its best option.
And if we are not sensitive enough, our molecules get reused for paperclips. You are talking about matching wits with something that is orders of magnitude smarter than us, thinks orders of magnitude faster than us, has a detailed model of our minds, and doesn’t think we are worth the utilons it can build out of our quarks. Yes, we would think we recognize the right solution, that it is obvious now that it’s been pointed out, and we would be wrong. An argument that a UFAI chose because it would be persuasive to us is nowhere near as trustworthy as an argument we figure out ourselves. While we can be wrong about our own arguments, the UFAI will present arguments that we will be systematically wrong about in very dangerous ways.
You decide whether to build something new before it exists, based on its properties.
And for our next trick, we will just ask Omega to build an FAI for us.
No AI, friendly or unfriendly, will ever have a model of our minds as detailed as the model we have of its mind, because we can pause it and inspect its source code while it can’t do anything analogous to us.
I have written plenty of mere desktop applications that are a major pain for a human mind to understand in a debugger. And have you ever written programs that generate machine code for another program? And then tried to inspect that machine code when something went wrong to figure out why?
Well, that stuff is nothing compared to the difficulty of debugging or otherwise understanding an AI, even given full access to inspect its implementation. An unfriendly AI is likely to be built by throwing lots of parallel hardware together until something sticks. If the designers actually knew what they were doing, they would figure out that they need to make it friendly. So, yes, you could pause the whole thing and look at how the billions of CPUs are interconnected and at the state of the terabytes of memory, but you are not logically omniscient, and you will not make sense of it.
It’s not like you could just inspect the evil bit.