I am not talking about some slim chance of something happening to go wrong. I am talking about a hostile superintelligence systematically arranging for us to make dangerous critical mistakes. I don’t know exactly what the UFAI will do, any more than I can predict which move Kasparov will make in a chess game, but I can predict, with greater confidence than I can predict that Kasparov would win a chess game against a mere grandmaster, that if we build and use a UFAI, we will be the bug and the UFAI will be the windshield. Splat!
If your standard here is whether something is “at all plausible”, then the discussion is trivial: nothing has probability 0. However, none of the proposals for using UFAI discussed here are plans that I expect positive utility from. (Which is putting it mildly: I expect them all to result in the annihilation of humanity with high probability, and to produce good results only in the conjunction of many slim chances of things happening to go right.)
You should speak for yourself about what is already understood and what you think is not a good idea. Warrigal seems to think using UFAI can be a good idea.
When you figure out how to arrange the usage of UFAI that is a good idea, the whole contraption becomes a kind of FAI.
Good, that is a critical insight. I will explore some of the implications of considering what kind of FAI it is.
By “FAI”, we refer to an AI that we can predict with high confidence will be friendly, based on our deep understanding of how it works. (Deep understanding is not actually part of the definition, but without it we are not likely to achieve high, well-calibrated confidence.)
So a kind of FAI that is built out of a UFAI subject to some constraints (or other means of making it friendly) would require us to understand the constraints and the UFAI sufficiently to predict that the UFAI would not be able to break the constraints, which it would of course try to do. (One might imagine constraints that cannot be broken by any level of intelligence, but I find this unlikely, as a major vulnerability is the possibility of the UFAI giving us misinformation with regard to issues we don’t understand.)
Which is easier: to understand a UFAI that we built without understanding it, together with its constraints, well enough to predict that the system will be “friendly” (and to actually produce a system, including the UFAI, for which that prediction is correct), or to directly build an FAI that is designed from the ground up to be predictably friendly? And which system is more likely to wipe us out if we are wrong, rather than merely fail to do anything interesting?
I prefer the solution where friendliness is not a patch, but is a critical part of the intelligence itself.