Good, that is a critical insight. I will explore some of the implications of considering what kind of FAI it is.
By “FAI”, we mean an AI that we can predict, with high confidence, will be friendly, based on our deep understanding of how it works. (Deep understanding is not actually part of the definition, but without it we are unlikely to achieve high, well-calibrated confidence.)
So a kind of FAI built out of a UFAI subject to some constraints (or some other means of making it friendly) would require us to understand both the constraints and the UFAI well enough to predict that the UFAI could not break the constraints, which it would of course try to do. (One might imagine constraints strong enough that no level of intelligence could break them, but I find this unlikely, since a major vulnerability is the possibility of the UFAI feeding us misinformation about issues we do not understand.)
Which is easier: to understand a UFAI we built without understanding it, plus its constraints, well enough to predict that the combined system will be “friendly” (and to actually produce a system, UFAI included, for which that prediction holds), or to directly build an FAI designed from the ground up to be predictably friendly? And which system is more likely to wipe us out, rather than merely fail to do anything interesting, if we are wrong?
I prefer the solution where friendliness is not a patch, but a critical part of the intelligence itself.