I’m not sure that scientific talent is the relevant variable here. More talented folk are more likely to achieve both positive and negative outcomes.
Let’s assume that all the other variables are already optimized to minimize the risk of creating a UFAI. It seems to me that the relationship between the ability level of the FAI team and the probabilities of the possible outcomes must then look something like this:
This chart isn’t meant to communicate my actual estimates of the probabilities and crossover points, but just the overall shapes of the curves. Do you disagree with them? (If you want to draw your own version, click here and then click on “Modify This Chart”.)
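For concreteness, here is a short sketch that generates curves of the general shape I have in mind. The functional forms and numbers are invented purely for illustration (they are not my estimates of anything); only the rough shapes matter:

```python
# Purely illustrative: invented functional forms, not probability estimates.
import numpy as np
import matplotlib.pyplot as plt

ability = np.linspace(0, 10, 200)      # arbitrary "FAI team ability" scale

p_fai = 0.9 / (1 + np.exp(-(ability - 7)))        # rises with ability
p_ufai = 0.5 * np.exp(-0.5 * (ability - 4) ** 2)  # peaks at middling ability
p_null = 1.0 - p_fai - p_ufai                     # no AGI gets built at all

plt.plot(ability, p_fai, label="FAI")
plt.plot(ability, p_ufai, label="UFAI")
plt.plot(ability, p_null, label="null")
plt.xlabel("FAI team ability")
plt.ylabel("probability of outcome")
plt.legend()
plt.show()
```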
Folk do try to estimate and reduce the risks you mentioned, and to investigate alternative non-FAI interventions.
Has anyone posted SIAI’s estimates of those risks?
I would also worry that signalling issues with a diverse external audience can hinder accurate discussion of important topics.
That seems reasonable, and given that I’m more interested in the “strategic” as opposed to “tactical” reasoning within SIAI, I’d be happy for it to be communicated through some other means.
If we condition on having all other variables optimized, I’d expect a team to adopt very high standards of proof, and recognize limits to its own capabilities, biases, etc. One of the primary purposes of organizing a small FAI team is to create a team that can actually stop and abandon a line of research/design (Eliezer calls this “halt, melt, and catch fire”) that cannot be shown to be safe (given limited human ability, incentives and bias). If that works (and it’s a separate target in team construction rather than a guarantee, but you specified optimized non-talent variables) then I would expect a big shift of probability from “UFAI” to “null.”
What I’m afraid of is that a design will be shown to be safe, and then it turns out that the proof is wrong, or that the formalization of the notion of “safety” used by the proof is wrong. This kind of thing happens a lot in cryptography, if you replace “safety” with “security”. These mistakes are still occurring today, even after decades of research into how to do such proofs and what the relevant formalizations are. From where I’m sitting, proving an AGI design Friendly seems even more difficult and error-prone than proving a crypto scheme secure, probably by a large margin, and we do not have decades of time to refine the proof techniques and formalizations. There’s a good recent review of the history of provable security, titled Provable Security in the Real World, which might help you understand where I’m coming from.
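To give a toy illustration of the kind of formalization gap I mean (a deliberately simplified sketch, not a description of any particular real-world break): a MAC verifier can satisfy the standard unforgeability definition and still leak information through a channel the definition never models, such as how long the comparison takes. The key below is made up.

```python
# Toy sketch: the standard unforgeability game says nothing about running
# time, so a verifier can be "provably secure" on paper while its timing
# leaks how long a correct prefix an attacker has guessed.
import hmac
import hashlib

KEY = b"made-up example key"  # hypothetical key, for illustration only


def tag(message: bytes) -> bytes:
    return hmac.new(KEY, message, hashlib.sha256).digest()


def verify_naive(message: bytes, candidate: bytes) -> bool:
    """Byte-by-byte comparison with early exit -- the timing side channel."""
    expected = tag(message)
    if len(candidate) != len(expected):
        return False
    for a, b in zip(expected, candidate):
        if a != b:
            return False  # exits earlier the sooner a byte mismatches
    return True


def verify_better(message: bytes, candidate: bytes) -> bool:
    """Comparison whose duration does not depend on where the bytes differ."""
    return hmac.compare_digest(tag(message), candidate)
```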
Your comment has finally convinced me to study some practical crypto because it seems to have fruitful analogies to FAI. It’s especially awesome that one of the references in the linked article is “An Attack Against SSH2 Protocol” by W. Dai.
From where I’m sitting, proving an AGI design Friendly seems even more difficult and error-prone than proving a crypto scheme secure, probably by a large margin, and we do not have decades of time to refine the proof techniques and formalizations.
Correct me if I’m wrong, but it doesn’t seem as though “proofs” of algorithm correctness fail as frequently as “proofs” of cryptosystem unbreakableness.
Where does your intuition that friendliness proofs are on the order of reliability of cryptosystem proofs come from?
Interesting question. I guess proofs of algorithm correctness fail less often because:
It’s easier to empirically test algorithms to weed out the incorrect ones, so there are fewer efforts to prove conjectures of correctness that are actually false.
It’s easier to formalize what it means for an algorithm to be correct than for a cryptosystem to be secure.
In both respects, proving Friendliness seems even worse than proving security.
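As a concrete, FAI-unrelated example of what a formalization mismatch looks like even in the easier algorithm-correctness setting: the textbook binary search is easy to prove correct over idealized unbounded integers, but the usual (lo + hi) / 2 midpoint overflows once the integers are 32 bits wide, a detail the standard correctness argument never models. A minimal sketch that simulates 32-bit arithmetic:

```python
# Toy illustration: binary search's midpoint is "provably correct" over
# idealized integers, but the same expression breaks once integers are
# 32 bits wide -- a detail the usual correctness proof never models.

def to_int32(x: int) -> int:
    """Wrap x the way signed 32-bit arithmetic (as in C or Java) would."""
    x &= 0xFFFFFFFF
    return x - 2**32 if x >= 2**31 else x

lo, hi = 1_500_000_000, 2_000_000_000   # both are valid 32-bit index values

naive_mid = to_int32(lo + hi) // 2      # the textbook (lo + hi) / 2
safe_mid = lo + (hi - lo) // 2          # the overflow-free variant

print(naive_mid)  # negative, so the "proved correct" search would crash
print(safe_mid)   # 1750000000
```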
What I’m afraid of is that a design will be shown to be safe, and then it turns out that the proof is wrong, or that the formalization of the notion of “safety” used by the proof is wrong.
I can’t count myself “world class” on the raw ability axis, but I’m pretty sure that the probability of a team of people like me producing UFAI is very low (in absolute value), as I know when I understand something and when I don’t yet, and I think this property would be even more reliable if I had better raw ability. That is a much more relevant safety factor than ability (though it seems harder to test), and it changes the shape of the UFAI curve. A couple of levels below myself, I wouldn’t trust someone’s ability to disbelieve wrong things, so the maximum of that curve should probably be in this range, not centered on “world class” in particular.
Could you elaborate on the ability axis? Could you name some people whom you perceive to be of world-class ability in their field? Could you further explain whether you believe there are people who are sufficiently above that class?
For example, what about Terence Tao? What about the current SIAI team?
This chart isn’t meant to communicate my actual estimates of the probabilities and crossover points, but just the overall shapes of the curves.
I like this chart.
Your comment has finally convinced me to study some practical crypto because it seems to have fruitful analogies to FAI.
More than fruitful analogies, I’d say: http://lesswrong.com/lw/3cz/cryptographic_boxes_for_unfriendly_ai/
In both respects, proving Friendliness seems even worse than proving security.
Thanks for clarifying.
I agree.