The probability I assign to achieving a capability state where it is (1) possible to prove a mind Friendly even if it has been constructed by a hostile superintelligence, (2) possible to build a hostile superintelligence, and (3) not possible to build a Friendly AI directly, is very low.
Could you elaborate on this? Your mere assertion is enough to make me much less confident than I was when I posted this comment. But I would be interested in a more object-level argument. (The fact that your own approach to building an FAI wouldn’t pass through such a stage doesn’t seem like enough to drive the probability “very low”.)
The FAI theory required to build a proof for (1) would have to be very versatile; you have to understand Friendliness very well to do it. (2) requires some understanding of the nature of intelligence (especially if we understand it well enough to know that the thing we're putting in a box is a superintelligence). If you understand Friendliness that well, and intelligence that well, then Friendly intelligence should be easy.
We never built space suits for horses, because by the time we figured out how to get to the moon, we also figured out electric rovers.
If you understand Friendliness that well, and intelligence that well, then Friendly intelligence should be easy.
Eliezer has spent years making the case that FAI is far, far, far more specific than AI. A theory of intelligence adequate to building an AI could still be very far from a theory of Friendliness adequate to building an FAI, couldn’t it?
So, suppose that we know how to build an AI, but we’re smart enough not to build one until we have a theory of Friendliness. You seem to be saying that, at this point, we should consider the problem of constructing a certifier of Friendliness to be essentially no easier than constructing FAI source code. Why? What is the argument for thinking that FAI is very likely to be one of those problems where certifying a solution is no easier than solving it from scratch?
We never built space suits for horses, because by the time we figured out how to get to the moon, we also figured out electric rovers.
This doesn’t seem analogous at all. It’s hard to imagine how we could have developed the technology to get to the moon without having built electric land vehicles along the way. I hope that I’m not indulging in too much hindsight bias when I say that, conditioned on our getting to the moon, our getting to electric-rover technology first was very highly probable. No one had to take special care to make sure that the technologies were developed in that order.
But, if I understand Eliezer’s position correctly, we could easily solve the problem of AGI while still being very far from a theory of Friendliness. That is the scenario that he has dedicated his life to avoiding, isn’t it?