A) Is there video? Is it going up on Rob Miles’ third channel?
B) I’m not sure I agree with you about where the Christ analogy goes. By the definition, AI “Christ” makes the same value decisions as me for the same reasons. That’s the thing that there’s just one way to do (more or less). But that’s not what I want, because I want an AI that can tackle complicated situations that I’d have trouble understanding, and I want an AI that will sometimes make decisions the way I want them made, not the way I actually make them.
A) There is a video, but it’s not super high quality and I think the transcript is better. If you really want to listen to it, though, you can take a look here.
B) Yeah, I agree with that. Perhaps the thing I said in the talk was too strong—the thing I mean is a model where the objective is essentially the same as what you want, but the optimization process and world model are potentially quite superior. I still think there's approximately only one of those, though, since you have to get the objective to exactly match onto what you want.
Once you’re trying to extrapolate me rather than just copy me as-is, there are multiple ways to do the extrapolation. But I’d agree it’s still way less entropy than deceptive alignment.