I think one of the central issues is that even when we’re trying to role-play as a superintelligent AI, we’re still drawing situations and responses from a distribution very different from the one we actually want to apply superintelligent AI to. So you’re still faced with all the same problems of generalization (including inner alignment failure).
Curious if you saw this post. I’m not against big datasets of highly ethical text (BDOHET), but I think that if we’re going to be tackling the problem of getting generalization to go the way humans want anyhow, we can probably get away with the BDOHET being pretty mundane.