Just because it’s possible in theory doesn’t mean we are anywhere close to doing it.
That’s a good point, but then you have to explain why it would be hard to make a functional digital copy of a human, given that we can make AIs like ChatGPT-o1 that are at the 99th percentile of human performance on most short-term tasks. What is the blocker?
Of course this question can be settled empirically.…
It sounds like you’re asking why inner alignment is hard (or maybe why it’s harder than outer alignment?). I’m pretty new here—I don’t think I can explain that any better than the top posts in the tag.
Re: o1, it’s not clear to me that o1 is an instantiation of a creator’s highly specific vision. It seems more like we tried something, didn’t know exactly where it would end up, and it happened to end up in a useful place. It wasn’t planned in advance exactly what o1 would be good or bad at, or to what extent. If you were copying a human, by contrast, you’d have to be far more careful to consider and copy a lot of details.
I don’t think “inner alignment” is applicable here.
If the clone behaves indistinguishably from the human it is based on, then there is simply nothing more to say. It doesn’t matter what is going on inside.
Right, I agree on that. The problem is, “behaves indistinguishably” for how long? You can’t guarantee that it won’t stop acting that way in the future, which is exactly what deceptive alignment predicts.