Roko comments on Turing-Test-Passing AI implies Aligned AI

Roko 2 Jan 2025 1:17 UTC
2 points
−1

> asking why inner alignment is hard

I don’t think “inner alignment” is applicable here.

If the clone behaves indistinguishably from the human it is based on, then there is simply nothing more to say. It doesn’t matter what is going on inside.
- Spencer Ericson 2 Jan 2025 20:44 UTC
  3 points
  2
  > If the clone behaves indistinguishably from the human it is based on, then there is simply nothing more to say. It doesn’t matter what is going on inside.
  Right, I agree on that. The problem is, “behaves indistinguishably” for how long? You can’t guarantee that it will keep acting that way in the future, and a later change in behavior is exactly what deceptive alignment predicts.