It would want to, because its goal is defined as “tell the truth”.
You have to distinguish between the goal we are trying to find (the optimal one) and the goal that is actually controlling what the AI does (“tell the truth”) while we are still searching for that optimal goal.
The optimal goal is only implemented later, once we are sure there are no bugs.
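The distinction can be sketched as a minimal toy (all names and definitions here are hypothetical illustrations, not an actual alignment scheme): the agent always optimizes whatever goal currently controls it, and the candidate “optimal” goal only replaces the interim one after a verification step.

```python
# Toy sketch: the agent runs under an interim, well-understood goal
# ("tell the truth") while the optimal goal is still being searched for.
# The swap happens only after verification. All names are hypothetical.

def interim_goal(statement, facts):
    """Interim objective: reward only true statements."""
    return 1.0 if statement in facts else 0.0

class Agent:
    def __init__(self, goal):
        self.goal = goal  # the goal currently controlling behavior

    def act(self, candidates, facts):
        # The agent optimizes whichever goal it currently has.
        return max(candidates, key=lambda s: self.goal(s, facts))

def verified(goal):
    """Placeholder for 'we are sure there are no bugs'."""
    return True

def optimal_goal(statement, facts):
    """Placeholder for the goal found later by the search."""
    return interim_goal(statement, facts)

facts = {"water is wet"}
agent = Agent(interim_goal)
agent.act(["water is dry", "water is wet"], facts)  # picks the true statement

# Only after verification does the optimal goal take control:
if verified(optimal_goal):
    agent.goal = optimal_goal
```

The point of the sketch is that `agent.goal` is a single slot: at any moment exactly one goal drives behavior, and the interim “tell the truth” goal keeps the agent cooperative during the search.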