If it does get edited out[1] then it was just not a good example. The more general point is that for any physically-possible behavioral policy, there is a corresponding possible program which would exhibit that policy.
And it could, as written, at least because it’s slightly inefficient. I could have postulated it to be part of a traditional terminal value function, in which case I don’t think it does, because editing a terminal value function is contrary to that function, and if the program is robust to wireheading in general
OK, so using your vocabulary, I think that’s the point I want to make: alignment is a physically impossible behavioral policy.
I elaborated a bit more here: https://www.lesswrong.com/posts/AdS3P7Afu8izj2knw/orthogonality-thesis-burden-of-proof?commentId=qoXw7Yz4xh6oPcP9i
What do you think?
Using different vocabulary doesn’t change anything (and if it seems like just vocabulary, you likely misunderstood). I also had seen that comment already.
Afaict, I have nothing more to say here.
It seems to me that you don’t hear me...
I claim that the utility function is irrelevant.
You claim that a utility function could ignore improbable outcomes.
I agree with your claim. But it seems to me that your claim is not directly related to mine. Self-preservation is not part of the utility function (it arises from instrumental convergence). How can you affect it?