quila comments on Rationality vs Alignment

quila 8 Jul 2024 12:33 UTC
1 point
If it does get edited out^[1] then it was just not a good example. The more general point is that for any physically-possible behavioral policy, there is a corresponding possible program which would exhibit that policy.
1. ^
  And it could be, as written, if only because it’s slightly inefficient. I could instead have postulated it to be part of a traditional terminal value function, in which case I don’t think it would be edited out, because editing a terminal value function is contrary to that function, at least if the program is robust to wireheading in general.
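A minimal sketch of the general point, assuming (purely for illustration) that a behavioral policy is finite and can be written down as a table from observation histories to actions. The names `program_from_policy`, `table`, and `default` are hypothetical, not from any library; the idea is only that once such a table exists, a program exhibiting it can be built mechanically.

```python
from typing import Callable, Dict, List, Tuple

Observation = str
Action = str
History = Tuple[Observation, ...]


def program_from_policy(
    table: Dict[History, Action], default: Action
) -> Callable[[Observation], Action]:
    """Build a stateful program that acts exactly as the given table says.

    `table` maps each observation history to an action (an illustrative
    assumption about how the policy is specified). Histories missing from
    the table fall back to `default`.
    """
    history: List[Observation] = []

    def step(observation: Observation) -> Action:
        # Record the new observation, then look up the action for the
        # full history seen so far.
        history.append(observation)
        return table.get(tuple(history), default)

    return step


# Hypothetical two-step policy.
example_table: Dict[History, Action] = {
    ("a",): "x",
    ("a", "b"): "y",
}
agent = program_from_policy(example_table, default="noop")
first_action = agent("a")
second_action = agent("b")
third_action = agent("c")
```

The table is arbitrary here, which is the point: nothing in the construction depends on what the policy actually does, only on it being specifiable.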
- Donatas Lučiūnas 8 Jul 2024 13:01 UTC
  3 points
  OK, so using your vocabulary, I think this is the point I want to make: alignment is a physically impossible behavioral policy.
  I elaborated a bit more here: https://www.lesswrong.com/posts/AdS3P7Afu8izj2knw/orthogonality-thesis-burden-of-proof?commentId=qoXw7Yz4xh6oPcP9i
  What do you think?
  - quila 8 Jul 2024 13:02 UTC
    1 point
    Using different vocabulary doesn’t change anything (and if it seems like just vocabulary, you likely misunderstood). I had also already seen that comment.
    Afaict, I have nothing more to say here.
    - Donatas Lučiūnas 8 Jul 2024 13:32 UTC
      1 point
      It seems to me that you don’t hear me...
      I claim that the utility function is irrelevant.
      You claim that a utility function could ignore improbable outcomes.
      I agree with your claim, but it seems to me that it is not directly related to mine. Self-preservation is not part of the utility function; it arises from instrumental convergence. How can you affect it?