Ofer comments on Gradient hacking

Ofer 27 Apr 2022 19:48 UTC
3 points
[EDIT: sorry, I need to think through this some more.]
- Not Relevant 27 Apr 2022 20:58 UTC
  3 points
  I see, so your claim here is that gradient hacking is a convergent strategy for all agents of sufficient intelligence. That’s helpful, thanks.
  
  I am still confused about the case where Alice is checking whether or not she has goal X: if she finds she has a different goal Y ≠ X, then by definition not having children is to the detriment of her goal Y.