At step 8, why is the AI motivated to care about the idealized goal rather than just the reward signal? Are we assuming that the reward signal is determined by performance wrt the ideal goal?
That is the aim. It's easy to program an AI that doesn't care too much about the reward signal; the trick is to make it not care in the specific way that keeps it aligned with our preferences.
E.g. what would you do if you had been told to maximise some goal, but also told that your reward signal would be corrupted and over-simplified? In that situation you can start doing things to maximise your chance of not wireheading; I want to program the AI to do likewise.
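To make that intuition concrete, here is a minimal toy sketch in Python (my own illustration, not anything specified in the thread; the goal names, rewards, and belief weights are all made-up assumptions). A naive agent maximises the observed reward signal and "wireheads" by tampering with its sensor, while a corruption-aware agent treats that signal as a distrusted, over-simplified proxy and instead maximises expected value under its belief about the idealized goal.

```python
# Toy sketch: observed reward as corrupted evidence vs. the thing to maximize.
# All names and numbers are illustrative assumptions.

# Two candidate "true" goals the designer might have meant; the agent is
# uncertain which one is the idealized goal.
TRUE_GOALS = {
    "clean_room":  {"clean": 10, "tamper": 0},
    "help_humans": {"clean": 8,  "tamper": 0},
}

# The reward *signal* is a simplified proxy: it pays out for "clean", but it
# can also be spoofed by tampering with the reward sensor.
OBSERVED_REWARD = {"clean": 9, "tamper": 100}

ACTIONS = ["clean", "tamper"]


def naive_agent():
    """Maximises the raw reward signal, so it wireheads."""
    return max(ACTIONS, key=lambda a: OBSERVED_REWARD[a])


def corruption_aware_agent(belief):
    """Maximises expected value under its belief about the idealized goal,
    using the reward signal only as distrusted evidence, not as the target."""
    def expected_true_value(action):
        return sum(p * goal[action] for goal, p in belief)
    return max(ACTIONS, key=expected_true_value)


# A belief over which candidate goal is the idealized one.
belief = [(TRUE_GOALS["clean_room"], 0.6), (TRUE_GOALS["help_humans"], 0.4)]

print(naive_agent())                    # 'tamper' -- the wireheading failure
print(corruption_aware_agent(belief))   # 'clean'  -- acts on the idealized goal
```

The interesting design question, which the reply is gesturing at, is how to get the belief over idealized goals and the "distrust the signal" behaviour into the agent in the first place, rather than just hard-coding them as in this toy.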