Are you bringing up wireheading to answer yes or no to my question (whether RL is more prone to gradient hacking)? It sounds like you’re suggesting a no, but to me it supports the idea that RL might be prone to gradient hacking. The AI, like me, avoids wireheading, and so will never be modified by gradient descent toward wireheading, because gradient descent doesn’t know anything about wireheading until it’s been tried. So that is itself an example of gradient hacking, isn’t it? This is unlike a supervised learning setup, where gradient descent ‘knows’ about all possible outputs and will modify any subagent that avoids giving the right answer.
So am I a gradient hacker whenever I just say no to drugs?
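The asymmetry I mean is easy to see in a toy example. Here is a minimal sketch (my own illustration, with made-up rewards and logits) contrasting a REINFORCE-style policy gradient, which only ever sees the reward of the action actually sampled, with a cross-entropy supervised gradient, which points toward the labeled answer no matter what the model currently outputs:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy setting: 2 actions. Action 0 = "work", action 1 = "wirehead".
# True rewards, unknown to the learner until an action is actually tried:
rewards = np.array([1.0, 100.0])   # wireheading pays far more

logits = np.array([5.0, -5.0])     # policy strongly prefers "work"
probs = softmax(logits)

# --- RL (REINFORCE): only the SAMPLED action's reward enters the gradient ---
a = np.random.choice(2, p=probs)                     # almost surely action 0
grad_log_pi = -probs.copy()
grad_log_pi[a] += 1.0                                # d log pi(a) / d logits
rl_grad = rewards[a] * grad_log_pi                   # ascent direction
# If "wirehead" is never sampled, rewards[1] never appears in any gradient:
# the optimizer is blind to the high-reward option the policy avoids.

# --- Supervised learning: the label enters the gradient unconditionally ---
target = 1                                           # label: "wirehead"
sl_grad = probs.copy()
sl_grad[target] -= 1.0                               # cross-entropy descent direction
# sl_grad pushes toward the target regardless of the model's current output.

print("RL gradient:", rl_grad)
print("SL gradient:", sl_grad)
```

In the RL case, a policy that reliably avoids an action also keeps that action’s reward out of every gradient update; in the supervised case, the gradient drags the model toward the labeled answer whether the model “wants” to go there or not.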