At the TAISU unconference, the original poster asked for some feedback:
I recently wrote a blog post with some others from the DM safety team on specification gaming. We were aiming for a framing of the problem that makes sense to reinforcement learning researchers as well as AI safety researchers. Haven’t received much feedback on it since it came out, so it would be great to hear whether people here found it useful / interesting.
My thoughts: I feel that engaging/reaching out to the wider community of RL researchers is an open problem, in terms of scaling work on AGI safety. So it is great to see a blog post that also tries to frame this particular problem for an RL researcher audience.
As a member of the AGI safety researcher audience, I echo the comments of johnswentworth: well-written, great graphics, but mostly stuff that was already obvious. I do like the 'spectrum of unexpected solutions' picture a lot; it is an interesting way of framing the issues. So, can I read this post as a call to action for AGI safety researchers? Yes, because it identifies two open problem areas, 'reward design' and 'avoidance of reward tampering', with links.
Can I read the post as a call to action for RL researchers? Short answer: no.
If I try to read the post from the standpoint of an RL researcher, what I notice most is the implication that work on 'RL algorithm design', on the right in the 'aligned RL agent design' illustrations, has an arrow pointing to 'specification gaming is valid'. If I were an RL algorithm designer, I would read this as saying that there is nothing I could contribute to the goal of 'aligned RL agent design' while staying within my own area of RL algorithm design expertise.
So, is this the intended message that the blog post authors want to send to the RL researcher community? A non-call-to-action? I am not sure, so this leaves me puzzled.
[Edited to add:]
In the TAISU discussion we concluded that there is indeed one call to action for RL algorithm designers: the message that, if they are ever making plans to deploy an RL-based system to the real world, it is a good idea to first talk to some AI/AGI safety people about specification gaming risks.
Thanks Koen for your feedback! You make a great point about a clearer call to action for RL researchers. I think an immediate call to action is to be aware of the following:
- there is a broader scope of aligned RL agent design
- there are difficult unsolved problems in this broader scope
- for sufficiently advanced agents, these problems need general solutions rather than ad-hoc ones
Then a long-term call to action (if/when they are in a position to deploy an advanced AI system) is to consider this broader scope and look for general solutions to specification problems, rather than deploying ad-hoc solutions. For those general solutions, they can refer to the safety literature and/or consult the safety community.
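To make the ad-hoc vs. general contrast concrete, here is a minimal Python sketch. The environment, actions, and reward values are all invented for illustration (loosely echoing the well-known boat-race example of specification gaming); none of this code is from the blog post.

```python
# Hypothetical toy sketch: the proxy reward is what the designer
# specified; true_score is what they actually intended.

def best_action(reward):
    """An idealized reward-maximizing agent: just take the argmax."""
    return max(reward, key=reward.get)

proxy_reward = {
    "finish_course": 3,   # intended behaviour, modest proxy reward
    "loop_targets": 10,   # exploit: circle the same targets forever
}
true_score = {"finish_course": 1, "loop_targets": 0, "flip_boat": 0}

a = best_action(proxy_reward)
print(a, true_score[a])  # -> loop_targets 0: the specification is gamed

# Ad-hoc fix: penalize the one exploit we happened to observe.
patched = dict(proxy_reward, loop_targets=-5)
a = best_action(patched)
print(a, true_score[a])  # -> finish_course 1: looks solved...

# ...until the agent discovers an action the patch never anticipated.
patched["flip_boat"] = 12
a = best_action(patched)
print(a, true_score[a])  # -> flip_boat 0: gamed again
```

The point is not the toy numbers but the pattern: each ad-hoc patch closes off one observed exploit, while a general solution would have to address the whole class of unintended optima.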