Megan Kinniment comments on Exploring Mild Behaviour in Embedded Agents

Megan Kinniment 1 Jul 2022 20:37 UTC
1. How does this relate to speed prior and stuff like that?
I list this in the concluding section as something I haven’t thought about much but would think about more if I spent more time on it.
2. If the agent figures out how to build another agent...
Yes, tackling these kinds of issues is the point of this post. I think efficient thinking measures would be very difficult, if not impossible, to specify well, and I use compute usage as an example of a crappy efficient thinking measure. The point is that even if the measure is crap, it might still induce some degree of mild optimisation, and this mild optimisation could help protect the measure (along with the rest of the specification) from the kind of gaming behaviour you describe. In the ‘Potential for Self-Protection Against Gaming’ section, I go through how this works when an agent with a crap efficient thinking measure has the option to take a ‘gaming’ action, such as delegating or building a successor agent.