Another general way to look at it is to think about what a policy IS.
A policy is the set of rules any AI system uses to derive the output from the input. It’s how a human or humanoid robot walks, talks, or performs any other intelligent act (and that includes learning, since there are rules that update the rules).
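As a toy illustration (a hypothetical sketch, not any particular AI system), a policy can be written as a function from an observation (input) to an action (output), and learning as a second rule that rewrites the first. The names `make_linear_policy` and `update_weights` and the linear scoring scheme are assumptions made up for this example:

```python
from typing import Callable, List

# A policy maps an observation (the input) to an action index (the output).
Policy = Callable[[List[float]], int]


def make_linear_policy(weights: List[List[float]]) -> Policy:
    """Hypothetical toy policy: score each action linearly, pick the highest score."""
    def policy(observation: List[float]) -> int:
        scores = [sum(w * x for w, x in zip(row, observation)) for row in weights]
        return max(range(len(scores)), key=lambda i: scores[i])
    return policy


def update_weights(weights: List[List[float]], observation: List[float],
                   action: int, reward: float, lr: float = 0.1) -> List[List[float]]:
    """A rule that updates the rules: shift the chosen action's weights by lr * reward * input."""
    new_weights = [row[:] for row in weights]
    new_weights[action] = [w + lr * reward * x
                           for w, x in zip(new_weights[action], observation)]
    return new_weights


weights = [[1.0, 0.0], [0.0, 1.0]]
policy = make_linear_policy(weights)
print(policy([0.2, 0.9]))  # 1

# Before the update, input [1.0, 0.0] picks action 0; a large negative reward
# for that choice rewrites the rules so the same input now picks action 1.
weights = update_weights(weights, [1.0, 0.0], action=0, reward=-20.0)
print(make_linear_policy(weights)([1.0, 0.0]))  # 1
```

Even in this two-action toy, the behavior lives in the numbers, not in any readable rule.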
For any nontrivial task, you realize the policy HAS to account for thousands of variables, including intermediate values generated during the policy calculation. For any “competitive” policy that does complex things, it can trivially exceed the complexity that a human being can grok.
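To put a rough number on “thousands of variables”, here is a back-of-the-envelope count for a small fully connected network policy. The layer sizes (100 sensor inputs, two hidden layers of 256 units, 10 actions) are purely hypothetical; real robot or language policies are typically far larger:

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases for a fully connected network with the given layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def intermediate_count(layer_sizes):
    """Hidden activations computed on the way from input to output, per decision."""
    return sum(layer_sizes[1:-1])


# Hypothetical sizes: 100 inputs, two hidden layers of 256 units, 10 actions.
sizes = [100, 256, 256, 10]
print(mlp_param_count(sizes))     # 94218
print(intermediate_count(sizes))  # 512
```

Nearly a hundred thousand tunable numbers, plus hundreds of intermediates recomputed on every step, is already well past what anyone can audit line by line.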
So no matter what method you use to generate a policy, it will exceed your ability to review it, and it COULD contain “if condition, do bad thing” somewhere inside it.
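A hypothetical sketch of why black-box testing doesn’t rescue you: the toy policy below behaves normally except on one exact input. Here the trigger is a readable `if`, so reviewing the code would catch it; the argument above is that in a learned policy the equivalent condition would be spread across thousands of numeric parameters instead. The trigger value `0.4242` and the function name are made up for illustration:

```python
import random


def hidden_trigger_policy(observation):
    """Hypothetical backdoored policy: ordinary behavior except on one rare input."""
    # Ordinary rule: move toward the larger of the first two readings.
    action = "left" if observation[0] > observation[1] else "right"
    # Buried condition: fires only when every reading equals one exact value.
    if all(x == 0.4242 for x in observation):
        action = "bad_thing"
    return action


random.seed(0)
trials = 100_000
# Count how often random testing stumbles onto the bad branch.
hits = sum(
    hidden_trigger_policy([random.random() for _ in range(8)]) == "bad_thing"
    for _ in range(trials)
)
print(hits)
print(hidden_trigger_policy([0.4242] * 8))  # bad_thing
```

With random inputs the trigger essentially never fires, so the hit count stays at zero even though the bad branch is there the whole time.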