More formally, we can operationalize the behavioral objective as the objective recovered from perfect inverse reinforcement learning (IRL).
Just want to note that I think this is extremely far from a formal definition. I don’t know what perfect IRL would be. Does perfect IRL assume that the agent is perfectly optimal, or can it have biases? How do you determine what the action space is? How do you break ties between reward functions that are equally good on the training data?
I get that definitions are hard—the main thing bothering me here is the “more formally” phrase, not the definition itself. This gives it a veneer of precision that it really doesn’t have.
(I’m pedantic about this because similar implied false precision about the importance of utility functions confused me for half a year.)
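To make the tie-breaking worry concrete, here is a minimal sketch (the toy chain MDP, the potential function `phi`, and the discount factor are all hypothetical, chosen only for illustration): two different reward functions, related by potential-based reward shaping (Ng, Harada & Russell 1999), rationalize exactly the same optimal behavior, so IRL run on demonstrations from this environment has no principled way to prefer one over the other.

```python
import numpy as np

# Hypothetical 4-state chain MDP for illustration only:
# actions 0 = left, 1 = right; moves are clipped at the ends of the chain.
n_states, n_actions, gamma = 4, 2, 0.9

def step(s, a):
    return max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)

# Original reward: +1 whenever the rightmost state is reached.
def reward(s, a, s_next):
    return 1.0 if s_next == n_states - 1 else 0.0

# Potential-based shaping: for any potential phi,
# R'(s, a, s') = R(s, a, s') + gamma * phi(s') - phi(s)
# leaves the set of optimal policies unchanged (Ng, Harada & Russell 1999).
phi = np.array([0.0, 3.0, -2.0, 5.0])  # arbitrary potential, made up here

def shaped_reward(s, a, s_next):
    return reward(s, a, s_next) + gamma * phi[s_next] - phi[s]

def greedy_policy(r_fn, iters=500):
    # Plain value iteration, then extract the greedy policy.
    V = np.zeros(n_states)
    for _ in range(iters):
        V = np.array([max(r_fn(s, a, step(s, a)) + gamma * V[step(s, a)]
                          for a in range(n_actions)) for s in range(n_states)])
    return [int(np.argmax([r_fn(s, a, step(s, a)) + gamma * V[step(s, a)]
                           for a in range(n_actions)])) for s in range(n_states)]

# Both reward functions are "equally good" explanations of the demonstrated
# behavior: they induce the identical optimal policy.
print(greedy_policy(reward))         # [1, 1, 1, 1]
print(greedy_policy(shaped_reward))  # [1, 1, 1, 1]
```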
You’re completely right; I don’t think we meant to have ‘more formally’ there.