This is not how IRL works at all. The utility function does not come from a special reward channel controlled by a human. There is no button.
To reiterate my earlier description, IRL infers the unknown utility function of an agent from examples of the agent’s behaviour, given as observations and actions. The utility function is entirely an internal component of the model.
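To make that concrete, here is a minimal toy sketch of one Bayesian flavour of IRL. It assumes a Boltzmann-rational agent and a small, hand-picked set of candidate utility functions. The states, actions, hypothesis names and the `beta` rationality parameter are all hypothetical, chosen for illustration. Note that the only input is a list of observed (state, action) pairs. Nothing in it is a reward signal sent by a human; the utility function exists only as a hypothesis inside the observer's model.

```python
import math

# Hypothetical toy world: in each state the observed agent picks one action.
ACTIONS = ["eat", "sleep"]

# Candidate utility functions the *observer* entertains (illustrative assumptions).
# Each maps (state, action) -> utility. They live inside the model, not in any channel.
HYPOTHESES = {
    "likes_food": {("hungry", "eat"): 1.0, ("hungry", "sleep"): 0.0,
                   ("tired", "eat"): 1.0, ("tired", "sleep"): 0.0},
    "likes_sleep": {("hungry", "eat"): 0.0, ("hungry", "sleep"): 1.0,
                    ("tired", "eat"): 0.0, ("tired", "sleep"): 1.0},
    "context_sensitive": {("hungry", "eat"): 1.0, ("hungry", "sleep"): 0.0,
                          ("tired", "eat"): 0.0, ("tired", "sleep"): 1.0},
}


def action_likelihood(utility, state, action, beta=2.0):
    """Boltzmann-rational model: P(action | state, U) is proportional to exp(beta * U)."""
    weights = {a: math.exp(beta * utility[(state, a)]) for a in ACTIONS}
    return weights[action] / sum(weights.values())


def infer_utility(demos, hypotheses, beta=2.0):
    """Posterior over candidate utility functions given observed behaviour.

    Uses a uniform prior; each demonstration contributes a log-likelihood term.
    """
    log_post = {
        name: sum(math.log(action_likelihood(u, s, a, beta)) for s, a in demos)
        for name, u in hypotheses.items()
    }
    # Normalise in a numerically stable way.
    top = max(log_post.values())
    unnorm = {k: math.exp(v - top) for k, v in log_post.items()}
    z = sum(unnorm.values())
    return {k: v / z for k, v in unnorm.items()}


# Observed behaviour only: no reward values anywhere in the data.
demos = [("hungry", "eat"), ("tired", "sleep"), ("hungry", "eat"), ("tired", "sleep")]
posterior = infer_utility(demos, HYPOTHESES)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

With these demonstrations the context-sensitive hypothesis explains every observed choice, so it ends up with most of the posterior mass, while the two single-preference hypotheses tie. Real IRL methods work over sequential decision problems and much richer hypothesis spaces, but the structure is the same: utility is inferred from behaviour, not read off a button.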