Yes, the former. If the agent takes actions and receives rewards, and it can observe those rewards, then it gains evidence about its utility function.
You probably already know this, but the reinforcement learning framework is very relevant here. In particular, standard RL references describe how to compute the expected utility of a (policy, reward function) pair, where "policy" is the RL term for a strategy.
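Here is a minimal sketch of both pieces, under assumptions of my own: a hypothetical two-state, two-action MDP, two candidate reward functions the agent is uncertain between, and Gaussian observation noise on rewards. The agent evaluates the expected utility of a fixed policy under each candidate, and updates a Bayesian posterior over candidates from the rewards it observes. None of this setup is from the comment above; it is just one concrete way to make the idea executable.

```python
import numpy as np

# Hypothetical toy MDP: 2 states, 2 actions (all numbers are assumptions).
# R_candidates[k][s][a] = reward under candidate utility function k.
R_candidates = [
    np.array([[1.0, 0.0], [0.0, 1.0]]),   # candidate utility function 0
    np.array([[0.0, 1.0], [1.0, 0.0]]),   # candidate utility function 1
]
# P[a][s][s'] = probability of moving to s' after taking a in s.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.5, 0.5], [0.5, 0.5]],  # action 1
])
gamma = 0.9  # discount factor

def expected_utility(policy, R, n_iter=500):
    """Expected discounted utility V(s) of a fixed deterministic policy
    under reward function R, via iterative policy evaluation."""
    V = np.zeros(2)
    for _ in range(n_iter):
        V_new = np.zeros(2)
        for s in range(2):
            a = policy[s]
            V_new[s] = R[s][a] + gamma * P[a][s] @ V
        V = V_new
    return V

def posterior_over_rewards(prior, observations, noise_sd=0.1):
    """Bayesian update over candidate reward functions. Each observation
    is (state, action, observed_reward); observed rewards are assumed to
    be the true reward plus Gaussian noise with std dev noise_sd."""
    log_post = np.log(prior)
    for (s, a, r_obs) in observations:
        for k, R in enumerate(R_candidates):
            log_post[k] += -0.5 * ((r_obs - R[s][a]) / noise_sd) ** 2
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

# The agent starts uncertain, acts, sees rewards, and updates its beliefs.
prior = np.array([0.5, 0.5])
observations = [(0, 0, 0.95), (1, 1, 1.05)]   # consistent with candidate 0
posterior = posterior_over_rewards(prior, observations)
print("posterior over utility functions:", posterior)

# Expected utility of the (policy, reward function) pair for each candidate:
policy = [0, 1]  # deterministic: action 0 in state 0, action 1 in state 1
for k, R in enumerate(R_candidates):
    print(f"candidate {k}: V = {expected_utility(policy, R)}")
```

Running this, the posterior concentrates on candidate 0 after just two observations, which is the sense in which seeing rewards gives the agent evidence about which utility function is its own.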