I think I might actually be happy to take e.g. the Bellman equation, a fundamental equation in RL, as a basic expression of consistent utilities, and thereby claim value iteration, Q-learning, and deep Q-learning all as predictions/applications of utility theory. Certainly this seems fair if you count applications of the central limit theorem as applications of probability theory.
To expand a bit, the Bellman equation expresses nothing more than a consistency condition among utilities: the expected utility of this state must equal its immediate utility plus the best expected utility among the next states I may choose. Start with some random utilities assigned to states, gradually update them until they are consistent, and you get optimal behavior. Huge parts of RL are centered around this equation, including e.g. DeepMind using DQNs to crack Atari games.
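To make the "start random, update until consistent" idea concrete, here is a minimal value-iteration sketch. The four-state world, its rewards, and the discount factor GAMMA are all made up for illustration (the discount in particular is an assumption, not something stated above); it is a toy, not how a DQN is actually trained.

```python
import random

# Hypothetical toy world: each state lists the next states the agent may choose.
NEXT_STATES = {
    "start": ["left", "right"],
    "left": ["end"],
    "right": ["end"],
    "end": [],  # terminal: no further choices
}
# Hypothetical immediate utility of being in each state.
REWARD = {"start": 0.0, "left": 1.0, "right": 5.0, "end": 0.0}
GAMMA = 0.9  # discount factor (an assumption; the text does not mention one)


def bellman_backup(state, utility):
    """Immediate utility plus the discounted best utility among reachable next states."""
    best_next = max((utility[s] for s in NEXT_STATES[state]), default=0.0)
    return REWARD[state] + GAMMA * best_next


def value_iteration(tolerance=1e-9, seed=0):
    """Start from random utilities and update until they are mutually consistent."""
    rng = random.Random(seed)
    utility = {s: rng.uniform(-10, 10) for s in NEXT_STATES}
    while True:
        updated = {s: bellman_backup(s, utility) for s in NEXT_STATES}
        gap = max(abs(updated[s] - utility[s]) for s in NEXT_STATES)
        utility = updated
        if gap < tolerance:  # every state now satisfies the Bellman equation
            return utility


def greedy_choice(state, utility):
    """Acting on consistent utilities: pick the next state with the highest utility."""
    return max(NEXT_STATES[state], key=lambda s: utility[s])


utility = value_iteration()
```

Whatever random numbers it starts from, the loop settles on the same consistent utilities, and greedy_choice("start", utility) then picks "right", the better branch in this toy world.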
I understand Eliezer’s frustration in answering this question. The response to “What predictions/applications does utility theory have?” with regard to intelligent behavior is, essentially, “Everything and nothing.”