Thank you! I worked my way through it, and the level of formalism is fine. As you say, it is not meant to include the motivation. I’d appreciate an article that includes the motivation for each element of the formalism.
Also, some concepts were not defined, like “execution history.” If “programs” are pure functions (stateless), I am not sure what a history is. Or maybe there is a temporal model here, like the one in the work of Hutter, Legg, etc.?
Actually, if I understand correctly, the “programs” P1, P2, … represent the environment (as expressed in Hutter’s formalism). (Or perhaps P1, P2, … represent different programs the agent could run inside itself?) If P1, P2, … are the environment, why have multiple programs when we could combine them into one thing called “environment”? In your article there is a utility function, whereas Hutter’s model has rewards coming from the environment according to an unknown reward function. But I don’t understand the essential difference between the approaches here. Since the final choice is an argmax, I still haven’t figured out what this definition of UDT adds to the trivial idea “make the choice with highest expected utility.”
The article is great for what it is intended to be, and I am glad we have it. But I’d like to see an intro/overview to UDT.
Here’s a brief write-up of the basic idea of UDT that I wrote a while back.