Thank you! I worked my way through it, and the level of formalism is fine. As you say, it is not meant to include the motivation. I’d appreciate an article that includes the motivation for each element of the formalism.
Also, some concepts were not defined, like “execution history.” If “programs” are pure functions (stateless), I am not sure what a history is. Or maybe there is a temporal model here, like the one in the work of Hutter, Legg, etc.?
Actually, if I understand correctly, the “programs” P1, P2, … represent the environment (as expressed in Hutter’s formalism). (Or perhaps P1, P2, … represent different programs the agent could run inside itself?) If P1, P2, … are the environment, why have multiple programs when we could combine them into one thing called “environment”? In your article there is a utility function, and Hutter’s model has rewards coming from the environment according to an unknown reward function. But I don’t understand the essential difference between the two approaches here. Since the final choice is an argmax, I still haven’t figured out what this definition of UDT adds to the trivial idea “make the choice with highest expected utility.”
The article is great for what it is intended to be, and I am glad we have it. But I’d like to see an intro/overview to UDT.
Just read Daniel Hintze’s BA thesis (Arizona State University). It is the best intro to UDT and TDT I have seen so far.
(My understanding of Hintze’s writing is partly based on lots of other reading on TDT and UDT that I didn’t understand as well, but I think that even if I did not have that background, it would be the best intro.)
I just want to understand UDT; I often need several articles, both popular and more formal, before I really understand something like this.
There have been plenty of articles on UDT, but not an overview.
Here’s a brief write-up of the basic idea of UDT that I wrote a while back.