Thanks. I had one question about your Toward Idealized Decision Theory paper.
I can’t say I fully understand UDT, but the ‘updateless’ part does seem very similar to the “act as if you had precommitted to any action that you’d have wanted to precommit to” core idea of NDT. It’s not clear to me that the super powerful UDT would make the wrong decision in the game where two players pick numbers between 0-10 and get payouts based on their pick and the total sum.
Wouldn’t the UDT reason as follows? “If my algorithm were such that I wouldn’t just pick 1 when the human player forced me into it by picking 9 (for instance maybe I always pick 5 in this game), then I may still have a reputation as a powerful predictor but it’s much more likely that I’d also have a reputation as an entity that can’t be bullied like this, so the human would be less likely to pick 9. That state of the world is better for me, so I shouldn’t be the type of agent that makes the greedy choice to pick 1 when I predict the human will pick 9.”
The argument in your paper seems to rely on the human assuming the UDT will reason like a CDT once it knows the human will pick 9.
the ‘updateless’ part does seem very similar to the “act as if you had precommitted to any action that you’d have wanted to precommit to” core idea of NDT
Yep, that’s a common intuition pump people use in order to understand the “updateless” part of UDT.
It’s not clear to me that the super powerful UDT would make the wrong decision in the game where two players pick numbers between 0-10
A proof-based UDT agent would—this follows from the definition of proof-based UDT. Intuitively, we surely want a decision theory that reasons as you said, but the question is, can you write down a decision algorithm that actually reasons like that?
Most people agree with you on the philosophy of how an idealized decision theory should act, but the hard part is formalizing a decision theory that actually does the right things. The difficult part isn’t in the philosophy, the difficult part is turning the philosophy into math :-)
Thanks. I had one question about your Toward Idealized Decision Theory paper.
I can’t say I fully understand UDT, but the ‘updateless’ part does seem very similar to the “act as if you had precommitted to any action that you’d have wanted to precommit to” core idea of NDT. It’s not clear to me that the super powerful UDT would make the wrong decision in the game where two players pick numbers between 0-10 and get payouts based on their pick and the total sum.
Wouldn’t the UDT reason as follows? “If my algorithm were such that I wouldn’t just pick 1 when the human player forced me into it by picking 9 (for instance maybe I always pick 5 in this game), then I may still have a reputation as a powerful predictor but it’s much more likely that I’d also have a reputation as an entity that can’t be bullied like this, so the human would be less likely to pick 9. That state of the world is better for me, so I shouldn’t be the type of agent that makes the greedy choice to pick 1 when I predict the human will pick 9.”
The argument in your paper seems to rely on the human assuming the UDT will reason like a CDT once it knows the human will pick 9.
Yep, that’s a common intuition pump people use in order to understand the “updateless” part of UDT.
A proof-based UDT agent would—this follows from the definition of proof-based UDT. Intuitively, we surely want a decision theory that reasons as you said, but the question is, can you write down a decision algorithm that actually reasons like that?
Most people agree with you on the philosophy of how an idealized decision theory should act, but the hard part is formalizing a decision theory that actually does the right things. The difficult part isn’t in the philosophy, the difficult part is turning the philosophy into math :-)