I like this line of thinking—the impact of an agent's awareness of future changes to its utility function is under-studied. I do wish we'd stop bothering with the strawman of naive-CDT, though; it's distracting and wasteful to keep dismissing something nobody is seriously arguing for.
It’s probably time we got more formal about what a reward is—are we modeling it as the point-in-time desirability of the state of the universe (I hope), or as an average or cumulative value over time (more complicated, and probably unnecessary)?
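To make the distinction concrete, here is one way the two framings could be written down (my notation, purely illustrative, not anything established in the thread): point-in-time reward scores the state itself, while the cumulative framing aggregates that score over the rest of the trajectory.

```latex
% A minimal sketch of the two framings (illustrative notation only).
% Point-in-time: reward is the desirability of the universe state s_t alone.
R_{\text{point}}(t) = u(s_t)

% Cumulative / averaged alternative: aggregate that desirability over time,
% e.g. with a discount factor 0 < \gamma \le 1.
R_{\text{cum}}(t) = \sum_{k \ge t} \gamma^{\,k - t}\, u(s_k)
```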
And that leads to a modeling question: what do you optimize when you think a reachable universe state will have positive utility for some of the time and negative utility for the rest? Expected inconsistency breaks a whole lot of foundational assumptions of decision theory.
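One hedged way to write the inconsistency down (again, my notation): index the utility function by the time at which it does the evaluating, and the value of steering the world into a fixed state depends on which of the agent's successive utility functions does the scoring.

```latex
% Illustrative only: u_t is the utility function in force at time t,
% and x is a fixed reachable universe state.
% The same state can score positively now and negatively later:
u_{t_1}(x) > 0, \qquad u_{t_2}(x) < 0, \qquad t_1 < t_2

% Any naive aggregate such as
V(x) = \sum_{t} \gamma^{t}\, u_{t}(x)
% already presupposes an answer to "whose preferences count, and when" --
% which is exactly the assumption that expected inconsistency breaks.
```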