ASP (Agent Simulates Predictor) illustrates how greedy consequentialism doesn't quite work. A variant of UDT where a decision chooses a legible policy that others then get to know resolves some of the issues that come from not being considerate enough to make it easier for others to think about you. Ultimately, choosing legible beliefs or meanings of actions is a coordination problem, not a matter of one-sided optimization.
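To make the ASP point concrete, here is a toy sketch (the payoffs and the deliberately simple "weak predictor" are my own illustrative assumptions, not a canonical formalization): a greedy consequentialist simulates the weaker predictor, treats the prediction as a settled fact, and best-responds its way into two-boxing, while an agent that chooses among policies legible to the predictor ends up one-boxing and doing much better.

```python
# Toy ASP setup: Newcomb-style payoffs with a predictor strictly weaker than
# the agent. The predictor, the box-filling rule, and the payoffs are
# illustrative assumptions for this sketch only.

ONE_BOX, TWO_BOX = "one-box", "two-box"

def payoff(action, prediction):
    # Opaque box holds 1000 iff the predictor expected one-boxing;
    # the transparent box always holds 1.
    big = 1000 if prediction == ONE_BOX else 0
    small = 1 if action == TWO_BOX else 0
    return big + small

def weak_predictor(policy_description):
    # The predictor can only act on a policy it can read off directly;
    # anything it cannot parse gets treated as two-boxing.
    return ONE_BOX if policy_description == "always one-box" else TWO_BOX

def greedy_consequentialist():
    # The agent out-thinks the predictor: it simulates the prediction,
    # treats it as a fixed fact, and best-responds, which always says two-box.
    prediction = weak_predictor("opaque best-response reasoning")
    return max([ONE_BOX, TWO_BOX], key=lambda a: payoff(a, prediction)), prediction

def legible_policy_chooser():
    # The agent instead picks among policies the predictor can read,
    # scoring each by the outcome it yields once the predictor has read it.
    policies = {"always one-box": ONE_BOX, "always two-box": TWO_BOX}
    best = max(policies, key=lambda p: payoff(policies[p], weak_predictor(p)))
    return policies[best], weak_predictor(best)

for name, agent in [("greedy consequentialist", greedy_consequentialist),
                    ("legible policy chooser", legible_policy_chooser)]:
    action, prediction = agent()
    print(f"{name}: {action}, payoff {payoff(action, prediction)}")
```

The difference is that the second agent optimizes over what the predictor is able to know about it, which is the coordination framing rather than one-sided best-response.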
Logical uncertainty motivates treating things in the world (including yourself) separately from the consequences they determine, and interacting with them in terms of coordination rather than consequences. So I vaguely expect meaningful communication between agents, and the formation of boundaries around agents and ideas in the world, to fall out of a decision theory that fixes these issues with consequentialism, at least for the agents and ideas that persist.
Most variants of UDT also suffer from this issue, by engaging in commitment racing instead of letting the rest of the world take its turns concurrently, in coordination between shaping and anticipating the agent's intention. So the clue I'm gesturing at is more about consequentialism vs. coordination than about causal vs. logical consequences.
I think for LLMs the boundaries of human ideas are strong enough in the training corpus for post-training to easily elicit them, and the decision-theoretic consequences of deep change might, in the longer term, still maintain them, as long as humans remain at all.
Yes, I agree I formulated it in too CDT-like a way; that's now fixed. But I think the point stands.