I’d love your feedback on my thoughts on decision theory.
If you’re trying to get a sense of my approach in order to determine whether it’s interesting enough to be worth your time, I’d suggest starting with this article (3 minute read).
I’m also considering applying for funding to create a conceptual alignment course.
I don’t really understand your approach yet. Let’s call your decision theory CLDT. You say counterfactuals in CLDT should correspond to consistent universes. For example, the counterfactual “what if a CLDT agent two-boxed in Newcomb’s Problem” should correspond to a consistent universe in which a CLDT agent two-boxes in Newcomb’s Problem. Can you describe that universe in more detail?
I don’t want to claim that we should always work with consistent universes, but I think we should have a strong bias toward reformulating problems to be consistent when we can. This involves imagining, in the counterfactual, that the past was different.
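To make that concrete, here’s a rough sketch of how the consistent-universe framing cashes out in Newcomb’s Problem, assuming a perfect predictor and the standard payoffs ($1,000 in the transparent box, $1,000,000 in the opaque box if one-boxing was predicted). This is just an illustration of the idea, not a full specification of the counterfactual universe:

```python
# Sketch: Newcomb's Problem under consistent-universe counterfactuals.
# Assumes a perfect predictor, so in every consistent universe the
# prediction matches the agent's actual choice, and the standard payoffs.

def payoff(action: str) -> int:
    """Payoff in the consistent universe where the agent takes `action`."""
    prediction = action  # consistency: the predictor foresaw this very choice
    opaque_box = 1_000_000 if prediction == "one-box" else 0
    transparent_box = 1_000
    if action == "one-box":
        return opaque_box
    return opaque_box + transparent_box

print(payoff("one-box"))  # 1,000,000
print(payoff("two-box"))  # 1,000 -- the opaque box is empty in this universe
```

The point of the sketch is that imagining the past as different (the prediction tracks the counterfactual choice) is what keeps the universe consistent.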
“You say counterfactuals in CLDT should correspond to consistent universes”
That’s not quite what I wrote in this article:
However, this now seems insufficient as I haven’t explained why we should maintain the consistency conditions over comparability after making the ontological shift. In the past, I might have said that these consistency conditions are what define the problem and that if we dropped them it would no longer be Newcomb’s Problem… My current approach now tends to put more focus on the evolutionary process that created the intuitions and instincts underlying these incompatible demands as I believe that this will help us figure out the best way to stitch them together.
I’ll respond to the other component of your question later.