I have also been leaning towards the existence of a theory more general than probability theory, based on a few threads of thinking.
One thread is anthropic reasoning, where it is sometimes clear how to make decisions, yet probabilities don’t make sense and it feels to me that the information available in some anthropic situations just “doesn’t decompose” into probabilities. Stuart Armstrong’s paper on the Sleeping Beauty problem is, I think, valuable and greatly overlooked here.
Another thread is the limited-computation issue. We would all like to have a theory that pins down ideal reasoning, and then, as a completely separate problem, work out how to efficiently approximate that theory on a Turing machine. My intuition is that things just don’t decompose this way. I think that a complete theory of reasoning will make direct reference to models of computation.
This site has collected quite a repertoire of decision problems that challenge causal decision theory. They all share the following property (including your example in the comment above): in a causal graph containing you as a node, there are links from you to the outcome that do not go via your actions (for Newcomb-like problems) or that do not go via your observations (anthropic problems). Or in other words, your decisions are not independent of your beliefs about the world. The UDT solution says: “instead of drawing a graph containing you, draw one that contains your decision algorithm, and you will see that the independence between beliefs and decisions is restored!”. This feels to me like a patch rather than a full solution, similar to saying “if your variables are correlated and you don’t know how to deal with correlated distributions, try a linear change of variables—maybe you’ll find one that de-correlates them!”. This only works if you’re lucky enough to find a de-correlating change of variables. An alternative approach would be to work out how to deal with non-independent beliefs/decisions directly.
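To make that structure concrete, here is a minimal sketch (my own rendering, not from the original discussion) of the two ways of drawing the Newcomb graph, as adjacency lists; the node names are illustrative.

```python
# Illustrative sketch (my own rendering): the two ways of drawing the
# Newcomb graph described above, as adjacency lists mapping node -> children.

# Graph with "you" as the node: there is a path from "you" to "payoff"
# (via the predictor) that does not go through "your action".
graph_with_you = {
    "you": ["your action", "prediction"],   # the predictor reads you directly
    "prediction": ["box contents"],
    "your action": ["payoff"],
    "box contents": ["payoff"],
}

# UDT's redrawing: the decision node is "your decision algorithm"; the
# predictor's link now runs through the thing being chosen over, rather
# than bypassing it.
graph_with_algorithm = {
    "your decision algorithm": ["your action", "prediction"],
    "prediction": ["box contents"],
    "your action": ["payoff"],
    "box contents": ["payoff"],
}
```

Note that the two sketches differ only in how the decision node is labelled, which is part of why the move feels like a change of variables rather than a new theory.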
One thought experiment I like to do is to ask probability theory to justify itself in a non-circular way. For example, let’s say I propose the following Completely Stupid Theory Of Reasoning. In CSTOR, belief states are represented by a large sheet of paper where I write down everything that I have ever observed. What is my belief state at time t, you ask? Why, it is simply the contents of the entire sheet of paper. But what is my belief state about a specific event? Again, the contents of the entire sheet of paper. How does CSTOR update on new evidence? Easy! I simply add a line of writing to the bottom of the sheet. How does CSTOR marginalize? It doesn’t! Marginalization is just for dummies who use probability theory, and, as you can see, CSTOR can do all the things that a theory of reasoning should do without need for silly marginalization.
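Just to make CSTOR’s deliberate lack of structure explicit, here is a minimal sketch in Python; the class and method names are mine.

```python
# A deliberately silly sketch of CSTOR: the "belief state" is just an
# append-only transcript of everything observed. Names are illustrative.

class CSTOR:
    def __init__(self):
        self.sheet = []                     # the big sheet of paper

    def update(self, observation: str):
        self.sheet.append(observation)      # "updating" = add a line at the bottom

    def belief_state(self) -> str:
        return "\n".join(self.sheet)        # the whole sheet, always

    def belief_about(self, event: str) -> str:
        return self.belief_state()          # same answer for any specific event

    # No marginalization, no conditioning, no numbers -- and, crucially,
    # nothing here tells you how to act on what you have written down.
```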
So what really distinguishes CSTOR from probability theory? I think the best non-circular answer is that probability theory gives rise to a specific algorithm for making decisions, whereas CSTOR doesn’t. So I think we should look at decision making as primary and then figure out how to decompose decision making into some abstract belief representation plus abstract notion of utility, plus some abstract algorithm for making decisions.
The UDT solution says: “instead of drawing a graph containing you, draw one that contains your decision algorithm, and you will see that the independence between beliefs and decisions is restored!”
Can you try to come up with a situation where that independence is not restored? If we follow the analogy with correlations, it’s always possible to find a linear map that decorrelates variables...
Ha, indeed. I should have made the analogy with finding a linear change of variables such that the result is decomposable into a product of independent distributions—i.e., if (x, y) is distributed on a narrow band about the unit circle in R^2, then there is no linear change of variables that renders this distribution independent, yet a (nonlinear) change to polar coordinates does give independence.
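Here is a quick numeric illustration of that example (my own sketch, using numpy): points in a narrow band around the unit circle are already uncorrelated, so no linear change of variables can help, yet x and y are strongly dependent; the polar coordinates r and theta, by contrast, are independent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample points in a narrow band around the unit circle in R^2.
theta = rng.uniform(0.0, 2.0 * np.pi, size=100_000)
r = 1.0 + 0.05 * rng.standard_normal(100_000)
x, y = r * np.cos(theta), r * np.sin(theta)

# x and y are already (nearly) uncorrelated, so a linear decorrelating map
# changes essentially nothing -- yet they are clearly dependent: conditioning
# on x being near 1 collapses the spread of y.
print(np.corrcoef(x, y)[0, 1])            # ~0: no linear correlation to remove
print(np.var(y), np.var(y[x > 0.95]))     # conditional variance collapses => dependent

# The nonlinear change to polar coordinates gives genuine independence:
# conditioning on the angle leaves the distribution of the radius unchanged.
r_hat = np.hypot(x, y)
theta_hat = np.arctan2(y, x)
print(np.var(r_hat), np.var(r_hat[np.abs(theta_hat) < 0.05]))  # ~equal => independent
```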
Perhaps the way to construct a counterexample to UDT is to try to create causal links between your decision algorithm and the outcome of the same nature as the links between you and the predictor in e.g. Newcomb’s problem. I haven’t thought this through any further.
So I think we should look at decision making as primary and then figure out how to decompose decision making into some abstract belief representation plus abstract notion of utility, plus some abstract algorithm for making decisions.
L. J. Savage does this in his book “The Foundations of Statistics.” This was mentioned by pragmatist upthread, and is summarised here. It was written in 1954, and so it doesn’t deal with weird LW-style situations, but it does found probability in decision theory.
Just for reference, Wei has pointed out that VNM doesn’t work for indexical uncertainty because the axiom of independence is violated. I guess Savage’s theory fails for the same reason. Maybe it’s worthwhile to figure out what mathematical structures would appear if we dropped the axiom of independence, and if there’s any other axiom that can pin down a unique such structure for LW-style problems. I’m trying to think in that direction now, but it’s difficult.
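For reference, the independence axiom at issue says that a preference between two lotteries survives mixing both with any common third lottery:

$$L \succeq M \;\Longrightarrow\; pL + (1-p)N \;\succeq\; pM + (1-p)N \quad \text{for every lottery } N \text{ and every } p \in (0,1].$$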