It is a theorem that every consistent consequentialist decision rule is either a Bayesian decision rule or a limit of Bayesian decision rules.
I’ve actually been meaning to find a paper that proves that myself. There’s apparently a proof in Mathematical Statistics, Volume 1: Basic and Selected Topics by Peter Bickel and Kjell Doksum.
Consequentialism is not in the index.
Decision rule is, a little bit.
I don’t think this book contains a proof mentioning consequentialism. Do you disagree? Give a page or section?
It looks like what they are doing is defining a decision rule in a special way. So, by definition, it has to be a mathematical thing to do with probability. Then after that, I’m sure it’s rather easy to prove that you should use Bayes’ theorem rather than some other math.
But none of that is about decision rules in the sense of methods human beings use for making decisions. It’s just that if you define them in a particular way, so that Bayes is basically the only option, then you can prove it.
See, e.g., page 19, where they give a definition. A Popperian approach to making decisions simply wouldn’t fit within the scope of their definition, so the conclusion of any proof like the one you claimed exists (which I haven’t found in this book) would not apply to Popperian ideas.
Maybe there is a lesson here about believing stuff is proven when you haven’t seen the proof, listening to hearsay about what books contain, and trying to apply proofs you aren’t familiar with (they often have limits on scope).
In what way would the Popperian approach fail to fit the decision rule approach on page 19 of Bickel and Doksum?
It says a decision rule (their term) is a function of the sample space, mapping something like complete sets of possible data to things people do. (I think it needs to be complete sets of all your data to be applied to real world human decision making. They don’t explain what they are talking about in the type of way I think is good and clear. I think that’s due to having in mind different problems they are trying to solve than I have. We have different goals without even very much overlap. They both involve “decisions” but we mean different things by the word.)
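To make the textbook sense of the term concrete, here is a minimal sketch of a decision rule as a plain function from observed data to an action, with the Bayes rule picking the action of lowest posterior expected loss. The coin setup, the prior, the loss table, and every name in the code are assumptions made up for illustration; none of it is taken from Bickel and Doksum.

```python
from math import comb

# Hypothetical setup: a coin is either fair or biased toward heads.
STATES = {"fair": 0.5, "biased": 0.8}   # P(heads) under each state (assumed)
PRIOR = {"fair": 0.5, "biased": 0.5}    # assumed prior over the two states
ACTIONS = ("bet_heads", "decline")

# Assumed loss table, indexed as LOSS[action][state].
LOSS = {
    "bet_heads": {"fair": 1.0, "biased": -1.0},
    "decline": {"fair": 0.0, "biased": 0.0},
}


def likelihood(heads, flips, p):
    """Binomial probability of seeing `heads` heads in `flips` flips."""
    return comb(flips, heads) * p**heads * (1 - p) ** (flips - heads)


def posterior(heads, flips):
    """Posterior over states after observing the data (Bayes' theorem)."""
    weights = {s: PRIOR[s] * likelihood(heads, flips, p) for s, p in STATES.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


def bayes_rule(heads, flips=10):
    """Decision rule: maps an observed sample (number of heads) to an action."""
    post = posterior(heads, flips)

    def expected_loss(action):
        return sum(post[s] * LOSS[action][s] for s in STATES)

    return min(ACTIONS, key=expected_loss)


# The whole rule, written out as a table over the sample space 0..10 heads.
rule_table = {h: bayes_rule(h) for h in range(11)}
for h, action in rule_table.items():
    print(h, action)
```

Under these assumed numbers the rule declines for 6 or fewer heads and bets for 7 or more, so even this toy rule is a step function of the data.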
In real life, people use many different decision rules (my term, not theirs). And people deal with clashes between them.
You may claim that my multiple decision rules can be combined into one mathematical function. That is so. But the result isn’t a smooth function so when they start talking about estimation they have big problems! And this is the kind of thing I would expect to get acknowledgement and discussion if they were trying to talk about how humans make decisions, in practice, rather than just trying to define some terms (chosen to sound like they have something to do with what humans do) and then proceed with math.
E.g., they try to talk about estimating the amount of error. If you know the error bars on your data, and you have a smooth function, you’re maybe kind of OK with imperfect data. But if your function has a great many jumps in it, what are you to do? What if, within the margin of error on something, there are several discontinuities? I think they are conceiving of the decision rule function as being smooth and not thinking about what happens when it’s very messy. Maybe they specified some assumptions that force it to be smooth, which I missed, but anyway human beings have tons of contradictory and not-yet-integrated ideas in their heads (mistakes, separate topics they haven’t connected yet, and more), and so it’s not smooth.
On a similar note they talk about the median and mean which also don’t mean much when it’s not smooth. Who cares what the mean is over an infinitely large sample space where you get all sorts of unrepresentative results in large unrealistic portions of it? So again I think they are looking at the issues differently than me. They expect things like mathematically friendly distributions (for which means and medians are useful); I don’t.
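A tiny illustration of the worry about means and medians, using a made-up sample with two separate clusters. The numbers are an assumption chosen to make the effect obvious, not data from the book.

```python
from statistics import mean, median

# Hypothetical sample with two separate clusters (an assumption for illustration).
sample = [1, 1, 2, 2, 9, 9, 10, 10]

center_mean = mean(sample)
center_median = median(sample)

# Distance from the mean to the closest actual observation.
gap_to_nearest = min(abs(x - center_mean) for x in sample)

print(center_mean, center_median, gap_to_nearest)
```

Here both the mean and the median come out at 5.5, a value 3.5 away from every observation, so neither summary describes anything that actually occurs in the sample.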
Moving on to a different issue, they conceive of a decision rule which takes input and then gives output. I do not conceive of people starting with the input and then deciding the output. I think decision making is more complicated. While thinking about the input, people create more input: their thoughts. The input is constantly being changed during the decision process; it’s not a fixed quantity to have a function of. Also being changed during any significant decision is the decision rule itself. It too isn’t a static function, even for purposes of doing one decision (at least in the normal sense. Maybe they would want to call every step in the process a decision, so that deciding on a flavor of ice cream might involve 50 decisions, with updates to the decision rules and inputs in between them. If they want to do something like that, they do not explain how it works).
They conceive of the input to decisions as “data”. But I conceive of much thinking as not using much empirical data, if any. I would pick a term that emphasizes it. The input to all decision making is really ideas, some of which are about empirical data and some of which aren’t. Data is a special case, not the right term for the general case. From this I take that they are empiricists. You can find a refutation of empiricism in The Beginning of Infinity by David Deutsch but anyway it’s a difference between us.
A Popperian approach to decision making would focus more on philosophical problems, and their solutions. It would say things like: consider what problem you’re trying to solve, and consider what actions may solve it. And criticize your ideas to eliminate errors. And … well no short summary does it justice. I’ve tried a few times here. But Popperian ways of thinking are not intuitive to people with the justificationist biases dominant in our culture. Maybe if you like everything I said I’ll try to explain more, but in that case I don’t know why you wouldn’t read some books which are more polished than what I would type in. If you have a specific, narrow question I can see that answering that would make sense.
Thank you for that detailed reply. I just have a few comments:
“data” could be any observable property of the world
in statistical decision theory, the details of the decision process that implements the mapping aren’t the focus because we’re going to try to go straight to the optimal mapping in a mathematical fashion
there’s no requirement that the decision function be smooth—it’s just useful to look at such functions first for pedagogical reasons. All of the math continues to work in the presence of discontinuities.
a weak point of statistical decision theory is that it treats the set of actions as a given; human strategic brilliance often finds expression through the realization that a particular action is possible
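On the smoothness comment above, a small sketch of what "the math continues to work" can mean: the expected loss of a rule with a jump is still an ordinary, computable sum. The binomial model, the squared-error loss, and both rules (`step_rule`, `smooth_rule`) are assumptions chosen for illustration, not anything from the book.

```python
from math import comb

N_FLIPS = 10  # assumed sample size


def binomial_pmf(x, p, n=N_FLIPS):
    """Probability of x successes in n trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def step_rule(x):
    """Discontinuous rule: the estimate jumps from 0.2 to 0.8 once x reaches 5."""
    return 0.2 if x < 5 else 0.8


def smooth_rule(x):
    """Sample proportion, included only for comparison."""
    return x / N_FLIPS


def risk(p, rule):
    """Expected squared-error loss of `rule` when the true parameter is p."""
    return sum(binomial_pmf(x, p) * (rule(x) - p) ** 2 for x in range(N_FLIPS + 1))


for p in (0.2, 0.5, 0.8):
    print(p, round(risk(p, step_rule), 4), round(risk(p, smooth_rule), 4))
```

The sum is finite and well defined for both rules. Whether a rule full of jumps is a good rule is a separate question, and that is the part under dispute in this thread.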
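“data” could be any observable property of the world
Yes, but using it to refer to a person’s ideas, without clarification, would be bizarre and many readers wouldn’t catch on.
in statistical decision theory, the details of the decision process that implements the mapping aren’t the focus because we’re going to try to go straight to the optimal mapping in a mathematical fashion
Straight to the final, perfect truth? lol… That’s extremely unPopperian. We don’t expect progress to just end like that. We don’t expect you get so far and then there’s nothing further. We don’t think the scope for reason is so bounded, nor do we think fallibility is so easily defeated.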
In practice, searches for optimal things of this kind always involve many premises which have substantial philosophical meaning. (Which is often, IMO, wrong.)
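a weak point of statistical decision theory is that it treats the set of actions as a given; human strategic brilliance often finds expression through the realization that a particular action is possible
Does it use an infinite set of all possible actions? I would have thought it wouldn’t rely on knowing what each action actually is, but would just broadly specify the set of all actions and move on.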
@smooth: what good is a mean or median with no smoothness? And for margins of error, with a non-smooth function, what do you do?
With a smooth region of a function, taking the midpoint of the margin of error region is reasonable enough. But when there is a discontinuity, there’s no way to average it and get a good result. Mixing different ideas is a hard process if you want anything useful to result. If you just do it in a simple way like averaging you end up with a result that none of the ideas think will work and shouldn’t be surprised when it doesn’t. It’s kind of like how if you have half an army do one general’s plan, and half do another, the result is worse than doing either one.
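The two-generals point can be sketched in a few lines. The payoff function below is invented for illustration: a plan is a position between 0 and 1, either flank works, and the middle fails.

```python
# Hypothetical battlefield: a plan is a position in [0, 1] where the army concentrates.
LEFT_PLAN = 0.0   # first general's plan (assumed)
RIGHT_PLAN = 1.0  # second general's plan (assumed)


def payoff(position):
    """Assumed payoff with jumps: either flank succeeds, the middle fails."""
    if position <= 0.2 or position >= 0.8:
        return 1.0
    return -1.0


# Naive mixing: average the two plans into one compromise position.
averaged_plan = (LEFT_PLAN + RIGHT_PLAN) / 2

# Alternative: pick one whole plan instead of blending them.
best_single_plan = max((LEFT_PLAN, RIGHT_PLAN), key=payoff)

print(payoff(LEFT_PLAN), payoff(RIGHT_PLAN), payoff(averaged_plan))
```

With this payoff the averaged plan lands at 0.5 and scores -1.0, worse than either original plan, while choosing one whole plan scores 1.0.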