Median utility by itself doesn't work, so I came up with an algorithm that compromises between it and expected utility. In everyday circumstances it behaves like expected utility; in extreme cases it behaves like median utility. And it has a tunable parameter:
Sample n counterfactual outcomes from your probability distribution and take the mean of those n outcomes. [EDIT: Then repeat this an unlimited number of times, and take the median of all these sample means.] That is, pick the value such that 50% of the time the mean of the n outcomes is higher, and 50% of the time it's lower.
As n approaches infinity this becomes equivalent to expected utility, and as n approaches 1 it becomes median utility. A reasonable value is probably a few hundred, so that you select outcomes where you come out ahead the vast majority of the time, but can still ignore very-low-probability risks and rewards.
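For concreteness, the procedure can be sketched in a few lines of Python. This is an illustrative sketch, not anyone's reference implementation: the finite `trials` count stands in for the infinite repetitions, and the skewed `gamble` distribution is invented for the demo.

```python
import random
import statistics

def median_of_means(sample_outcome, n=100, trials=1001):
    """Median-of-means statistic: draw n outcomes, average them, repeat
    for many trials, and return the median of the trial means.
    n -> infinity recovers the mean (expected utility); n = 1 recovers
    the median (median utility)."""
    means = [statistics.fmean(sample_outcome() for _ in range(n))
             for _ in range(trials)]
    return statistics.median(means)

# An invented skewed gamble: 99% chance of utility +1, 1% chance of -1000.
# Its mean is about -9, but its median is +1.
def gamble():
    return 1 if random.random() < 0.99 else -1000

random.seed(0)
print(median_of_means(gamble, n=1))     # 1.0 (the median: the 1% risk is ignored)
print(median_of_means(gamble, n=2000))  # close to the mean, about -9
```

With small n the statistic sides with the median and ignores the rare disaster; with large n it converges to the expected value.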
I believe this more closely matches how humans actually make decisions, and what we actually want, than expected utility does. But I am no longer certain of this: someone suggested that you can deal with most of the expected-utility issues by modifying the utility function instead, and that is somewhat more elegant than this.
As for inconsistency, I proposed a way of dealing with that too. EU maximization is consistent at every single point in time; it's memoryless. If you can precommit to doing certain things in the future, you don't need this property: you can maintain consistency by committing to only take actions that are consistent with your current decision theory.
This is basically the same thing as your policy selection idea.
I came up with an algorithm that compromises between them.
I am not sure of the point. If you can “sample … from your probability distribution” then you fully know your probability distribution including all of its statistics—mean, median, etc. And then you proceed to generate some sample estimates which just add noise but, as far as I can see, do nothing else useful.
If you want something more robust than the plain old mean, check out M-estimators which are quite flexible.
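As one concrete instance of that pointer, here is a minimal sketch of a classic M-estimator of location, the Huber estimator, fitted by iteratively reweighted averaging. The function name, data, and parameter values are invented for illustration; the `delta` parameter tunes it between mean-like behavior (large `delta`) and median-like behavior (small `delta`).

```python
import statistics

def huber_location(xs, delta=1.0, iters=100):
    """Huber M-estimator of location via iteratively reweighted means.
    Points within `delta` of the current estimate get full weight (like
    the mean); points further away are down-weighted in proportion to
    their distance (like the median)."""
    mu = statistics.median(xs)  # robust starting point
    for _ in range(iters):
        ws = [1.0 if abs(x - mu) <= delta else delta / abs(x - mu)
              for x in xs]
        mu = sum(w * x for w, x in zip(ws, xs)) / sum(ws)
    return mu

data = [1.0, 2.0, 3.0, 4.0, 100.0]       # one gross outlier
print(round(statistics.fmean(data), 1))  # 22.0 -- the mean is dragged away
print(round(huber_location(data), 1))    # stays near the bulk of the data
```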
If you can “sample … from your probability distribution” then you fully know your probability distribution
That’s not true. (Though it might well be in all practical cases.) In particular, there are good algorithms for sampling from unknown or uncomputable probability distributions. Of course, any method that lets you sample from it lets you sample the parameters as well, but that’s exactly the process the parent comment is suggesting.
A fair point, though I don't think it makes any difference in this context. And I'm not sure the utility function is amenable to MCMC sampling...
I basically agree. However...
It might be more amenable to MCMC sampling than you think. MCMC is basically a series of operations of the form "make a small change and compare the result to the status quo," which, now that I phrase it that way, sounds a lot like human ethical reasoning. (Maybe the real problem with philosophy is that we don't consider enough hypothetical cases? I kid... mostly...)
In practice, the symmetry constraint isn't as nasty as it looks. For example, you can do Metropolis-Hastings to sample a random node from a graph while knowing only local topology (you need some connectivity constraints, and a long enough walk, to get good diffusion properties). Basically, I posit that the hard part is coming up with a sane definition of "nearby possible world", and that the symmetry constraint and the other parts are pretty easy after that.
Maybe the real problem with philosophy is that we don’t consider enough hypothetical cases? I kid… mostly...
In that case we can have wonderful debates about which sub-space to sample our hypotheticals from, and once a bright-eyed and bushy-tailed acolyte breathes out "ALL of it!" we can pontificate about the boundaries of all :-)
P.S. In about a century philosophy will discover the curse of dimensionality and there will be much rending of clothes and gnashing of teeth...
I should have explained it better. You take n samples and calculate the mean of those samples. You do that a large number of times, creating a new distribution of these sample means. Then you take the median of that distribution.
This gives a tradeoff between the mean and the median. As n goes to infinity you just get the mean; as n goes to 1 you just get the median. Values in between are a compromise: n = 100 will roughly ignore things that have less than a 1% chance of happening (as opposed to things with less than a 50% chance of happening, which the standard median ignores).
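That transition can be sketched numerically (the gamble and trial counts are invented for illustration). For an event with a 1% chance, the statistic flips from ignoring it to pricing it in once the event shows up in more than half of the n-sample batches, which happens around n ~ 70 for p = 1%.

```python
import random
import statistics

random.seed(1)

# The same kind of skewed gamble: 99% chance of +1, 1% chance of -1000.
def gamble():
    return 1 if random.random() < 0.99 else -1000

def mom(n, trials=401):
    """Median over `trials` of the mean of n sampled outcomes."""
    return statistics.median(
        statistics.fmean(gamble() for _ in range(n)) for _ in range(trials)
    )

# Small n ignores the 1% disaster (result +1); by n = 100 most batches
# contain a disaster, so the statistic moves to roughly the mean (~ -9).
for n in (1, 10, 100, 1000):
    print(n, round(mom(n), 2))
```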
There are a variety of ways to get a tradeoff between the mean and the median (or, more generally, between an efficient but not robust estimator and a robust but not efficient one). The real question is how you decide what a good tradeoff is.
Basically if your mean and your median are different, your distribution is asymmetric. If you want a single-point summary of the entire distribution, you need to decide how to deal with that asymmetry. Until you specify some criteria under which you’ll be optimizing your single-point summary you can’t really talk about what’s better and what’s worse.
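One standard example of such a tunable single-point summary (shown here as an illustrative sketch, not anything proposed in the thread) is the trimmed mean, whose `trim` parameter interpolates between the mean (trim = 0) and the median (trim near 0.5).

```python
import statistics

def trimmed_mean(xs, trim=0.1):
    """Drop the lowest and highest `trim` fraction of the sample and
    average what remains. trim = 0 is the ordinary mean; trim close to
    0.5 keeps only the middle of the sample, approaching the median."""
    xs = sorted(xs)
    k = int(len(xs) * trim)
    return statistics.fmean(xs[k:len(xs) - k] if k else xs)

data = [1, 2, 3, 4, 100]
print(trimmed_mean(data, trim=0.0))  # 22.0 -- the plain mean
print(trimmed_mean(data, trim=0.2))  # 3.0  -- the outlier is trimmed away
print(statistics.median(data))       # 3
```

How much to trim is exactly the kind of asymmetry-handling choice described above: there is no answer until you say what the summary is for.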
This is just one of many possible algorithms which trade off between the median and the mean. Unfortunately, there is no objective way to determine which one is best (or what the best setting of the hyperparameter is).
The criterion we are optimizing is just "how closely does it match the behavior we actually want."
I posted this exact idea a few months ago. There was a lot of discussion about it which you might find interesting. We also discussed it recently on the irc channel.
EDIT: Stuart Armstrong’s idea is much better than this and gets about the same results: http://lesswrong.com/r/discussion/lw/mqk/mean_of_quantiles/
And what is “the behavior we actually want”?