Warning: I haven’t read the paper so take this with a grain of salt
Here’s how it would go wrong if I understand it right: For exponentially discounted MDPs there’s something called an effective horizon. That means everything after that time is essentially ignored.
You pick a tiny ϵ > 0. Say (without loss of generality) that all utilities u_t ∈ [−1, 1]. Then there is a time t₀ with δ^t₀ < ϵ. So the discounted cumulative utility from anything after t₀ is bounded by c = ϵ/(1−δ) (which follows from summing the geometric series). That's an arbitrarily small constant.
We can now easily construct pairs of sequences for which LDU gives counterintuitive conclusions. E.g. a sequence s₁ which is maximally better than s₂ for every t > t₀ until the end of time, but ever so slightly worse (by more than c in discounted terms) for 0 ≤ t < t₀: the discounted criterion then prefers s₂.
So anything that happens after t₀ is essentially ignored; we have effectively made the problem finite.
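To make this concrete, here is a toy numerical sketch (the particular values of δ, ϵ, and the stream utilities are my own arbitrary choices, not from the paper):

```python
# Toy illustration: with a fixed discount factor delta, a stream that is
# maximally better forever after the effective horizon t0 can still lose
# to one that is only slightly better before t0.

def discounted_sum(utilities, delta):
    """Discounted cumulative utility of a finite prefix of a stream."""
    return sum(delta ** t * u for t, u in enumerate(utilities))

delta = 0.9
epsilon = 1e-4

# Effective horizon: the smallest t0 with delta**t0 < epsilon.
t0 = 0
while delta ** t0 >= epsilon:
    t0 += 1

T = 300  # finite truncation; the neglected tail is O(delta**T / (1 - delta))

# s1: utility 0 before t0, then maximally good (+1) forever after.
# s2: slightly better (+0.001) before t0, then maximally bad (-1) after.
s1 = [0.0] * t0 + [1.0] * (T - t0)
s2 = [0.001] * t0 + [-1.0] * (T - t0)

print(t0)                                                     # effective horizon
print(discounted_sum(s2, delta) > discounted_sum(s1, delta))  # s2 preferred
print(sum(s1) > sum(s2))                # yet s1 is far better undiscounted
```

With these numbers the tiny pre-horizon edge of s₂ outweighs the entire (maximally good) tail of s₁ under the fixed-δ criterion.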
Exponential discounting in MDPs is standard practice. I’m surprised that this is presented as a big advance in infinite ethics as people have certainly thought about this in economics, machine learning and ethics before.
Btw, your meta-MDP probably falls into the category of Bayes-Adaptive MDP (BAMDP) or Bayes-Adaptive partially observable MDP (BAPOMDP) with learned rewards.
Thanks for the response. EDIT: Adam pointed out to me that LDU does not suffer from dictatorship of the present as I originally stated below and as you argued above. What you are saying is true for a fixed discount factor, but in this case we take the limit as δ→1.
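A quick numerical sketch of why the limit matters (toy streams of my own choosing, not from the paper): a stream that loses under a small fixed δ can win once δ is close enough to 1, because its better tail eventually dominates.

```python
def discounted_sum(utilities, delta):
    """Discounted cumulative utility of a finite prefix of a stream."""
    return sum(delta ** t * u for t, u in enumerate(utilities))

t0, T = 88, 100_000  # horizon split and a long finite truncation
s1 = [0.0] * t0 + [1.0] * (T - t0)     # worse early, maximally good late
s2 = [0.001] * t0 + [-1.0] * (T - t0)  # slightly better early, maximally bad late

for delta in (0.9, 0.99, 0.999):
    print(delta, discounted_sum(s1, delta) > discounted_sum(s2, delta))
# At delta = 0.9 the small early advantage wins (s2 preferred); as delta -> 1
# the tail dominates and s1 is preferred, which is why taking the limit
# escapes the fixed-delta objection.
```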
The property you describe is known as "dictatorship of the present", and you can read more about it here. In order to get rid of this "dictatorship" you end up having to do things like reject stationarity, which is plausibly just as counterintuitive.
> I’m surprised that this is presented as a big advance in infinite ethics as people have certainly thought about this in economics, machine learning and ethics before.
Could you elaborate? The reason that I thought this was important was:
> Previous algorithms like the overtaking criterion had fairly “obvious” incomparable streams, with no real justification for why those streams would not be encountered by a decision-maker. LDU is not complete, but we at least have some reason to think that it may be all we “practically” need.
Are there other algorithms which you think are all we will “practically” need?