Tetraspace comments on Smoking Lesion Steelman

Tetraspace Jul 21, 2020, 2:43 AM
13 points
I didn’t find the conclusion about the smoke-lovers and non-smoke-lovers obvious in the EDT case at first glance, so I added in some numbers and ran through the calculations that the robots will do. This was both to see for myself and to get a better handle on what it actually looks like to be unable to introspect while still gaining evidence about your utility function.
Suppose that, out of the $N$ robots that have ever been built, a fraction $n$ are smoke-lovers, so $n N$ are smoke-lovers and $(1 - n) N$ are non-smoke-lovers. Suppose also that smoke-lovers end up smoking with probability $p$ and non-smoke-lovers end up smoking with probability $q$.
Then $(p n + q (1 - n)) N$ robots smoke, and $((1 - p) n + (1 - q) (1 - n)) N$ robots don’t smoke. So by Bayes’ theorem, if a robot smokes, there is a $\frac{p n}{p n + q (1 - n)}$ chance that it’s a smoke-lover and so gets killed, and if a robot doesn’t smoke, there’s a $\frac{(1 - p) n}{1 - (p n + q (1 - n))}$ chance that it’s a smoke-lover and so gets killed.
Hence, the expected utilities are:
- An EDT non-smoke-lover looks at the possibilities. It sees that if it smokes, it expects to get $- 101 \frac{p n}{p n + q (1 - n)} - 1 (1 - \frac{p n}{p n + q (1 - n)})$ utilons, and that if it doesn’t smoke, it expects to get $- 100 \frac{(1 - p) n}{1 - (p n + q (1 - n))}$ utilons.
- An EDT smoke-lover looks at the possibilities. It sees that if it smokes, it expects to get $- 90 \frac{p n}{p n + q (1 - n)} + 10 (1 - \frac{p n}{p n + q (1 - n)})$ utilons, and if it doesn’t smoke, it expects to get $- 100 \frac{(1 - p) n}{1 - (p n + q (1 - n))}$ utilons.
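As a minimal sketch, the two bullet points above can be turned into a few lines of Python. The function names `posterior_smoke_lover` and `edt_utilities` are made up for illustration. The payoffs are read off the formulas above: smoking is worth $-1$ to a non-smoke-lover and $+10$ to a smoke-lover, and being killed costs 100. The sketch assumes that the fraction of smokers is strictly between 0 and 1, so neither posterior divides by zero.

```python
def posterior_smoke_lover(n, p, q):
    """Return (P(smoke-lover | smokes), P(smoke-lover | doesn't smoke)).

    n: fraction of robots that are smoke-lovers
    p: probability that a smoke-lover smokes
    q: probability that a non-smoke-lover smokes
    """
    smokers = p * n + q * (1 - n)  # fraction of all robots that smoke
    given_smoke = p * n / smokers
    given_abstain = (1 - p) * n / (1 - smokers)
    return given_smoke, given_abstain


def edt_utilities(n, p, q):
    """Expected utilities as computed by an EDT robot of each type."""
    killed_if_smoke, killed_if_abstain = posterior_smoke_lover(n, p, q)
    return {
        "non_smoke_lover": {
            "smoke": -101 * killed_if_smoke - 1 * (1 - killed_if_smoke),
            "abstain": -100 * killed_if_abstain,
        },
        "smoke_lover": {
            "smoke": -90 * killed_if_smoke + 10 * (1 - killed_if_smoke),
            "abstain": -100 * killed_if_abstain,
        },
    }


# Example values chosen arbitrarily for illustration.
print(edt_utilities(n=0.95, p=0.5, q=0.01))
```

Both types compute the same "abstain" value, because the only thing not smoking changes is the evidence about being a smoke-lover.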
Now consider some equilibria. Suppose that no non-smoke-lovers smoke, but some smoke-lovers smoke. So $q = ε$ and $p ≫ ε$ . So (taking limits as $ε \to 0$ along the way):
- non-smoke-lovers expect to get $- 101$ utilons if they smoke, and $- 100 \frac{n - p n}{1 - p n}$ utilons if they don’t smoke. Since $n < 1$, that fraction is less than 1, so not smoking costs them less than 100 utilons and they will choose not to smoke.
- smoke-lovers expect to get $- 90$ utilons if they smoke, and $- 100 \frac{n - p n}{1 - p n}$ utilons if they don’t smoke. They are indifferent between the two when $p = 10 - \frac{9}{n}$, which is a valid probability only if $n \geq 0.9$. So if at least 90% of robots are smoke-lovers, equilibrium is achieved. If fewer than 90% of robots are smoke-lovers, there is no $p$ at which they would be indifferent, and they will always choose not to smoke.
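Spelling out the smoke-lover indifference condition, using only the two expected utilities in the bullet above:

$$
\begin{aligned}
-90 &= -100 \, \frac{n - p n}{1 - p n} \\
90 (1 - p n) &= 100 (n - p n) \\
10 p n &= 100 n - 90 \\
p &= 10 - \frac{9}{n}
\end{aligned}
$$

This gives $p = 0$ at $n = 0.9$ and $p = 1$ at $n = 1$, so $p$ is a probability exactly when $0.9 \leq n \leq 1$.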
But wait! If fewer than 90% of robots are smoke-lovers, then smoke-lovers would always choose not to smoke, which is inconsistent with the assumption that $p$ is much larger than $ε$. So instead suppose that $p$ is only a little bit bigger than $ε = q$, say $p = k ε$. Then:
- non-smoke-lovers expect to get $- 100 (\frac{k}{1 + (k - 1) n} + \frac{1}{100 n}) n$ utilons if they smoke, and $- 100 n$ utilons if they don’t smoke. They will choose to smoke if $k < 1 - \frac{1}{101 n - 100 n^{2}}$, i.e. if smoke-lovers smoke so much more rarely than non-smoke-lovers that not smoking would make them believe they’re a smoke-lover about to be killed by the blade runner.
- smoke-lovers expect to get $- 100 (\frac{k}{1 + (k - 1) n} - \frac{1}{10 n}) n$ utilons if they smoke, and $- 100 n$ utilons if they don’t smoke. They are indifferent between these two when $k = 1 + \frac{1}{9 n - 10 n^{2}}$ . This means that, when $k$ is at the equilibrium point, non-smoke-lovers will not choose to smoke when fewer than 90% of robots are smoke-lovers, which is exactly when this regime applies.
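For reference, the smoke-lover indifference point follows from setting the two smoke-lover utilities above equal to each other:

$$
\begin{aligned}
-100 \left( \frac{k}{1 + (k - 1) n} - \frac{1}{10 n} \right) n &= -100 n \\
\frac{k n}{1 + (k - 1) n} &= n + \frac{1}{10} \\
k n \left( \frac{9}{10} - n \right) &= \left( n + \frac{1}{10} \right) (1 - n) \\
k &= \frac{(10 n + 1)(1 - n)}{n (9 - 10 n)} = 1 + \frac{1}{9 n - 10 n^{2}}
\end{aligned}
$$

The denominator $9 n - 10 n^{2}$ is positive only for $0 < n < 0.9$, so this $k$ is finite and positive only in that range.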
I wrote a quick Python simulation to check these conclusions, and it agreed: $p = 10 - \frac{9}{n}$ for $0.9 < n < 1$, and $p = (1 + \frac{1}{9 n - 10 n^{2}}) ε$ for $0 < n < 0.9$.
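The script itself isn’t reproduced here, so the following is only a sketch of one way such a check could be done, not the original code. It assumes a fixed small tremble $q = ε = 10^{-6}$ and finds the $p$ at which a smoke-lover is indifferent by bisection, which works because the advantage of smoking falls steadily as $p$ rises. The names `EPS`, `utilities`, `equilibrium_p` and `predicted_p` are hypothetical.

```python
EPS = 1e-6  # assumed value of q, the chance that a non-smoke-lover smokes


def utilities(n, p, q, smoke_bonus):
    """EDT expected utilities (smoke, abstain) for one robot type.

    smoke_bonus is +10 for smoke-lovers and -1 for non-smoke-lovers;
    being killed costs 100 either way.
    """
    smokers = p * n + q * (1 - n)
    killed_if_smoke = p * n / smokers
    killed_if_abstain = (1 - p) * n / (1 - smokers)
    return smoke_bonus - 100 * killed_if_smoke, -100 * killed_if_abstain


def equilibrium_p(n, q=EPS, iterations=200):
    """Bisect for the p at which smoke-lovers are indifferent."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        smoke, abstain = utilities(n, mid, q, smoke_bonus=10)
        if smoke > abstain:
            lo = mid  # smoking still looks better, so p must be higher
        else:
            hi = mid  # abstaining looks better, so p must be lower
    return (lo + hi) / 2


def predicted_p(n, q=EPS):
    """Closed-form equilibrium from the two regimes derived above."""
    if n > 0.9:
        return 10 - 9 / n
    return (1 + 1 / (9 * n - 10 * n**2)) * q


for n in (0.3, 0.5, 0.8, 0.95, 0.99):
    p = equilibrium_p(n)
    smoke, abstain = utilities(n, p, EPS, smoke_bonus=-1)
    print(f"n={n}: numeric p={p:.6g}, predicted p={predicted_p(n):.6g}, "
          f"non-smoke-lovers abstain: {abstain > smoke}")
```

The two formulas are limits as $ε \to 0$, so the numeric and predicted values should agree only approximately, and the match gets worse close to the crossover at $n = 0.9$.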