MIRI’s “Death with Dignity” in 60 seconds.
Suppose that, like Yudkowsky, you really care about humanity surviving this century, but you think that nothing you can do has a decent chance of achieving that.
It’s an unfortunate fact of human psychology that, when faced with this kind of situation, people will often do nothing at all instead of the thing that has the highest chance of achieving their goal. Hence, you might give up on alignment research entirely and either lie in bed all day with paralysing depression or convert your FAANG income into short-term pleasures. How can we avoid this trap?
It seems we have three options:
(1) Change your psychology. This would be the ideal option. If you can do that, then do that. But the historical track record suggests this is really hard.
(2) Change your beliefs. This is called “hope”, and it’s a popular trick among AI doomers. You change your belief from “there’s nothing I can do which makes survival likely” to “there’s something I can do which makes survival likely”.
(3) Change your goals. This is what Yudkowsky proposes. You change your goal from “humanity survives this century” to “my actions increase the log-odds that humanity survives this century”. Yudkowsky calls this new goal “dignity”. The old goal had only two possible values, 0 and 1, but the new goal has possible values anywhere between −∞ and +∞.
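To see why the new goal has that range (the log-odds formula is standard; the example numbers below are mine, not Yudkowsky’s): an event with probability p has

log-odds(p) = log( p / (1 − p) ),

which sweeps from −∞ as p approaches 0 up to +∞ as p approaches 1. So if, say, your work nudged the probability of survival from 1% to 2%, the odds would move from 1:99 to 2:98, and the log-odds (base 2) would rise by roughly one bit. That’s a real, positive increment of dignity, even though survival itself remains unlikely.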
Of course, it’s risky to change either your beliefs or your goals, because you might face a situation where the optimal policy after the change differs from the optimal policy before the change. But Yudkowsky thinks that (3) is less optimal-policy-corrupting than (2).
Why’s that? Well, if you force yourself to believe something unlikely (e.g. “there’s something I can do which makes survival likely”), then the inaccuracy can leak into your other beliefs, because your beliefs are connected by a web of inferences. You’ll start making poor predictions about AI and, in turn, silly decisions.
On the other hand, changing your goal from “survival” to “dignity” is like Trying to Try rather than trying: it still carries some risk, but it’s less optimal-policy-corrupting than hope is.