ryan_greenblatt comments on The Shutdown Problem: Incomplete Preferences as a Solution

ryan_greenblatt 19 Nov 2024 22:49 UTC
I was asking about HSB not because I think it is similar to the process for AIs, but because if the answer differs, that implies you're making some narrower assumption about the inductive biases of AI training.

On generalizing to extremely unlikely conditionals, I think TD-agents are in much the same position as other kinds of agents, like expected utility maximizers. Strictly, both have to consider extremely unlikely conditionals to select actions. In practice, both can approximate the results of this process using heuristics.

Sure, from a capabilities perspective. But the question is how the motivations/internal objectives generalize. I agree that AIs trained to be TD-agents might generalize for the same reason that an AI trained on a paperclip-maximization objective might generalize to maximize paperclips in some very different circumstance. But I don't necessarily buy that this is how the paperclip-maximization-trained AI will generalize!

(I’m picking up this thread from 7 months ago, so I might be forgetting some important details.)