I don’t think human selective breeding tells us much about what’s simple and natural for AIs. HSB seems very different from AI training. I’m reminded of the Quintin Pope point that evolution selects genes that build brains that learn parameter values, rather than selecting for parameter values directly. It’s probably hard to get next-token predictors via HSB, but you can do it via AI training.
On generalizing to extremely unlikely conditionals, I think TD-agents are in much the same position as other kinds of agents, like expected utility maximizers. Strictly, both have to consider extremely unlikely conditionals to select actions. In practice, both can approximate the results of this process using heuristics.
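To make the strict-vs-heuristic contrast concrete, here's a minimal toy sketch in Python (the action names, payoffs, and probability cutoff are all invented for illustration; nothing here is specific to TD-agents):

```python
# Strictly, an expected-utility maximizer scores each action over *every*
# conditional, however unlikely; a heuristic approximation prunes branches
# below a probability cutoff. (Toy example: all numbers are made up.)

def strict_eu(outcomes):
    # outcomes: list of (probability, utility) pairs for one action
    return sum(p * u for p, u in outcomes)

def heuristic_eu(outcomes, cutoff=1e-6):
    # Same score, but ignoring extremely unlikely conditionals.
    return sum(p * u for p, u in outcomes if p >= cutoff)

def pick_action(actions, eu_fn):
    # actions: dict mapping action name -> list of (probability, utility)
    return max(actions, key=lambda a: eu_fn(actions[a]))

actions = {
    "A": [(0.999, 1.0), (0.001, -5.0)],
    "B": [(0.9999999, 1.1), (1e-7, -1e9)],  # huge loss on a one-in-ten-million branch
}

print(pick_action(actions, strict_eu))     # "A": the tail risk dominates
print(pick_action(actions, heuristic_eu))  # "B": the pruned heuristic misses it
```

The two selectors agree almost everywhere and come apart only on the extremely-unlikely-conditional cases, which is the sense in which the heuristic merely approximates the strict procedure.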
I was asking about HSB not because I think it's similar to the process that produces AIs, but because if the answers differ, that implies you're making some narrower assumption about the inductive biases of AI training.
> On generalizing to extremely unlikely conditionals, I think TD-agents are in much the same position as other kinds of agents, like expected utility maximizers. Strictly, both have to consider extremely unlikely conditionals to select actions. In practice, both can approximate the results of this process using heuristics.
Sure, from a capabilities perspective. But the question is how the motivations/internal objectives generalize. I agree that an AI trained to be a TD-agent might generalize that way, for the same reason that an AI trained on a paperclip-maximization objective might generalize to maximizing paperclips in some very different circumstance. But I don't necessarily buy that this is how the paperclip-maximization-trained AI will generalize!
(I’m picking up this thread from 7 months ago, so I might be forgetting some important details.)