I left a reply to this view under the other comment. However, I don’t feel that point connects very well to the one I was trying to make here.
Your OP talks about minimization of prediction error as a theory of human value, relevant to alignment. It might be that evolution repurposes predictive machinery to pursue adaptive goals; that seems like exactly the sort of thing evolution would do. However, it leaves open the question of what those goals are. You say you’re not claiming that humans globally minimize prediction error. But, partly because of the remarks you made in the OP, I’m reading you as suggesting that humans do minimize prediction error, just relative to a skewed prediction.
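To make sure I’m arguing against the right target, here is roughly how I’m formalizing “minimizing prediction error relative to a skewed prediction” (this is my own gloss, so correct me if it isn’t what you mean):

choose actions to (approximately) minimize $\mathbb{E}\big[(o - (\hat{o} + s))^2\big]$,

where $o$ is the observation, $\hat{o}$ is the unbiased prediction, and $s$ is a fixed skew that drags the target toward adaptively useful states (food, warmth, and so on).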
Are human values well-predicted by modeling us as minimizing prediction error relative to a skewed prediction?
My argument here is that evolved creatures such as humans are more likely to (as one component of value) steer toward prediction error, because doing so tends to lead to learning, which is broadly valuable. This is difficult to model by taking a system that minimizes prediction error and skewing its predictions, because seeking out error is the exact opposite of avoiding it.
Elsewhere, you suggest that exploration can be predicted by your theory if there’s a sort of reflection within the system, so that prediction error is itself predicted. The system then has an overall set-point for prediction error and explores when error falls below it. But I think this part would be drowned out. If I started with a system that minimizes prediction error and added a curiosity drive on top of it, I would have to entirely cancel out the error-minimization drive before the curiosity could do its job. The same goes for your hypothesized reflective part: everything else in the system is strategically avoiding error, so one part steering toward error would have to out-vote or out-smart all the others.
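Here is a toy way to see the problem (entirely my own construction, not a model of anything in the OP): treat each sub-part as casting a weighted vote on whether prediction error should go up or down.

```python
# Toy vote-counting illustration (my construction, not the OP's model).
# Every error-minimizing sub-part always pushes prediction error down;
# one curiosity sub-part pushes it up whenever error is below its set-point.

def net_push(error, setpoint, n_minimizers, curiosity_weight):
    """Sign of the combined drive on prediction error (+ means seek error)."""
    minimizer_votes = -1.0 * n_minimizers
    curiosity_vote = curiosity_weight if error < setpoint else -curiosity_weight
    return minimizer_votes + curiosity_vote

# With ten minimizing parts, the curiosity part only wins once its weight
# exceeds all of theirs combined -- it has to out-vote the whole system.
for w in (1, 5, 11):
    print(w, net_push(error=0.1, setpoint=0.5, n_minimizers=10, curiosity_weight=w))
```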
Now, that’s overstating my point. I don’t think the human curiosity drive seeks maximum prediction error; I think it’s more likely related to the derivative of prediction error. But the point remains: that’s difficult to model as minimization of a skewed prediction error, and it requires the sub-part implementing curiosity to drown out all the other parts.
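To gesture at what I mean by “derivative” (a learning-progress-style signal is one possible reading; the exact form isn’t important to my argument):

```python
# Sketch of a derivative-of-error curiosity signal (one possible reading;
# the specifics are illustrative, not a claim about the true human drive).

def curiosity_signal(prev_error, curr_error):
    """Reward prediction error that is shrinking, i.e. evidence of learning."""
    return prev_error - curr_error  # positive when error is falling

# High-but-falling error scores well; low, static error scores nothing.
# A minimizer of (skewed) current error can't reproduce this, since it
# only looks at the error level, not its trend.
print(curiosity_signal(prev_error=0.8, curr_error=0.3))  # 0.5
print(curiosity_signal(prev_error=0.1, curr_error=0.1))  # 0.0
```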
Instead of modeling human value as minimizing the error of a skewed prediction, why not step back and model it as minimizing “some kind of error”? This seems no less parsimonious (since you have to specify the skew anyway), and it leaves you with all the same controller machinery to propagate error through the system and learn to avoid it.
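A minimal sketch of what I have in mind (again my own toy; “error_fn” is just a placeholder for whatever error signal evolution wires in):

```python
# Minimal sketch of "minimize some kind of error" (my own toy example).
# The controller machinery is unchanged -- compute an error, adjust to
# reduce it -- but the error signal can be anything, not specifically a
# skewed prediction error.

def control_step(param, error_fn, lr=0.1, eps=1e-4):
    """One step of 'reduce whatever error_fn reports' (finite-difference gradient)."""
    grad = (error_fn(param + eps) - error_fn(param - eps)) / (2 * eps)
    return param - lr * grad

# The same loop works whether error_fn is a squared prediction error,
# distance from a homeostatic set-point, or some other learned signal.
x = 5.0
for _ in range(200):
    x = control_step(x, error_fn=lambda p: (p - 2.0) ** 2)
print(round(x, 2))  # settles near 2.0, the minimum of this particular error
```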