I don’t think the problem is that the reward function is insufficient; the deeper problem is that the situation is literally undefined. Can you explain why you think there _IS_ a “true” factor? Not “can a learning system find it”, but “is there something to find”? If all known real examples have flags, flatness, and redness 100% correlated, there is no fact of the matter about which one to use in the (counterfactual) case where they diverge. This isn’t sampling error or bias; the information just isn’t there.
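To make the "nothing to find" point concrete, here is a minimal sketch, not anyone's actual setup. The feature names come from the example above; the toy dataset, the three single-feature hypotheses, and the counterfactual object are all invented for illustration. It shows that when the features never diverge in the data, every single-feature reward hypothesis fits equally well, and they only disagree on an input the data says nothing about.

```python
# Hypothetical illustration: the data and hypotheses are made up for this sketch.
FEATURES = ("flag", "flat", "red")

# Every observed example has all three features equal (100% correlated),
# and the label matches them.
train = [
    ({"flag": 1, "flat": 1, "red": 1}, 1),
    ({"flag": 0, "flat": 0, "red": 0}, 0),
] * 5


def make_hypothesis(feature):
    """Reward hypothesis that looks at exactly one feature."""
    return lambda x: x[feature]


hypotheses = {f: make_hypothesis(f) for f in FEATURES}


def training_error(h):
    """Number of training examples the hypothesis gets wrong."""
    return sum(h(x) != y for x, y in train)


# All three hypotheses are indistinguishable on the observed data.
errors = {name: training_error(h) for name, h in hypotheses.items()}

# A counterfactual where the features diverge: flat and red, but no flag.
counterfactual = {"flag": 0, "flat": 1, "red": 1}
predictions = {name: h(counterfactual) for name, h in hypotheses.items()}

print("training errors:", errors)
print("counterfactual predictions:", predictions)
```

Collecting more examples of the same kind would not change this: each hypothesis keeps zero training error, so nothing in the data prefers one over another.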
I’ll note that we are using the term “human values” as if all humans had the same values. Even in fairly trivial cases, humans can differ in what tradeoffs they’ll accept. E.g., Adam gets food at a convenience store because it’s convenient, Beth goes to Whole Foods for healthy* foods, and Chad goes to Walmart because he’s cheap. All of them value convenience, nutrition, and cost, but to varying degrees.
*And with varying levels of information and disinformation about the actual nutritional needs of their bodies.
> Can you explain why you think there _IS_ a “true” factor?
Apologies for the miscommunication, but I don’t think there really is an objectively true factor. A factor is “true” only to the extent that humans say it’s the true reward function; I don’t think it’s a mathematical fact. That’s part of what I’m arguing. I agree with what you are saying.