It’s not obvious to me that the information you’re looking for is not present in a single toe. In the same way that an advanced AI could discover General Relativity by carefully examining a few frames of a falling apple, couldn’t it infer something about human/rabbit/rainforest values by observing the behavior of a toe? My concern would instead be that there is too much information and that the AI would pick out some values but not necessarily the ones you expect.
But the question is whether values are even the right way to approach this problem. That's the kind of information we're seeking: information about how to go about being beneficial at all, and what beneficial really means. Does it really make sense to model a rainforest as an agent and back out a value function for it? If we did that, would it work out in a way that we could look back on and be glad about? Perhaps it would, perhaps it wouldn't, but the hard problem of AI safety is this: what is even the right frame in which to start thinking about this, and how can we begin to answer such a question?
Now perhaps it’s still true that the information we seek can be found in a human toe. But just beware that we’re not talking about anything so concrete as values here.