What do you think about the possibility that, in practice, a really good strategy might be to not learn about humans at all, but just to learn to adapt to whatever player is out there (if you’re powerful enough to use a simplicity prior, you only make a small finite number of mistakes relative to the best hypothesis in your hypothesis space)? I think it might exacerbate the issues CIRL has with distinguishing humans from the environment.
So in general, the larger the set of agents you have to be robust to, the less you are able to coordinate with them. If you need to coordinate with a human on saying the same word out of “Heads” and “Tails”, you can probably do it. If you need to coordinate with an arbitrary agent on that, you have no hope.
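As an aside, the coordination gap here can be made concrete with a toy calculation. The sketch below is purely illustrative; the partner model and its numbers are made up, not taken from any actual model of humans. Against a partner you have a model of, you can best-respond to that model; against an arbitrary partner, the only thing you can guarantee is the worst case over the whole set.

```python
# Toy illustration of the coordination point above (numbers are made up).
# Two players each say "Heads" or "Tails"; they win iff they say the same word.

WORDS = ["Heads", "Tails"]

def best_response(partner_dist):
    """Say whatever the modeled partner is most likely to say; return P(match)."""
    word = max(WORDS, key=lambda w: partner_dist[w])
    return word, partner_dist[word]

# With a model of the human partner (hypothetical focal-point numbers):
human_model = {"Heads": 0.8, "Tails": 0.2}
word, p_match = best_response(human_model)
print(f"vs. modeled human: say {word!r}, match with probability {p_match:.2f}")

# Against an arbitrary partner there is no distribution to best-respond to.
# A deterministic choice can be driven to a 0% match rate by an adversarial
# partner, and randomizing uniformly only guarantees 50% -- chance level.
print("vs. arbitrary agent: no strategy guarantees better than 0.5")
```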
if you’re powerful enough to use a simplicity prior, you only make a small finite number of mistakes relative to the best hypothesis in your hypothesis space
… But when you update on the evidence that you see, you are learning about humans? I’m not sure why you say this is “not learn[ing] about humans at all”.
Also, I disagree with “small” in that quote, but that’s probably not central.
Also, our AI systems are not going to be powerful enough to use a simplicity prior.
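For reference, a standard form of the bound the parenthetical seems to be invoking (and whose “small”-ness is under dispute) can be stated for the simplest deterministic, realizable setting: a weighted-majority predictor over a countable hypothesis class, with prior weight $w(h)$ on each hypothesis $h$ and the true hypothesis $h^*$ in the class, makes at most

$$M \;\le\; \log_2 \frac{1}{w(h^*)}$$

prediction mistakes in total. With a simplicity prior $w(h) = 2^{-K(h)}$, where $K(h)$ is the description length of $h$ in bits, this becomes $M \le K(h^*)$: finite, but growing with the complexity of the true hypothesis, which is presumably where the disagreement over “small” comes in.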
… But when you update on the evidence that you see, you are learning about humans? I’m not sure why you say this is “not learn[ing] about humans at all”.
Maybe I should retract this to “not learning about humans at train time,” but on the other hand, maybe not. The point here is probably worth explaining, and then some rationalist taboo is probably in order.
What’s that quote (via Richard Rorty summarizing Donald Davidson)? “If you believe that beavers live in deserts, are pure white in color, and weigh 300 pounds when adult, then you do not have any beliefs, true or false, about beavers.” There is a certain sort of syntactic aboutness that we sometimes care about: not merely that our model captures the function of something, but that we can access the right concept via some specific signal.
When you train the AI on datasets of human behavior, the sense in which it’s “learning about humans” isn’t merely a matter of how it functions in some specific environment at test time. It’s learning a model that is forced to capture human behavior in a wide variety of contexts, and it’s learning that model in such a way that you, the programmer, can access it later to make use of it for planning, and be confident that you’re not like the person trying to use the label “beaver” to communicate with someone who thinks beavers live in deserts.
When the purely adaptive AI “learns about humans” at test time, it has fewer of those nice properties. It is not forced to make a broad model of humans, and in fact it doesn’t need to distinguish humans from complicated human-related parts of the environment. If you think humans come with wifi and cellphone reception, and can use their wheels to travel at speeds upwards of 150 kph, I’m suspicious of your opinion on how to satisfy human values.
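A hypothetical sketch of the structural difference being described (all names and interfaces below are made up for illustration, not from any real system): in one design the human model is an explicit object the programmer can hand to a planner; in the other, whatever “model of the human” exists is folded into the agent’s model of the environment and is not separately addressable.

```python
# Purely illustrative interfaces; none of these classes come from a real system.

class PretrainedHumanModel:
    """Trained offline on broad datasets of human behavior.

    The programmer knows this object is *supposed* to be about humans, and can
    inspect it or hand it to a planner as the thing that predicts the human."""

    def predict_action(self, human_observation):
        ...  # behavior learned across many contexts at train time


class PlannerWithHumanModel:
    """Plans against an explicit, separately addressable model of the human."""

    def __init__(self, human_model):
        self.human_model = human_model

    def act(self, observation):
        ...  # e.g. pick the action that does best against predicted human behavior


class PurelyAdaptiveAgent:
    """Adapts online to whatever process generates its observations.

    Any "model of the human" it ends up with is implicit, entangled with the
    rest of the environment (phones, cars, wifi routers), with no handle the
    programmer can point at and say "this part is the human"."""

    def act(self, observation):
        ...  # update beliefs about the environment as a whole and respond
```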
Also, I disagree with “small” in that quote, but that’s probably not central.
Also, our AI systems are not going to be powerful enough to use a simplicity prior.
Fair enough (though a simplicity prior can be approximated surprisingly well, and many effective learning algorithms aspire to similar bounds on error relative to the best in their class). So do you think this means that pre-training a human model will in general be a superior solution to having the AI adapt to its environment? Or just that it will be important in enough specific cases (e.g. certain combinations of availability of human data, ease of simulation of the problem, and ease of extracting a reward signal from the environment) that the “engineers on the ground” will sit up and take notice?
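One textbook member of the “bounds on error relative to best in class” family mentioned here is the exponential-weights (Hedge) algorithm: for losses in $[0,1]$ its cumulative loss exceeds that of the best single expert by at most $(\ln N)/\eta + \eta T/8$. A minimal sketch of the standard algorithm (not anyone’s proposed system, and the toy losses below are random placeholders):

```python
import math
import random

def hedge(expert_losses, eta=0.1):
    """Exponential-weights (Hedge) over N experts.

    For losses in [0, 1], cumulative expected loss exceeds the best single
    expert's cumulative loss by at most (ln N)/eta + eta*T/8.
    """
    n = len(expert_losses[0])
    weights = [1.0] * n
    total_loss = 0.0
    for losses in expert_losses:               # losses[i] = expert i's loss this round
        z = sum(weights)
        probs = [w / z for w in weights]       # play expert i with probability probs[i]
        total_loss += sum(p * l for p, l in zip(probs, losses))
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    best_expert_loss = min(
        sum(row[i] for row in expert_losses) for i in range(n)
    )
    return total_loss, best_expert_loss

# Toy run with made-up losses, just to show the interface.
random.seed(0)
T, N = 200, 3
loss_table = [[random.random() for _ in range(N)] for _ in range(T)]
ours, best = hedge(loss_table)
print(f"Hedge: {ours:.1f} vs. best expert: {best:.1f} over {T} rounds")
```

Note that the guarantee is only relative to the best fixed expert in the class; it says nothing by itself about whether any expert in the class is a good model of the human.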
Oh, I see, thanks for the explanation. In that case, I basically agree with your original comment: it seems likely that eventually we will have AI systems that simply play well with their environment. However, I think that by that point AI systems will have a good concept of “human”, because humans are a natural category / the “human” concept carves reality at its joints.
(I anticipate pushback on the claim that the “human” concept is natural; I’m not going to defend it here, though if the claim is unclear I’m happy to clarify.)