Yann LeCun: … instrumental subgoals are much weaker drives of behavior than hardwired objectives. Else, how could one explain the lack of domination behavior in non-social animals, such as orangutans.
What’s your specific critique of this? I think it’s an interesting and insightful point.
LeCun claims too much. It’s true that the case of animals like orangutans points to a class of cognitive architectures which seemingly don’t prioritize power-seeking. It’s true that this is some evidence against power-seeking behavior being common amongst relevant cognitive architectures. However, it doesn’t show that instrumental subgoals are much weaker drives of behavior than hardwired objectives.
One reading of this “drives of behavior” claim is that it has to be tautological; by definition, instrumental subgoals are always in service of the (hardwired) objective. I assume that LeCun is instead discussing “all else equal, will statistical instrumental tendencies (‘instrumental convergence’) be more predictive of AI behavior than its specific objective function?”.
But “instrumental subgoals are much weaker drives of behavior than hardwired objectives” is not the only possible explanation of “the lack of domination behavior in non-social animals”! Maybe the orangutans aren’t robust to scale. Maybe orangutans do implement non-power-seeking cognition, but their cognitive architecture will be hard or unlikely for us to reproduce in a machine; maybe the distribution of TAI cognitive architectures we should expect is far different from what orangutans are like.
I do agree that there’s a very good point in the neighborhood of the quoted argument. My steelman of this would be:
Some animals, like humans, seem to have power-seeking drives. Other animals, like orangutans, do not. Therefore, it’s possible to design agents of some intelligence which do not seek power. Obviously, we will be trying to design agents which do not seek power. Why, then, should we expect such agents to be more like humans than like orangutans?
(This is loose for a different reason, in that it presupposes a single relevant axis of variation between humans and orangutans. Is a personal computer more like a human, or more like an orangutan? But set that aside for the moment.)
I think he’s overselling the evidence. However, on reflection, I wouldn’t pick out the point for such strong ridicule.
I feel like you can turn this point upside down. Even among primates that seem unusually docile, like orangutans, male-male competition can get violent and occasionally ends in death. Isn’t that evidence that power-seeking is hard to weed out? And why wouldn’t it be in an evolved species that isn’t eusocial or otherwise genetically weird?