More generally, it seems we can build systems that succeed at accomplishing long-run goals without the core components doing that work actually ‘wanting’ to accomplish any long-run goal.
This seems common in corporations, and we see similar dynamics in language model agents.
(Again, efficiency concerns are reasonable.)
I do not expect you to be able to give a central example of such a corporation without finding that there is in fact a “want” implemented in its members: employees want to satisfy their bosses, who in turn want to satisfy theirs, and so on. Corporations are generally supervisor trees in which bosses set up strong incentives, and it seems to me that this produces a significant amount of aligned wanting in the employees, though of course there is also backpressure.
I agree that there is want, but it is very unclear whether it needs to be a long-run ‘want’.
(And for danger, the horizon of the want seems to matter a lot.)