More generally, it seems we can build systems that succeed at accomplishing long-run goals without the core components doing that work actually ‘wanting’ to accomplish any long-run goal.
This seems common in corporations, and we see similar dynamics in language model agents.
(Again, efficiency concerns are reasonable.)
I do not expect you to be able to give a central example of such a corporation without finding that there is in fact a “want” implemented in its members: employees want to satisfy their bosses, who in turn want to satisfy theirs, and so on. Corporations are generally supervisor trees in which bosses set up strong incentives, and it seems to me that this produces a significant amount of aligned wanting in the employees, though of course there is also backpressure.
I agree that there is want, but it is very unclear whether it needs to be a long-run ‘want’.
(And for danger, the horizon of the want seems to matter a lot.)