Promoted to curated: I don’t think this post is perfect (and I have various disagreements with both its structure and content), but I do think the post overall is “going for the throat” in ways that relatively little safety research these days feels like it’s doing. Characterizing agency is at the heart of basically all AI existential risk arguments, and progress and deconfusion on that seem likely to have large effects on AI risk mitigation strategies.
I’m glad to see this post curated. It seems increasingly likely that it will be useful to carefully construct agents that have only what agency is required to accomplish a task, and the ideas here seem like the first steps.

What task? All the tasks I know of that are sufficient to reduce x-risk are really hard.
I’m not thinking of a specific task here, but I think there are two sources of hope. One is that humans are agentic above and beyond what is required to do novel science, e.g. we have biological drives, goals other than doing the science, often the desire to use any means to achieve our goals rather than whitelisted means, and the ability and desire to stop people from interrupting us. Another is that learning how to safely operate agents at a slightly superhuman level will be progress towards safely operating nanotech-capable agents, which could also require control, oversight, steering, or some other technique. I don’t think limiting agency will be sufficient unless the problem is easy, in which case it would have other possible solutions anyway.
I’d be pretty curious to hear about your disagreements if you’re willing to share.