Promoted to curated: I don’t think this post is perfect (and I have various disagreements with both its structure and content), but I do think the post overall is “going for the throat” in ways that relatively little safety research these days feels like it’s doing. Characterizing agency is at the heart of basically all AI existential risk arguments, and progress and deconfusion on that seem likely to have large effects on AI risk mitigation strategies.
I’m glad to see this post curated. It seems increasingly likely that it will be useful to carefully construct agents that have only what agency is required to accomplish a task, and the ideas here seem like the first steps.

What task? All the tasks I know of that are sufficient to reduce x-risk are really hard.
I’m not thinking of a specific task here, but I think there are two sources of hope. One is that humans are agentic above and beyond what is required to do novel science, e.g. we have biological drives, goals other than doing the science, often the desire to use any means to achieve our goals rather than whitelisted means, and the ability and desire to stop people from interrupting us. Another is that learning how to safely operate agents at a slightly superhuman level will be progress towards safely operating nanotech-capable agents, which could also require control, oversight, steering, or some other technique. I don’t think limiting agency will be sufficient unless the problem is easy, in which case it would have other possible solutions anyway.
I’d be pretty curious to hear about your disagreements if you’re willing to share.