I’m glad to see this post curated. It seems increasingly likely that it will be useful to carefully construct agents that have only the agency required to accomplish a task, and the ideas here seem like the first steps.
What task? All the tasks I know of that are sufficient to reduce x-risk are really hard.
I’m not thinking of a specific task here, but I think there are two sources of hope. One is that humans are agentic above and beyond what is required to do novel science: e.g., we have biological drives, goals other than doing the science, often the desire to use any means to achieve our goals rather than only whitelisted means, and the ability and desire to stop people from interrupting us. Another is that learning how to safely operate agents at a slightly superhuman level will be progress towards safely operating nanotech-capable agents, which could also require control, oversight, steering, or some other technique. I don’t think limiting agency will be sufficient unless the problem is easy, and in that case it would have other possible solutions.