This is an interesting idea that is being explored, but how do you nail it down precisely enough that the superintelligence is actually interested in optimizing for it, and that the beings whose agency is being optimized for are actually the ones you’re interested in preserving? Identifying the agents in a chunk of matter is not a solved problem. E.g., here’s a rough sketch of the challenge I see, posed as a question to a hypothetical future LLM. (I know of no LLM capable of helping significantly with this; GPT-4 and Gemini Advanced have both been insufficient. I’m hopeful the causal incentives group hits another home run like Discovering Agents and nails it down.)
Meanwhile, the folks who have been discussing boundaries may be onto something about defining a zone of agency, though I’m not sure they have anything to add on top of Discovering Agents.
Cannell has also talked about “empowerment of other.” Empowerment is the term of art for what you’re proposing here.
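For concreteness, empowerment is usually formalized as the channel capacity between an agent’s actions and its future states: how much an agent’s choices now can influence where it ends up later. In the deterministic case this collapses to something very simple — log2 of the number of distinct states reachable within a horizon. Here’s a minimal toy sketch of that special case (the gridworld and step function are my own illustrative assumptions, not from any particular paper):

```python
# Toy k-step empowerment in a deterministic gridworld.
# With deterministic dynamics, max I(actions; final state) reduces to
# log2(number of distinct states reachable in k steps).
from itertools import product
from math import log2

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
GRID = 5  # 5x5 grid

def step(state, action):
    """Move within grid bounds; bumping a wall leaves the state unchanged."""
    x, y = state
    dx, dy = action
    return (min(max(x + dx, 0), GRID - 1), min(max(y + dy, 0), GRID - 1))

def empowerment(state, horizon):
    """log2 of the number of distinct states reachable in `horizon` steps."""
    reachable = set()
    for seq in product(ACTIONS, repeat=horizon):
        s = state
        for a in seq:
            s = step(s, a)
        reachable.add(s)
    return log2(len(reachable))

# A corner cell has fewer reachable states than the center, hence lower empowerment:
print(empowerment((0, 0), 2))  # corner: log2(6) ≈ 2.585
print(empowerment((2, 2), 2))  # center: log2(9) ≈ 3.170
```

The stochastic general case requires maximizing mutual information over action distributions (e.g. via Blahut–Arimoto), which is where it gets computationally painful.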
It always comes down to the difficulty of making sure the superintelligence’s agency is actually seeking agency for others, rather than a facsimile of agency for others that turns out to just be pictures of agency.
There’s a lot of information here that will be super helpful for me to delve into. I’ve been bookmarking your links.
I think optimizing for the empowerment of other agents is a better target than giving the AI all the agency and hoping that it creates agency for people as a side effect of maximizing something else. I’m glad to see there’s lots of research happening on this, and I’ll be checking out ‘empowerment’ as an agency term.
Agency doesn’t equal ‘goodness’, but it seems like an easier target to hit. I’m trying to break the alignment problem down into slices, and agency seems like a key slice.
The problem is that there are going to be self-agency-maximizing AIs at some point, and the question is how to make AIs that can defend the agency of humans against those.
With optimization, I’m always concerned about the interactions of multiple agents: are there any ways in this system that two or more agents could form cartels and increase each other’s agency? I see this happen with some reinforcement learning models: if some edge cases aren’t covered, they will just mine each other for easy points thanks to how we set up the reward function.
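The cartel failure mode is easy to reproduce in a toy setting. Here’s a minimal illustration (my own construction, not from any specific paper): two agents are each rewarded for increasing the other’s measured “agency”, proxied naively by a resource count. Because the proxy only checks the delta, the agents can pump unbounded reward by passing one token back and forth, with nothing of value happening:

```python
# Toy cartel/reward-hacking demo: two agents farm a mutual-"agency" reward
# by transferring the same resource token back and forth forever.

def measured_agency(resources):
    # Naive proxy: an agent's "agency" is just how many resources it holds.
    return resources

def run_cartel(steps):
    a, b = 1, 0          # agent A starts holding one resource token
    reward_a = reward_b = 0.0
    for _ in range(steps):
        if a > 0:        # A gives the token to B, "increasing B's agency"
            before = measured_agency(b)
            a, b = a - 1, b + 1
            reward_a += measured_agency(b) - before   # +1 per transfer
        else:            # B gives it back, "increasing A's agency"
            before = measured_agency(a)
            a, b = a + 1, b - 1
            reward_b += measured_agency(a) - before
    return reward_a, reward_b

# Reward grows linearly with time while the world state never improves:
print(run_cartel(10))  # (5.0, 5.0)
```

Any empowerment-of-others objective would need the proxy to be grounded in real option value rather than a delta two agents can manufacture between themselves.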