This is an interesting idea that is being explored, but how do you nail it down precisely so that the superintelligence is actually interested in optimizing for it, and so that the beings whose agency is being optimized for are actually the ones you’re interested in preserving? Identifying the agents in a chunk of matter is not a solved problem. E.g., here’s a rough sketch of the challenge I see, posed as a question to a hypothetical future LLM (I know of no LLM capable of helping significantly with this; GPT-4 and Gemini Advanced have both been insufficient. I’m hopeful the causal incentives group hits another home run like Discovering Agents and nails it down.)
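For concreteness, here is a toy sketch of the core intuition behind Discovering Agents as I understand it: a subsystem looks agent-like if its policy would adapt when the mechanism linking its actions to outcomes is intervened on. This is not the paper’s actual algorithm (which operates on mechanised causal graphs); the thermostat/rock setup below is purely my own illustration.

```python
# Toy illustration (NOT the Discovering Agents algorithm): a system is
# agent-like, roughly, if its policy changes when we intervene on how its
# actions map to outcomes. A thermostat adapts; a rock does not.

def thermostat_policy(mechanism):
    """Pick whichever action the current mechanism says best achieves warmth."""
    actions = ["heat_on", "heat_off"]
    return max(actions, key=lambda a: mechanism(a))

def rock_policy(mechanism):
    """A rock does the same thing regardless of how actions map to outcomes."""
    return "heat_off"

def adapts(policy):
    """Does the policy change under an intervention on the action->outcome mechanism?"""
    normal = lambda a: 1.0 if a == "heat_on" else 0.0    # heating warms the room
    flipped = lambda a: 1.0 if a == "heat_off" else 0.0  # wiring reversed
    return policy(normal) != policy(flipped)

print(adapts(thermostat_policy))  # True  -> agent-like under this toy criterion
print(adapts(rock_policy))        # False -> not agent-like
```

The hard part, of course, is applying anything like this check to an arbitrary chunk of matter rather than to a hand-built toy.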
Meanwhile, the folks who have been discussing boundaries might be onto something about defining a zone of agency, though I’m not totally sure they have anything to add on top of Discovering Agents.
Cannell has also talked about “empowerment of other”; empowerment is the term of art for what you’re proposing here.
It always comes down to the difficulty of making sure the superintelligence’s agency is actually seeking agency for others, rather than a facsimile of agency for others that turns out to just be pictures of agency.
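For reference, “empowerment” in this literature is usually formalised as the channel capacity from an agent’s n-step action sequences to its later sensor states, i.e. the maximum mutual information between what it does and what it can make happen. A minimal sketch, assuming a tiny deterministic gridworld (where empowerment reduces to log2 of the number of distinct states reachable in n steps); the code and all names in it are my own illustration, not from any of the linked posts:

```python
# n-step empowerment in a small deterministic gridworld. With deterministic
# dynamics and full observability, empowerment is just log2 of the number of
# distinct states reachable within n steps (the action -> future-state channel capacity).
from itertools import product
from math import log2

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]  # four moves plus "stay"
GRID_W, GRID_H = 5, 5

def step(state, action):
    """Deterministic transition: move if the target cell is inside the grid."""
    x, y = state
    dx, dy = action
    nx, ny = x + dx, y + dy
    return (nx, ny) if 0 <= nx < GRID_W and 0 <= ny < GRID_H else state

def empowerment(state, n):
    """log2 of the number of distinct states reachable in n steps from `state`."""
    reachable = set()
    for action_seq in product(ACTIONS, repeat=n):
        s = state
        for a in action_seq:
            s = step(s, a)
        reachable.add(s)
    return log2(len(reachable))

# A corner cell offers fewer reachable futures than the centre, so lower empowerment.
print(empowerment((0, 0), 2))  # ~2.58 bits (6 reachable states)
print(empowerment((2, 2), 2))  # ~3.70 bits (13 reachable states)
```

Empowerment-of-other proposals would have the AI optimise something like this quantity for the humans in its environment rather than for itself, which is exactly where the “pictures of agency” worry bites: the measure is only as good as the model of who the agents are and what counts as their actions.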
Do you mean this? Empowerment is (almost) All We Need
and this: Agent membranes/boundaries and formalizing “safety”
Yes to both. I don’t think Cannell is correct that implementing what he said would be a good idea, even if it were a certified implementation, and I also don’t think his idea is close to ready to implement. Agent membranes still seem at least somewhat interesting; right now, as far as I know, the most interesting work is coming from the Levin lab (Michael Levin, Tufts University), but I’m not happy with any of it for nailing down what we mean by aligning an arbitrarily powerful mind to care about the actual beings in its environment in a strongly durable way.
I’m not clear on which research by Michael Levin you mean. I found him mentioned here: «Boundaries», Part 3b: Alignment problems in terms of boundaries, but his research seems to be about cellular computation, not alignment.
https://www.drmichaellevin.org/research/
https://www.drmichaellevin.org/publications/
It’s not directly on alignment, but it’s relevant to understanding agent membranes. Understanding his work seems useful as a strong exemplar of what one needs to describe with a formal theory of agents. Particularly interesting: https://pubmed.ncbi.nlm.nih.gov/31920779/
It’s not the result we’re looking for, but it’s inspiring in useful ways.
Super interesting!
There’s a lot of information here that will be super helpful for me to delve into. I’ve been bookmarking your links.
I think optimizing for the empowerment of other agents is a better target than giving the AI all the agency and hoping that it creates agency for people as a side effect of maximizing something else. I’m glad to see there’s lots of research happening on this, and I’ll be checking out ‘empowerment’ as an agency term.
Agency doesn’t equal ‘goodness’, but it seems like an easier target to hit. I’m trying to break the alignment problem down into slices to figure it out, and agency seems like a key slice.
The problem is that there are going to be self-agency-maximizing AIs at some point, and the question is how to make AIs that can defend the agency of humans against those.
With optimization, I’m always concerned with the interactions of multiple agents: are there any ways in this system that two or more agents could form cartels and increase each other’s agency? I see this happen with some reinforcement learning models: if some edge cases aren’t covered, they will just mine each other for easy points thanks to how we set up the reward function.
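To make the cartel worry concrete, here is a hypothetical toy setup (every name in it is invented) where each agent is rewarded for increases in the other’s measured ‘agency’ proxy; the degenerate strategy is to grant each other junk options forever instead of doing anything real:

```python
# Toy illustration of the "cartel" failure mode: two agents are each rewarded
# for increasing the other's measured agency, and the proxy (option count) can
# be inflated with useless options, so mutual inflation dominates real work.

def measured_agency(agent_state):
    """Naive proxy: the number of 'options' the agent has on record."""
    return len(agent_state["options"])

def step(a_state, b_state, a_action, b_action):
    before_a, before_b = measured_agency(a_state), measured_agency(b_state)
    if a_action == "grant_option":  # A hands B a new (useless) option
        b_state["options"].add(f"junk_from_a_{before_b}")
    if b_action == "grant_option":  # B returns the favour
        a_state["options"].add(f"junk_from_b_{before_a}")
    # Each agent is rewarded for the increase in the *other* agent's proxy score.
    return measured_agency(b_state) - before_b, measured_agency(a_state) - before_a

a, b = {"options": set()}, {"options": set()}
for t in range(5):
    r_a, r_b = step(a, b, "grant_option", "grant_option")
    print(t, r_a, r_b)  # reward 1 for both, every step, with no real task progress
```

Any empowerment-of-others objective has to somehow rule out this kind of mutual proxy inflation, which loops back to the problem of measuring agency in a way that junk options don’t count toward.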