Super interesting!
There’s a lot of information here that will be super helpful for me to delve into. I’ve been bookmarking your links.
I think optimizing for the empowerment of other agents is a better target than giving the AI all the agency and hoping that it creates agency for people as a side effect of maximizing something else. I’m glad to see there’s lots of research happening on this and I’ll be checking out ‘empowerment’ as an agency term.
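(For anyone else checking out the term: empowerment is usually formalized as the channel capacity between an agent’s actions and its future states. In the deterministic case this collapses to the log of the number of distinct states the agent can reach, which makes for an easy toy sketch. The gridworld and function names below are just my own illustration, not from any particular paper.)

```python
from itertools import product
from math import log2

def empowerment(state, step, actions, horizon):
    """Deterministic n-step empowerment: log2 of the number of distinct
    states reachable via action sequences of length `horizon`.
    (With deterministic dynamics, channel capacity reduces to this count.)"""
    reachable = set()
    for seq in product(actions, repeat=horizon):
        s = state
        for a in seq:
            s = step(s, a)
        reachable.add(s)
    return log2(len(reachable))

# Toy 1-D corridor of 5 cells; actions move left/right, clamped at the walls.
def step(s, a):
    return max(0, min(4, s + a))

actions = [-1, 0, 1]
print(empowerment(2, step, actions, horizon=2))  # center cell: log2(5), all 5 cells reachable
print(empowerment(0, step, actions, horizon=2))  # corner cell: log2(3), only 3 cells reachable
```

The intuition carries over: an agent pinned in a corner (fewer reachable futures) has less empowerment than one in the open, so “empower the human” roughly means “keep the human’s reachable-futures set large.”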
Agency doesn’t equal ‘goodness’, but it seems like an easier target to hit. I’m trying to break down the alignment problem into slices to figure it out and agency seems like a key slice.
Is it possible to develop specialized (narrow) AI that surpasses every human at infecting/destroying GPU systems, but won’t wipe us out? An LLM-powered Stuxnet would be an example. Bacteria aren’t smarter than humans, but they’re still very dangerous. It seems like a digital counterpart could knock out GPUs and so prevent AGI.
(Obviously, I’m not advocating for this in particular, since it would mean the end of the internet and I like the internet. It seems likely, however, that there are pivotal acts achievable by narrow AI that prevent AGI without the narrow AI itself being AGI.)