I have zero technical AI alignment knowledge. But this question has kept recurring to me for like a year now so I thought I’d ask.
A lot of the arguments for the danger of AGI revolve around the notion that an agent smarter than a human is un-boxable, self-creating and self-enhancing, and not necessarily aligned with human interests.
That pattern-matches very well onto “governments,” “corporations,” and other forms of collective agency. They have access to collective intelligence far beyond what’s accessible to an individual. That intelligence brings them power that even the cleverest individual cannot evade in the long run. Their goals aren’t necessarily aligned with human values. They use their intelligence and power to enhance their own intelligence and power. They’re not always successful, but they are often able to learn from their mistakes. And if one agency destroys itself, another takes its place.
How much bearing does this have on technical AI alignment work? Can AI alignment work translate into solutions for the problems we presently have in aligning these agencies to human values? Do the restraints that have so far prevented governments/corporations from paperclipping the world map onto any proposed strategies for AI alignment?
Why Not Just: Think of AGI Like a Corporation?
Thanks, I didn’t know this series existed and it looks like it covers a lot of my questions in an accessible way!
I think this post is pointing to some strong analogies between them, though there are also some obvious disanalogies, like the time it takes for a completely new agent to arise.
I think the main restraint here is time. Specifically, the self-enhancement of governments and corporations is very slow and unreliable.
And this planet has already been partially “paperclipped”: the environment is being destroyed, and corporations and governments oppress people in many places.
From a pessimistic perspective, the only reason democracy works is that satisfied and educated humans are economically more productive, so you can extract more resources from them if you keep them happy. With the invention of human-level AI, this restraint will be gone.
I find this an optimistic perspective. If Moloch is aligned with satisfaction and education, the win is stable.
Perhaps. A lot depends on the exact values, and on whether it remains true that overall productivity depends on satisfied and educated humans. It also depends on whether human-level AIs are morally relevant entities and whether their satisfaction increases productivity. The term “productivity” gets weird in many singularity visions, but stays somewhat sane in others.