The main problem here is that this approach doesn’t solve alignment, but merely shifts it to another system. We know that human organizational systems also suffer from misalignment—they are intrinsically misaligned. Here are several types of human organizational misalignment:
- Dictatorship: exhibits non-corrigibility, with power becoming a convergent goal
- Goodharting: manifests the same way as in AI systems
- Corruption: acts as internal wireheading
- Absurd projects (pyramids, genocide): parallel AI's paperclip maximization
- Hansonian organizational rot: mirrors error accumulation in AI systems
- Aggression: parallels an AI's drive to dominate the world
All previous attempts to create a government without these issues have failed (Musk’s DOGE will likely be another such attempt).
Furthermore, this approach doesn’t prevent others from creating self-improving paperclippers.
The most important thing here is that with AI we can at least achieve an outcome equal to the one we would get without AI, and as far as I know, no other proposed system has that property.
The famous "list of lethalities" (https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities) post would count that as a strong success.
I once wrote about the idea that we need to scan just one good person and make them a virtual king. That idea is a special case of your idea, in which several uploads form a good government.
I also spent the last year perfecting a model of my mind (a sideload) to be run by an LLM. I am likely now the closest person on Earth to being uploaded.
That's true; however, I don't think it's necessary that the person be good.
If there is one king-person, he needs to be good. If there are many, the organizational system needs to be good, like a virtual US Constitution.
Yes. But this is a very unusual arrangement.
If we have one good person, we could use copies of them many times in many roles, including high-speed assessment of the safety of AI outputs (a rough sketch of such an assessment loop is shown below).
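For concreteness, here is a minimal Python sketch of that assessment loop, under stated assumptions: `query_sideload` is a hypothetical stand-in for whatever interface actually runs the uploaded person's model (in practice, presumably a prompt to an LLM loaded with the sideload), and the copy count and vote threshold are arbitrary.

```python
# Sketch: fan each AI output out to N parallel copies of an uploaded
# evaluator and approve it only if enough copies agree it is safe.
# All names here are hypothetical illustrations, not a real API.
from concurrent.futures import ThreadPoolExecutor

N_COPIES = 5            # number of parallel copies of the uploaded evaluator
APPROVAL_THRESHOLD = 4  # how many copies must approve an output

def query_sideload(copy_id: int, ai_output: str) -> bool:
    """Placeholder for one upload copy judging one AI output.

    A real system would prompt an LLM running the person's mind-model
    and parse its verdict; here we just flag an obvious marker string.
    """
    return "unsafe" not in ai_output.lower()

def assess_output(ai_output: str) -> bool:
    """Send the same output to N_COPIES evaluators and take a vote."""
    with ThreadPoolExecutor(max_workers=N_COPIES) as pool:
        verdicts = list(pool.map(
            lambda i: query_sideload(i, ai_output), range(N_COPIES)))
    return sum(verdicts) >= APPROVAL_THRESHOLD

if __name__ == "__main__":
    print(assess_output("Here is a benign plan."))          # True
    print(assess_output("Here is an UNSAFE instruction."))  # False
```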
Current LLMs, by the way, have a good model of Gwern's mind (without any of his personal details).