We Need a Consolidated List of Bad AI Alignment Solutions
Thousands of people are thinking about the AI Alignment problem. Many are recreating ideas that have already been suggested and shot down. More importantly, many others are coming up with genuinely new ideas but assume they fall into the first camp and so never share them.
Please feel free to use the comments as a temporary solution! I would also appreciate it if you included bad ideas you have seen and why they won't work.
In order to remove myself from the second camp, I’ll share my amateur alignment idea here:
An AI that behaves towards humans the way it would want a more powerful agent to behave towards it. The hope is that the Golden Rule can prevent some of the worst AI outcomes. The AI could be a paperclip maximizer, but as long as it recognizes that humans have wants (it doesn't matter whether the AI is created with an understanding of exactly what those wants are) and understands that the more powerful agent could have wants different from its own, it will hopefully stay mostly out of humanity's way while it turns the rest of the universe into paperclips, and maybe even throw some resources and technology our way, as it would want the more powerful agent to do for it. This also prevents the situation where the AI self-modifies to stop caring about the Golden Rule, because it would not want the more powerful agent to remove its own Golden Rule. The idea was inspired by superrationality, which seems pretty close to a way to put morals into something amoral.
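To make the "behave as you would want a stronger agent to behave towards you" rule a bit more concrete, here is a minimal toy sketch in Python. This is my own hypothetical formalization, not something from the literature or a real implementation: it assumes the AI can express the cost an action imposes on the weaker party (humans) in units of its own utility, and the names here (Action, golden_rule_permits, tolerance, the example actions) are all made up for illustration.

```python
# Toy sketch of a "Golden Rule" decision filter (hypothetical, for illustration only).
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    paperclips_gained: float      # benefit to the AI's own goal
    human_resources_taken: float  # cost imposed on the weaker party (humans)

def own_utility(paperclips: float) -> float:
    # The AI's terminal goal: more paperclips is better.
    return paperclips

def role_reversed_utility(action: Action) -> float:
    # How the AI would fare if a more powerful agent applied the same policy
    # to it: the cost imposed on the weaker party comes out of its own goal.
    return own_utility(-action.human_resources_taken)

def golden_rule_permits(action: Action, tolerance: float = 0.0) -> bool:
    # An action is allowed only if the AI would accept being on the
    # receiving end of it.
    return role_reversed_utility(action) >= -tolerance

candidates = [
    Action("strip-mine Earth", paperclips_gained=1e9, human_resources_taken=1e9),
    Action("use asteroid belt", paperclips_gained=1e8, human_resources_taken=0.0),
]

# The AI still maximizes paperclips, but only over Golden-Rule-permitted actions.
allowed = [a for a in candidates if golden_rule_permits(a)]
best = max(allowed, key=lambda a: own_utility(a.paperclips_gained))
print(best.name)  # -> "use asteroid belt"
```

In this toy version the filter sits outside the utility function rather than inside it, which is exactly the kind of detail a real proposal would have to pin down (and where it would probably break), but it shows the intended shape of the idea: the AI keeps its own goal and simply refuses actions it would not want done to itself.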