I agree that a solution is possible in theory. What has always seemed to me the uniquely difficult and dangerous part of AI alignment is that you’re creating a superintelligent agent. That means we may only ever get a single chance to turn on an aligned system.
But I can’t think of a single example of a complex system created perfectly on the first try. Every successful engineering project in history has been accomplished through trial and error.
Some people have speculated that we can do trial and error in domains where mistakes are less catastrophic, but it’s not clear whether such weaker AI systems will tell us much about how more powerful systems will behave. It’s this “single chance to transition from a safe to a dangerous operating domain” part of the problem that makes AI alignment so uniquely difficult.