The point I was trying to make with the quote is that many people are not motivated to do “rational reflection on morality” or examine their value systems to see if they would “survive full logical and empirical information”. In fact they’re motivated to do the opposite, to protect their value systems against such reflection/examination. I’m worried that alignment researchers are not worried enough that if an alignment scheme causes the AI to just “do what the user wants”, that could cause a lock-in of crazy value systems that wouldn’t survive full logical and empirical information.
The point I was trying to make with the quote is that many people are not motivated to do “rational reflection on morality” or examine their value systems to see if they would “survive full logical and empirical information”. In fact they’re motivated to do the opposite, to protect their value systems against such reflection/examination. I’m worried that alignment researchers are not worried enough that if an alignment scheme causes the AI to just “do what the user wants”, that could cause a lock-in of crazy value systems that wouldn’t survive full logical and empirical information.