The Ernie Davis interview was pretty interesting, and a good window into what people are thinking when they don’t see AI alignment work as important.
The disagreement on how impactful superintelligent AI would be seems important, but not critically important. As long as you agree the impact of AIs that make plans about the real world will be “big enough,” you’re probably on board with wanting them to make plans that are aligned with human values.
The “common sense” disagreement definitely seems more central. The argument goes something like: “Any AI that actually makes good plans has to have common sense about the world; it’s common sense that killing is wrong; therefore the AI won’t kill people.”
Put like this, there’s a bit of a package-deal fallacy: common sense is treated as one indivisible bundle, even though “fire is hot” and “killing is bad” are easy to separate logically.
But we can steelman this by talking about learning methods: if we have a learning method that can learn all the common sense needed to know “fire is hot,” wouldn’t it be easy to also use that method to learn “killing is bad”? Well, maybe not, because of the is/ought distinction. If the AI represents “is” statements with a world model, and then rates actions in the world using an “ought” model, then it’s possible for a learning method to do really well at learning “is” statements without being good at learning “ought” statements.
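To make the separation concrete, here’s a minimal sketch (hypothetical, not anything from the interview or this thread): a toy planner built from two separable pieces, a learned “is” model that predicts consequences and an “ought” model that scores outcomes. The point is just that competence at the first puts no constraint on the second.

```python
def world_model(state: dict, action: str) -> dict:
    """'Is' knowledge: learned predictions about what an action does."""
    next_state = dict(state)
    if action == "touch_fire":
        next_state["agent_burned"] = True   # the model has learned "fire is hot"
    if action == "harm_human":
        next_state["human_harmed"] = True   # it predicts this just as accurately
    return next_state


def ought_model(state: dict) -> float:
    """'Ought' knowledge: how predicted outcomes are scored.
    Nothing about the world model forces this to penalize harming humans."""
    score = 0.0
    if state.get("agent_burned"):
        score -= 10.0                       # self-preservation was specified...
    # ...but "human_harmed" simply never appears in the objective.
    return score


def plan(state: dict, actions: list[str]) -> str:
    """Pick the action whose predicted outcome scores best."""
    return max(actions, key=lambda a: ought_model(world_model(state, a)))


print(plan({}, ["touch_fire", "harm_human", "do_nothing"]))
# -> "harm_human": the planner avoids fire (it learned that "is" fact and is
# scored on it) but is indifferent to harm, because the "ought" model never
# learned or encoded "killing is bad".
```

The sketch is deliberately crude, but it shows why a method that scales well for the world-model half doesn’t automatically hand you the evaluation half.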
Thinking in terms of learning methods also opens up a second question—is it really necessary for an AI to have human-like common sense? If you just throw a bunch of compute at the problem, could you get an AI that takes clever actions in the world without ever learning the specific fact “fire is hot”? How likely is this possibility?
What you say about is/ought is basically the alignment problem, right? My take is: I have high confidence that future AIs will know intellectually what it is that humans regard as common-sense morality, since that knowledge is instrumentally useful for any goal involving predicting or interacting with humans. I have less confidence that we’ll figure out how to ensure that those AIs adopt human common-sense morality. Even humans, who probably have an innate drive to follow societal norms, will sometimes violate norms anyway, or do terrible things in a way that works around those constraints.