CEV is based on the idea that there is an algorithm that can look at the state of my brain, filter out various kinds of noise, and extrapolate what sort of desires and values I'd want to have if I lived in a kinder, more benevolent society, wasn't subject to nearly as many serious cognitive biases, and so on.
The problem I'm seeing is that terms like 'desire' and 'value' originate in prescientific culture, in folk psychology. They were created by people in near-total ignorance of how brains work, and it seems increasingly plausible that these concepts will be inadequate for any accurate scientific explanation of how brains produce human behaviour.
Desires, values, and the like seem like indispensable theoretical posits simply because they are all we have. Our brains' extremely limited metacognitive abilities prevent us from modelling ourselves as brains, so our brains invent a kind of mythology to explain their behaviour, and that mythology is pure confabulation.
If these ideas are right, then by asking CEV to consider folk-psychological ideas like desires and values, we would be committing it to the existence of things that just aren't present in our brain states in any objective sense.
In the worst case, running CEV might be somewhat analogous to asking the AI to use Aristotelian physics to build a better airplane.
What we perceive as the fragility and complexity of human values might not map onto brain states at all; 'values' as we wish to conceive of them may not exist outside of narrative fiction and philosophy papers.
My recent thinking on these topics has been heavily influenced by the writings of Scott Bakker and Daniel Hutto, and by Peter Watts' novel Blindsight.
I hope I’m wrong about this stuff, but I don’t have the training to fully analyze and debunk these ideas by myself—if it’s even possible. I hope LW and MIRI have some insights about these issues, because I am seriously troubled by the apparent implications for the future of humanity.
You may be interested in Yvain’s Blue-Minimizing Robot sequence, which addresses these concerns. To read it, go to http://lesswrong.com/user/Yvain/submitted/?count=25&after=t3_8kn, and read the posts from “The Blue-Minimizing Robot” to “Tendencies in reflective equilibrium”.
Thanks! I’ve read some of the stuff by Yvain but not these posts.