Beginning resources for CEV research
I’ve been working on metaethics/CEV research for a couple months now (publishing mostly prerequisite material) and figured I’d share some of the sources I’ve been using.
CEV sources.
Yudkowsky, Metaethics sequence
Yudkowsky, ‘Coherent Extrapolated Volition’
Tarleton, ‘Coherent extrapolated volition: A meta-level approach to machine ethics’
Motivation. CEV extrapolates human motivations/desires/values/volition. As such, it will help to understand how human motivation works.
Neuroeconomics studies motivation as a driver of action under uncertainty. Start with Neuroeconomics: Decision Making and the Brain (2008) and Foundations of Neuroeconomic Analysis (2010), and see my bibliography here.
Affective neuroscience studies motivation as an emotion. Start with Pleasures of the Brain (2009) and my bibliography here.
Motivation science integrates psychological approaches to studying motivation. Start with The Psychology of Goals (2009), Oxford Handbook of Human Action (2008), and Handbook of Motivation Science (2007).
Extrapolation. Is it plausible to think that some kind of extrapolation of human motivations will converge on a single motivational set? How would extrapolation work, exactly?
Reflective equilibrium. Yudkowsky’s proposed extrapolation works analogously to what philosophers call ‘reflective equilibrium.’ The most thorough treatment is Daniels’ 1996 book, and many papers have followed, but this literature is only barely relevant to CEV. Basically, an entirely new literature on volition-extrapolation algorithms needs to be created.
Full-information accounts of value and ideal observer theories. These are philosophers’ names for theories of value that appeal to ‘what we would want if we were fully informed’ or ‘what a perfectly informed agent would want,’ as CEV does. There is some literature here, but it is only marginally relevant to CEV. Again, an entirely new literature needs to be written to solve this problem.
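To make the convergence question concrete, here is a toy sketch — my own illustration, not anything from the CEV or reflective-equilibrium literature — in which each agent’s ‘values’ are just a vector of weights that gets repeatedly nudged toward the group’s current average, a crude stand-in for ‘knew more, thought faster, had grown up farther together’:

```python
# Toy model: does iterated mutual adjustment of value weights converge?
# Every name and number here is illustrative; the model simply assumes
# the mutual attraction whose existence is the open question.

def extrapolate(agents, steps=1000, rate=0.1):
    """Each agent repeatedly nudges its value weights toward the
    group's current average, and we check where everyone ends up."""
    values = [list(a) for a in agents]
    for _ in range(steps):
        n = len(values)
        avg = [sum(v[i] for v in values) / n for i in range(len(values[0]))]
        for v in values:
            for i in range(len(v)):
                v[i] += rate * (avg[i] - v[i])
    return values

# Three agents with conflicting initial weights over two values:
agents = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
result = extrapolate(agents)
# All agents end up near the initial group mean, [0.5, 0.5].
```

The interesting empirical question is precisely whether anything like this convergence holds for real human motivations; the toy model guarantees it by construction, which real extrapolation proposals cannot do.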
Metaethics. Should we use CEV, or something else? What does ‘should’ mean?
Yudkowsky, Metaethics sequence
An Introduction to Contemporary Metaethics is a good introduction to mainstream metaethics. Unfortunately, nearly all of mainstream metaethics is horribly misguided, but the book will at least give you a good sense of the questions involved and what some of the wrong answers are. The chapter on moral reductionism is the most profitable.
Also see ‘Which Consequentialism? Machine ethics and moral divergence.’
Building the utility function. How can a seed AI be built? How can it learn what to value?
Dewey, ‘Learning What to Value’
Yudkowsky, ‘Coherent Extrapolated Volition’
Yudkowsky, ‘Artificial Intelligence as a Positive and Negative Factor in Global Risk’
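As a toy illustration of the ‘learning what to value’ problem — a drastic simplification of what Dewey’s paper formalizes, with all names and numbers made up — one can score candidate utility functions by how well they explain observed choices:

```python
# Toy value learning: given observed choices, score candidate utility
# functions by how often they agree with the observed behavior.

def score(utility, observations):
    """Fraction of (options, chosen) pairs where `chosen`
    maximizes `utility` over the available options."""
    hits = sum(1 for options, chosen in observations
               if max(options, key=utility) == chosen)
    return hits / len(observations)

# Observed behavior: the agent always picks the larger number.
observations = [((1, 3), 3), ((5, 2), 5), ((0, 7), 7)]

candidates = {
    "maximize": lambda x: x,
    "minimize": lambda x: -x,
}
best = max(candidates, key=lambda name: score(candidates[name], observations))
# best == "maximize": that hypothesis explains every observed choice.
```

Real value learning is far harder: the hypothesis space is vast, behavior is noisy evidence about values, and the agent’s learning process itself must be specified before the agent acts.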
Preserving the utility function. How can the motivations we put into a superintelligence be preserved over time and self-modification?
Yudkowsky, ‘Coherent Extrapolated Volition’
De Blanc, ‘Ontological Crises in Artificial Agents’ Value Systems’
Omohundro, ‘Basic AI Drives’ and ‘The Nature of Self-Improving Artificial Intelligence’ (instrumental drives to watch out for, and more)
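A toy sketch of the goal-preservation idea Omohundro discusses — entirely my own illustration, and a spot check where real proposals would need proofs — is an agent that accepts a rewrite of its decision procedure only if the successor still ranks outcomes by the current utility function:

```python
# Toy goal preservation: a self-modifying agent vets a proposed
# replacement policy against its *current* utility function before
# adopting it. Illustrative only; real agents need verification,
# not finite spot checks.

def utility(outcome):
    return outcome  # the current goal: bigger is better

def current_policy(options):
    return max(options, key=utility)

def proposed_policy(options):
    return min(options, key=utility)  # a value-corrupting "upgrade"

def accept_modification(policy, test_cases):
    """Accept only if the successor picks a utility-maximizing
    outcome on every test case."""
    return all(policy(opts) == max(opts, key=utility)
               for opts in test_cases)

tests = [(1, 2, 3), (9, 4), (0, -5, 5)]
assert accept_modification(current_policy, tests)       # keep this one
assert not accept_modification(proposed_policy, tests)  # reject this one
```

De Blanc’s paper points at a harder version of the same problem: even a perfectly preserved utility function can break when the agent’s ontology changes and the function’s domain no longer matches the world model.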
Reflective decision theory. Current decision theories tell us little about software agents that make decisions to modify their own decision-making mechanisms.
See the Less Wrong wiki page on decision theory.
Wei Dai’s Updateless Decision Theory
Yudkowsky’s Timeless Decision Theory
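Newcomb’s problem is the standard case motivating TDT and UDT, and the core calculation fits in a few lines. The payoff numbers below are the textbook ones; the framing of ‘committing to a policy the predictor models’ is my gloss on what these theories try to formalize:

```python
# Newcomb's problem: a reliable predictor puts $1M in an opaque box
# iff it predicts you will take only that box; a transparent box
# always holds $1K. Payoffs depend on the policy the predictor
# models, not just on the final act.

def payoff(policy, predictor_accuracy=0.99):
    """Expected dollars for committing to `policy`
    ('one-box' or 'two-box')."""
    if policy == "one-box":
        # Predictor usually foresees one-boxing and fills the box.
        return predictor_accuracy * 1_000_000
    else:
        # Predictor usually foresees two-boxing and leaves it empty,
        # so you get the $1K plus a small chance at the $1M.
        return (1 - predictor_accuracy) * 1_000_000 + 1_000

# Evaluating policies rather than acts favors one-boxing by a wide
# margin -- roughly $990K vs. $11K here -- which is the intuition
# UDT and TDT try to turn into a rigorous decision theory.
```

A reflective agent faces the further question the post raises: by what criterion should it evaluate a proposed change to this very decision procedure?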
Additional suggestions welcome. I’ll try to keep this page up-to-date.