Bostrom’s philosophical outlook shows. He’s defined the four categories to be mutually exclusive, and with the obvious fifth case they’re exhaustive, too.
1. Select motivations directly. (e.g. Asimov’s 3 laws)
2. Select motivations indirectly. (e.g. CEV)
3. Don’t select motivations, but use ones believed to be friendly. (e.g. Augment a nice person.)
4. Don’t select motivations, and use ones not believed to be friendly. (i.e. Constrain them with domesticity constraints.)
5. (Combinations of 1-4.)
In one sense, then, there aren’t other general motivation selection methods. But in a more useful sense, we might be able to divide the conceptual space into categories different from the ones Bostrom used, and the resulting categories could serve as heuristics that jumpstart the development of new ideas.
Um, I should probably get more concrete and try dividing it differently. The following alternative example categories come with no promise of being the kind that will effectively ripen your heuristics.
1. Research how human values are developed as a biological and cognitive process, and simulate that in the AI whether or not we understand what will result. (i.e. Neuromorphic AI, the kind Bostrom fears most.)
2. Research how human values are developed as a social and dialectical process, and simulate that in the AI whether or not we understand what will result. (e.g. Rawls’s Genie.)
3. Directly specify a single theory of partial human value (an important part we can get right), and sacrifice our remaining values to guarantee this one; or indirectly specify that the AI should figure out which single principle we most value and ensure that it is done. (e.g. Zookeeper.)
4. Directly specify a combination of many different ideas about human values rather than trying to get one theory right; or indirectly specify that the AI should do the same. (e.g. “Plato’s Republic”.)
The thought was first to divide the methods, roughly, by whether we program the means or the ends. Second, I subdivided those by whether we program the AI to find a unified or a composite solution. Anyhow, there may be other ways of categorizing this area of thought that carve it more neatly at its joints.
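The two rough axes above can be made concrete as a small lookup table. A minimal sketch, assuming my reading of which example falls on which side of each axis; the axis labels and mapping are my own shorthand for the four categories in the list, not anything from Bostrom:

```python
# Sketch of the proposed 2x2 carving of motivation selection methods.
# Axis 1: do we program the means (a value-forming process) or the ends (the values)?
# Axis 2: do we aim for a unified solution or a composite one?
# The example names come from the list above; which cell each occupies
# is my own interpretive assumption.

taxonomy = {
    ("means", "unified"):   "Neuromorphic AI (simulate biological/cognitive value formation)",
    ("means", "composite"): "Rawls's Genie (simulate social/dialectical value formation)",
    ("ends", "unified"):    "Zookeeper (specify one partial value and guarantee it)",
    ("ends", "composite"):  "Plato's Republic (specify a mix of many value theories)",
}

def classify(programs: str, solution: str) -> str:
    """Look up the example category for a (means/ends, unified/composite) pair."""
    return taxonomy[(programs, solution)]
```

For instance, `classify("ends", "unified")` returns the Zookeeper entry. The point of the table form is only that the two axes are independent, so any recombination of them names a cell one could try to fill with a new proposal.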
Another approach might be to dump in a whole bunch of data and hope that the simplest model that fits it is a good model of human values. (This is like Paul Christiano’s hack for specifying a whole brain emulation as part of an indirect normativity proposal before whole brain emulation capability has actually been achieved: http://ordinaryideas.wordpress.com/2012/04/21/indirect-normativity-write-up/.) There might be other data sets that could be used this way: run a massive survey on philosophical problems, record a bunch of people’s brains while they watch stories play out on television, dump in DNA and hope it encodes something that points to the brain regions relevant to morality, etc. (I don’t trust this method, though.)
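The "simplest model that fits the data" idea is essentially penalized model selection (a minimum-description-length flavor of Occam's razor). A toy sketch, with hand-written candidate models and illustrative numbers of my own invention, nothing specific to modeling human values:

```python
# Toy illustration of "prefer the simplest model that fits the data".
# Each candidate model carries a stated complexity (a stand-in for
# description length); the score trades squared error against complexity.

data = [(x, 2 * x + 1) for x in range(10)]  # generated by a simple linear rule

candidates = {
    # name: (complexity, prediction function) -- complexities are made up
    "constant-10": (1, lambda x: 10.0),
    "linear-2x+1": (2, lambda x: 2 * x + 1),
    "lookup-table": (10, lambda x: dict(data).get(x, 0.0)),  # memorizes the data
}

def score(complexity, predict, points, lam=1.0):
    """Squared error plus a complexity penalty (an MDL-flavored score)."""
    err = sum((predict(x) - y) ** 2 for x, y in points)
    return err + lam * complexity

best = min(candidates, key=lambda name: score(*candidates[name], data))
# The linear rule fits perfectly at low complexity, so it beats both the
# badly-fitting constant and the equally-accurate but complex lookup table.
```

The worry in the text maps onto the sketch directly: the lookup table fits the data just as well as the linear rule, so everything hinges on the complexity penalty favoring a model that also generalizes the way we'd want.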