Back around 2010, I used to emphasize the importance of neuroscience for CEV (“basically an exercise in applied neuroscience”, “the AI is the neuroscientist”). I even dared to predict that before we got close to AGI, there would be a research community with a tentative, neuroscience-based draft of what human-friendly AI values should be. I was advancing this in contrast to proposals that we could just upload a bunch of wise humans, run them at high speed, and let them figure everything out.
These days, the avant-garde in alignment proposals seems to be less about uploading someone or everyone (though Jan Leike, head of alignment at OpenAI, talks about achieving CEV through “simulated deliberative democracy”), and more about trusting the seed AI (the AI that figures out alignment for us) to identify the relevant causal structure of the human mind, by way of AIXI-like general powers of modeling. I’m thinking here of alignment proposals like PreDCA and QACI.
“Get the uploads to do the hard work” and “get a superintelligent modeler to do the hard work”, as alignment proposals, share a common motivation: defining a one-shot alignment method that is as simple as possible. By contrast, RLHF is an alignment method in which humans are in the loop constantly. You could say that what I had in mind was something analogous to that human-in-the-loop approach, but for the task of extracting the actual human decision procedure from data about human brains and human behavior. I envisaged it being achieved by human beings doing research in the ordinary way.
So what do we have in the present? Steven Byrnes talks about how to align brain-like AI; perhaps one could apply his method to the architecture of the human brain itself, if only we knew it in sufficient detail. And June Ku’s very underappreciated approach to CEV rests on a specific methodology for obtaining an idealized rational utility function from the competing, inconsistent imperatives found in an actual brain.
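To make the shape of that last problem concrete, here is a minimal toy sketch, entirely my own illustration and not June Ku’s actual methodology: given inconsistent pairwise preferences over outcomes, including cycles that no single utility function can satisfy, fit the scalar utilities that best explain them under a Bradley-Terry model. Everything here (the function name, the learning rate, the example data) is hypothetical.

```python
import numpy as np

def fit_utilities(n_outcomes, preferences, lr=0.1, steps=2000):
    """preferences: list of (i, j) pairs, each meaning 'outcome i was preferred to outcome j'."""
    u = np.zeros(n_outcomes)
    for _ in range(steps):
        grad = np.zeros(n_outcomes)
        for i, j in preferences:
            # Bradley-Terry: P(i beats j) = sigmoid(u[i] - u[j]).
            p = 1.0 / (1.0 + np.exp(u[j] - u[i]))
            # Gradient of the negative log-likelihood of this observation.
            grad[i] -= (1.0 - p)
            grad[j] += (1.0 - p)
        u -= lr * grad
    return u - u.mean()  # utilities are only defined up to an additive constant

# Three conflicting "imperatives" forming a cycle (A > B, B > C, C > A),
# with A > B reported twice. No utility function satisfies all four;
# the fit resolves the conflict toward the better-supported comparisons.
prefs = [(0, 1), (0, 1), (1, 2), (2, 0)]
print(fit_utilities(3, prefs))  # A ends up with the highest utility
```

The point is only that “idealization” has to mean something like this: when the raw imperatives are mutually inconsistent, some principled rule must decide which of them give way. The real proposal involves far more structure than least-inconsistency fitting.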
But overall, the idea of achieving CEV, or even getting halfway there, by doing plain old neuroscientific research seems to be a path not taken. So I view your comments with interest. But I would emphasize that for this pathway, neuroscience, not just neurotechnology, is the essential ingredient. We need theories and insights, not just experiments and data. To that end, we should be searching mainstream neuroscience for theories and insights that provide guidance.