My nonexpert guess would be that the “superintelligent brain emulation” class of solutions has potential to work. But we’d still need to figure out how to prevent an AGI from being made until we’re ready to actually implement that solution.
My hope was that maybe we can recreate the way we humans make beneficial decisions for fellow beings without simulating a complete brain. But I agree that AGI might be built before we have solved this.
I think doing that via reinforcement learning is well established as a possible strategy, and it has largely been discarded because it probably won't produce the properties we want.
Maybe we could solve legibility enough to extract the value-assessing part out of a human brain, and then put it on a computer? This doesn’t strike me as a solution but it might be a useful idea.
I was thinking more about the way psychologists try to understand how we make decisions. I stumbled across two papers from a few years ago by one such psychologist, Mark Muraven, who argues that the way humans deal with conflicting goals could be important for AI alignment (https://arxiv.org/abs/1701.01487 and https://arxiv.org/abs/1703.06354). They seem a bit shallow to me and don't contain any concrete ideas on how to implement this, but maybe Muraven has a point. Perhaps we should put more effort into understanding how we humans deal with goals, instead of letting an AI figure it out for itself through RL or IRL.