This is brilliant work, thank you. It’s great that someone is working on these topics and they seem highly relevant to AGI alignment.
One intuition for why a neuroscience-inspired approach to AI alignment seems promising is that a similar strategy apparently worked for AI capabilities: the neural network researchers of the 1980s, who tried to copy how the brain works and whose approach grew into deep learning, were ultimately the most successful at building highly intelligent AIs (e.g. GPT-4), while more synthetic approaches (e.g. pure logic) were less successful.
Similarly, we already know that the brain has the capacity to represent and be directed by human values, so arguably the shortest path to succeeding at AI alignment is to understand the brain's circuitry underlying human motivations and values and replicate it in AIs.
The only other AI alignment research agenda I can think of that seems to follow a similar strategy is Shard Theory, though it seems more high-level and more related to RL than to neuroscience.