I’d enjoy having a dialogue.
Some topics that seem interesting to me:
Neuroscience, and how I see it being relevant to AI alignment.
Corrigibility (not an expert here, but excited to learn more and hear others' ideas). I'm especially interested in whether anyone has ideas about how to draw boundaries around what would constitute 'too much manipulation' from a model, or how to measure manipulation.
Compute governance: why it might be a good idea, but also why it might be a bad idea.
Why I think simulations are valuable for alignment research. (By simulations I mean simulated worlds or game environments, not the framing of LLMs as simulators of agents.)
Interested in chatting about neuroscience.