I’ve started formalizing my research proposal, so I now have: I intend to use computational game theory, system modeling, cognitive science, causal inference, and operations research methods to explore the ways in which AI systems can produce unintended consequences, and to develop better methods for anticipating outer alignment failures.
Can anyone point me to existing university research along these lines? I’ve made some progress after finding this thread, and I’m now planning to contact FHI about their Research Scholars Programme, but I’m still finding it time-consuming to match specific ongoing research to a given university or professor. If anyone can point me to other university programs (or professors to contact) that would fit well with my interests, that would be super helpful.
I’d suggest talking to AI Safety Support; they offer free calls with people who want to work in the field. Rohin’s advice for alignment researchers is also worth looking at — it covers PhDs in a fair amount of detail.
For that specific topic, maybe https://www.lesswrong.com/posts/LpM3EAakwYdS6aRKf/what-multipolar-failure-looks-like-and-robust-agent-agnostic is relevant?