I’m doing research and other work focused on AI safety/security, governance and risk reduction. Currently my top projects are (last updated Feb 26, 2025):
Technical researcher at UC Berkeley's AI Security Initiative, part of the Center for Long-Term Cybersecurity (CLTC)
Serving on the board of directors for AI Governance & Safety Canada
My general areas of interest include AI safety strategy, comparative AI alignment research, prioritizing technical alignment work, analyzing the published alignment plans of major AI labs, interpretability, deconfusion research, and other AI safety-related topics.
Research that I’ve authored or co-authored:
Steering Behaviour: Testing for (Non-)Myopia in Language Models
Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios
(Scroll down to read other posts and comments I’ve written)
Before getting into AI safety, I was a software engineer for 11 years at Google and various startups. You can find details about my previous work on my LinkedIn.
While I’m not always great at responding, I’m happy to connect with other researchers or people interested in AI alignment and effective altruism. Feel free to send me a private message!
Wow, this seems like a really important breakthrough.
Are defection probes also a solution to the undetectable backdoor problem from Goldwasser et al. 2022?
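(For context, my rough understanding of a defection probe is a linear classifier trained on a model's internal activations to flag when it is about to defect. Below is a minimal toy sketch of that idea — synthetic activations and a plain logistic regression, not the actual setup from the post or the Goldwasser et al. construction.)

```python
# Toy sketch of a linear "defection probe": train a logistic regression on
# (synthetic) hidden-state activations labelled as defecting vs. benign.
# Purely illustrative; the activations and dimensions here are made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model = 512          # hypothetical hidden-state dimension
n_per_class = 200      # examples per class

# Synthetic activations: "defection" examples are shifted along one direction.
defection_direction = rng.normal(size=d_model)
defection_direction /= np.linalg.norm(defection_direction)

benign = rng.normal(size=(n_per_class, d_model))
defect = rng.normal(size=(n_per_class, d_model)) + 2.0 * defection_direction

X = np.vstack([benign, defect])
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])

probe = LogisticRegression(max_iter=1000).fit(X, y)
print("train accuracy:", probe.score(X, y))
```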