
Evan R. Murphy

Karma: 1,142

I’m doing research and other work focused on AI safety/security, governance, and risk reduction. Currently my top projects are (last updated Feb 26, 2025):

My general areas of interest are AI safety strategy, comparative AI alignment research, prioritizing technical alignment work, analyzing the published alignment plans of major AI labs, interpretability, the Conditioning Predictive Models agenda, deconfusion research, and other AI safety-related topics. My work is currently self-funded.

Research that I’ve authored or co-authored:

Other recent work:

Before getting into AI safety, I was a software engineer for 11 years at Google and various startups. You can find details about my previous work on my LinkedIn.

I’m always happy to connect with other researchers or people interested in AI alignment and effective altruism. Feel free to send me a private message!

Evan R. Murphy’s Shortform

Evan R. Murphy · Feb 28, 2025, 12:56 AM
6 points
2 comments · 1 min read · LW link

Steven Pinker on ChatGPT and AGI (Feb 2023)

Evan R. Murphy · Mar 5, 2023, 9:34 PM
11 points
8 comments · 1 min read · LW link
(news.harvard.edu)

Steering Behaviour: Testing for (Non-)Myopia in Language Models

Dec 5, 2022, 8:28 PM
40 points
19 comments · 10 min read · LW link

Paper: Large Language Models Can Self-improve [Linkpost]

Evan R. Murphy · Oct 2, 2022, 1:29 AM
52 points
15 comments · 1 min read · LW link
(openreview.net)

Google AI integrates PaLM with robotics: SayCan update [Linkpost]

Evan R. Murphy · Aug 24, 2022, 8:54 PM
25 points
0 comments · 1 min read · LW link
(sites.research.google)

Surprised by ELK report’s counterexample to Debate, IDA

Evan R. Murphy · Aug 4, 2022, 2:12 AM
18 points
0 comments · 5 min read · LW link

New US Senate Bill on X-Risk Mitigation [Linkpost]

Evan R. Murphy · Jul 4, 2022, 1:25 AM
35 points
12 comments · 1 min read · LW link
(www.hsgac.senate.gov)

Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios

Evan R. Murphy · May 12, 2022, 8:01 PM
58 points
0 comments · 59 min read · LW link

Introduction to the sequence: Interpretability Research for the Most Important Century

Evan R. Murphy · May 12, 2022, 7:59 PM
16 points
0 comments · 8 min read · LW link

[Question] What is a training “step” vs. “episode” in machine learning?

Evan R. Murphy · Apr 28, 2022, 9:53 PM
10 points
4 comments · 1 min read · LW link

Action: Help expand funding for AI Safety by coordinating on NSF response

Evan R. Murphy · Jan 19, 2022, 10:47 PM
23 points
8 comments · 3 min read · LW link

Promising posts on AF that have fallen through the cracks

Evan R. Murphy · Jan 4, 2022, 3:39 PM
34 points
6 comments · 2 min read · LW link