RogerDearnaley

Karma: 1,591

I’m a staff artificial intelligence engineer currently working with LLMs, and I have been interested in AI alignment, safety, and interpretability for the last 15 years. I’m now actively looking for employment in this area.

Why Aligning an LLM is Hard, and How to Make it Easier

RogerDearnaley · Jan 23, 2025, 6:44 AM
30 points
3 comments · 4 min read · LW link

[Question] What Other Lines of Work are Safe from AI Automation?

RogerDearnaley · Jul 11, 2024, 10:01 AM
31 points
35 comments · 5 min read · LW link

A “Bitter Lesson” Approach to Aligning AGI and ASI

RogerDearnaley · Jul 6, 2024, 1:23 AM
61 points
41 comments · 24 min read · LW link

7. Evolution and Ethics

RogerDearnaley · Feb 15, 2024, 11:38 PM
3 points
6 comments · 6 min read · LW link

Requirements for a Basin of Attraction to Alignment

RogerDearnaley · Feb 14, 2024, 7:10 AM
41 points
12 comments · 31 min read · LW link

Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis

RogerDearnaley · Feb 1, 2024, 9:15 PM
16 points
15 comments · 13 min read · LW link

Approximately Bayesian Reasoning: Knightian Uncertainty, Goodhart, and the Look-Elsewhere Effect

RogerDearnaley · Jan 26, 2024, 3:58 AM
16 points
2 comments · 11 min read · LW link

A Chinese Room Containing a Stack of Stochastic Parrots

RogerDearnaley · Jan 12, 2024, 6:29 AM
20 points
3 comments · 5 min read · LW link

Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?

RogerDearnaley · Jan 11, 2024, 12:56 PM
35 points
4 comments · 39 min read · LW link

Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor

RogerDearnaley · Jan 9, 2024, 8:42 PM
47 points
8 comments · 36 min read · LW link

Striking Implications for Learning Theory, Interpretability — and Safety?

RogerDearnaley · Jan 5, 2024, 8:46 AM
37 points
4 comments · 2 min read · LW link

5. Moral Value for Sentient Animals? Alas, Not Yet

RogerDearnaley · Dec 27, 2023, 6:42 AM
33 points
41 comments · 23 min read · LW link

Interpreting the Learning of Deceit

RogerDearnaley · Dec 18, 2023, 8:12 AM
30 points
14 comments · 9 min read · LW link

Language Model Memorization, Copyright Law, and Conditional Pretraining Alignment

RogerDearnaley · Dec 7, 2023, 6:14 AM
9 points
0 comments · 11 min read · LW link

6. The Mutable Values Problem in Value Learning and CEV

RogerDearnaley · Dec 4, 2023, 6:31 PM
12 points
0 comments · 49 min read · LW link

After Alignment — Dialogue between RogerDearnaley and Seth Herd

Dec 2, 2023, 6:03 AM
15 points
2 comments · 25 min read · LW link

How to Control an LLM’s Behavior (why my P(DOOM) went down)

RogerDearnaley · Nov 28, 2023, 7:56 PM
65 points
30 comments · 11 min read · LW link

4. A Moral Case for Evolved-Sapience-Chauvinism

RogerDearnaley · Nov 24, 2023, 4:56 AM
10 points
0 comments · 4 min read · LW link

3. Uploading

RogerDearnaley · Nov 23, 2023, 7:39 AM
21 points
5 comments · 8 min read · LW link

2. AIs as Economic Agents

RogerDearnaley · Nov 23, 2023, 7:07 AM
9 points
2 comments · 6 min read · LW link