RogerDearnaley

Karma: 1,591

I’m a staff artificial intelligence engineer currently working with LLMs, and I have been interested in AI alignment, safety, and interpretability for the last 15 years. I’m now actively looking for employment in this area.

Why Aligning an LLM is Hard, and How to Make it Easier

RogerDearnaley · Jan 23, 2025, 6:44 AM
30 points
3 comments · 4 min read · LW link

[Question] What Other Lines of Work are Safe from AI Automation?

RogerDearnaley · Jul 11, 2024, 10:01 AM
31 points
35 comments · 5 min read · LW link

A “Bitter Lesson” Approach to Aligning AGI and ASI

RogerDearnaley · Jul 6, 2024, 1:23 AM
61 points
41 comments · 24 min read · LW link

7. Evolution and Ethics

RogerDearnaley · Feb 15, 2024, 11:38 PM
3 points
6 comments · 6 min read · LW link

Requirements for a Basin of Attraction to Alignment

RogerDearnaley · Feb 14, 2024, 7:10 AM
41 points
12 comments · 31 min read · LW link

Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis

RogerDearnaley · Feb 1, 2024, 9:15 PM
16 points
15 comments · 13 min read · LW link

Approximately Bayesian Reasoning: Knightian Uncertainty, Goodhart, and the Look-Elsewhere Effect

RogerDearnaley · Jan 26, 2024, 3:58 AM
16 points
2 comments · 11 min read · LW link

A Chinese Room Containing a Stack of Stochastic Parrots

RogerDearnaley · Jan 12, 2024, 6:29 AM
20 points
3 comments · 5 min read · LW link

Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?

RogerDearnaley · Jan 11, 2024, 12:56 PM
35 points
4 comments · 39 min read · LW link

Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor

RogerDearnaley · Jan 9, 2024, 8:42 PM
47 points
8 comments · 36 min read · LW link

Striking Implications for Learning Theory, Interpretability — and Safety?

RogerDearnaley · Jan 5, 2024, 8:46 AM
37 points
4 comments · 2 min read · LW link

5. Moral Value for Sentient Animals? Alas, Not Yet

RogerDearnaley · Dec 27, 2023, 6:42 AM
33 points
41 comments · 23 min read · LW link

Interpreting the Learning of Deceit

RogerDearnaley · Dec 18, 2023, 8:12 AM
30 points
14 comments · 9 min read · LW link

Language Model Memorization, Copyright Law, and Conditional Pretraining Alignment

RogerDearnaley · Dec 7, 2023, 6:14 AM
9 points
0 comments · 11 min read · LW link

6. The Mutable Values Problem in Value Learning and CEV

RogerDearnaley · Dec 4, 2023, 6:31 PM
12 points
0 comments · 49 min read · LW link

After Alignment — Dialogue between RogerDearnaley and Seth Herd

Dec 2, 2023, 6:03 AM
15 points
2 comments · 25 min read · LW link

How to Control an LLM’s Behavior (why my P(DOOM) went down)

RogerDearnaley · Nov 28, 2023, 7:56 PM
65 points
30 comments · 11 min read · LW link

4. A Moral Case for Evolved-Sapience-Chauvinism

RogerDearnaley · Nov 24, 2023, 4:56 AM
10 points
0 comments · 4 min read · LW link

3. Uploading

RogerDearnaley · Nov 23, 2023, 7:39 AM
21 points
5 comments · 8 min read · LW link

2. AIs as Economic Agents

RogerDearnaley · Nov 23, 2023, 7:07 AM
9 points
2 comments · 6 min read · LW link