
Alignment Research Center (ARC)


The Alignment Research Center (ARC) is a non-profit research organization whose mission is to align future machine learning systems with human interests. Its current work focuses on developing an alignment strategy that could be adopted in industry today while scaling gracefully to future ML systems. Its researchers are Paul Christiano, Mark Xu, and Jacob Hilton, and Kyle Scott handles operations.

ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks

Beth Barnes · Aug 1, 2023, 6:30 PM
153 points
12 comments · 5 min read · LW link
(evals.alignment.org)

Obstacles in ARC’s agenda: Finding explanations

David Matolcsi · Apr 30, 2025, 11:03 PM
105 points
8 comments · 17 min read · LW link

Obstacles in ARC’s agenda: Mechanistic Anomaly Detection

David Matolcsi · May 1, 2025, 8:51 PM
39 points
1 comment · 11 min read · LW link

Obstacles in ARC’s agenda: Low Probability Estimation

David Matolcsi · May 2, 2025, 7:38 PM
39 points
0 comments · 6 min read · LW link

Estimating Tail Risk in Neural Networks

Mark Xu · Sep 13, 2024, 8:00 PM
68 points
9 comments · 23 min read · LW link
(www.alignment.org)

Steelmanning heuristic arguments

Dmitry Vaintrob · Apr 13, 2025, 1:09 AM
72 points
0 comments · 17 min read · LW link

Low Probability Estimation in Language Models

Gabriel Wu · Oct 18, 2024, 3:50 PM
50 points
0 comments · 10 min read · LW link
(www.alignment.org)

A bird’s eye view of ARC’s research

Jacob_Hilton · Oct 23, 2024, 3:50 PM
121 points
12 comments · 7 min read · LW link
(www.alignment.org)

Paul Christiano on Dwarkesh Podcast

ESRogs · Nov 3, 2023, 10:13 PM
19 points
0 comments · 1 min read · LW link
(www.dwarkeshpatel.com)

Prizes for matrix completion problems

paulfchristiano · May 3, 2023, 11:30 PM
164 points
52 comments · 1 min read · LW link
(www.alignment.org)

More information about the dangerous capability evaluations we did with GPT-4 and Claude.

Beth Barnes · Mar 19, 2023, 12:25 AM
233 points
54 comments · 8 min read · LW link
(evals.alignment.org)

ARC’s first technical report: Eliciting Latent Knowledge

Dec 14, 2021, 8:09 PM
228 points
90 comments · 1 min read · LW link · 3 reviews
(docs.google.com)

ARC is hiring theoretical researchers

Jun 12, 2023, 6:50 PM
126 points
12 comments · 4 min read · LW link
(www.alignment.org)

AXRP Episode 23 - Mechanistic Anomaly Detection with Mark Xu

DanielFilan · Jul 27, 2023, 1:50 AM
22 points
0 comments · 72 min read · LW link

ARC paper: Formalizing the presumption of independence

Erik Jenner · Nov 20, 2022, 1:22 AM
97 points
2 comments · 2 min read · LW link
(arxiv.org)

[Question] How is ARC planning to use ELK?

jacquesthibs · Dec 15, 2022, 8:11 PM
24 points
5 comments · 1 min read · LW link

ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so

Christopher King · Mar 15, 2023, 12:29 AM
116 points
22 comments · 2 min read · LW link

Counterexamples to some ELK proposals

paulfchristiano · Dec 31, 2021, 5:05 PM
53 points
10 comments · 7 min read · LW link

Experimentally evaluating whether honesty generalizes

paulfchristiano · Jul 1, 2021, 5:47 PM
103 points
24 comments · 9 min read · LW link · 1 review

[Question] Why is there an alignment problem?

InfiniteLight · Dec 22, 2023, 6:19 AM
1 point
0 comments · 1 min read · LW link

Evaluations project @ ARC is hiring a researcher and a webdev/engineer

Beth Barnes · Sep 9, 2022, 10:46 PM
99 points
7 comments · 10 min read · LW link

ELK prize results

Mar 9, 2022, 12:01 AM
138 points
50 comments · 21 min read · LW link

The Alignment Problems

Martín Soto · Jan 12, 2023, 10:29 PM
20 points
0 comments · 4 min read · LW link

The Goal Misgeneralization Problem

Myspy · May 18, 2023, 11:40 PM
1 point
0 comments · 1 min read · LW link
(drive.google.com)

Christiano (ARC) and GA (Conjecture) Discuss Alignment Cruxes

Feb 24, 2023, 11:03 PM
61 points
7 comments · 47 min read · LW link

Concrete Methods for Heuristic Estimation on Neural Networks

Oliver Daniels · Nov 14, 2024, 5:07 AM
28 points
0 comments · 27 min read · LW link

ARC is hiring!

Dec 14, 2021, 8:09 PM
64 points
2 comments · 1 min read · LW link