Alignment Research Center (ARC)

Last edit: 23 Oct 2024 17:57 UTC by Raemon

The Alignment Research Center (ARC) is a non-profit research organization whose mission is to align future machine learning systems with human interests. Its current work focuses on developing an alignment strategy that could be adopted in industry today while scaling gracefully to future ML systems. Its researchers currently include Paul Christiano, Mark Xu, Beth Barnes, and Jacob Hilton, with Kyle Scott handling operations.

ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks

Beth Barnes · 1 Aug 2023 18:30 UTC
153 points
12 comments · 5 min read · LW link
(evals.alignment.org)

Paul Christiano on Dwarkesh Podcast

ESRogs · 3 Nov 2023 22:13 UTC
17 points
0 comments · 1 min read · LW link
(www.dwarkeshpatel.com)

ARC paper: Formalizing the presumption of independence

Erik Jenner · 20 Nov 2022 1:22 UTC
97 points
2 comments · 2 min read · LW link
(arxiv.org)

Prizes for matrix completion problems

paulfchristiano · 3 May 2023 23:30 UTC
164 points
52 comments · 1 min read · LW link
(www.alignment.org)

[Question] How is ARC planning to use ELK?

jacquesthibs · 15 Dec 2022 20:11 UTC
24 points
5 comments · 1 min read · LW link

More information about the dangerous capability evaluations we did with GPT-4 and Claude.

Beth Barnes · 19 Mar 2023 0:25 UTC
233 points
54 comments · 8 min read · LW link
(evals.alignment.org)

Low Probability Estimation in Language Models

Gabriel Wu · 18 Oct 2024 15:50 UTC
49 points
0 comments · 10 min read · LW link
(www.alignment.org)

ARC is hiring theoretical researchers

12 Jun 2023 18:50 UTC
126 points
12 comments · 4 min read · LW link
(www.alignment.org)

AXRP Episode 23 - Mechanistic Anomaly Detection with Mark Xu

DanielFilan · 27 Jul 2023 1:50 UTC
22 points
0 comments · 72 min read · LW link

ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so

Christopher King · 15 Mar 2023 0:29 UTC
116 points
22 comments · 2 min read · LW link

A bird’s eye view of ARC’s research

Jacob_Hilton · 23 Oct 2024 15:50 UTC
116 points
12 comments · 7 min read · LW link
(www.alignment.org)

Estimating Tail Risk in Neural Networks

Mark Xu · 13 Sep 2024 20:00 UTC
68 points
9 comments · 23 min read · LW link
(www.alignment.org)

ARC’s first technical report: Eliciting Latent Knowledge

14 Dec 2021 20:09 UTC
225 points
90 comments · 1 min read · LW link · 3 reviews
(docs.google.com)

Counterexamples to some ELK proposals

paulfchristiano · 31 Dec 2021 17:05 UTC
53 points
10 comments · 7 min read · LW link

Concrete Methods for Heuristic Estimation on Neural Networks

Oliver Daniels · 14 Nov 2024 5:07 UTC
27 points
0 comments · 27 min read · LW link

The Goal Misgeneralization Problem

Myspy · 18 May 2023 23:40 UTC
1 point
0 comments · 1 min read · LW link
(drive.google.com)

ARC is hiring!

14 Dec 2021 20:09 UTC
64 points
2 comments · 1 min read · LW link

[Question] Why is there an alignment problem?

InfiniteLight · 22 Dec 2023 6:19 UTC
1 point
0 comments · 1 min read · LW link

Experimentally evaluating whether honesty generalizes

paulfchristiano · 1 Jul 2021 17:47 UTC
103 points
24 comments · 9 min read · LW link · 1 review

Evaluations project @ ARC is hiring a researcher and a webdev/engineer

Beth Barnes · 9 Sep 2022 22:46 UTC
99 points
7 comments · 10 min read · LW link

The Alignment Problems

Martín Soto · 12 Jan 2023 22:29 UTC
20 points
0 comments · 4 min read · LW link

Christiano (ARC) and GA (Conjecture) Discuss Alignment Cruxes

24 Feb 2023 23:03 UTC
61 points
7 comments · 47 min read · LW link

ELK prize results

9 Mar 2022 0:01 UTC
138 points
50 comments · 21 min read · LW link