
Apart Research

Last edit: 18 Jul 2024 14:34 UTC by Esben Kran

Apart Research is an AI safety research lab. It hosts the Apart Sprints, large-scale international events for research experimentation. This tag includes posts written by Apart researchers and content about Apart Research.

Newsletter for Alignment Research: The ML Safety Updates

Esben Kran, 22 Oct 2022 16:17 UTC
25 points
0 comments · 1 min read · LW link

Black Box Investigation Research Hackathon

12 Sep 2022 7:20 UTC
9 points
4 comments · 2 min read · LW link

Analysing Adversarial Attacks with Linear Probing

17 Jun 2024 14:16 UTC
9 points
0 comments · 8 min read · LW link

Deceptive agents can collude to hide dangerous features in SAEs

15 Jul 2024 17:07 UTC
33 points
2 comments · 7 min read · LW link

Results from the language model hackathon

Esben Kran, 10 Oct 2022 8:29 UTC
22 points
1 comment · 4 min read · LW link

We Found An Neuron in GPT-2

11 Feb 2023 18:27 UTC
143 points
23 comments · 7 min read · LW link
(clementneo.com)

AI Safety Ideas: An Open AI Safety Research Platform

Esben Kran, 17 Oct 2022 17:01 UTC
24 points
0 comments · 1 min read · LW link

Safety timelines: How long will it take to solve alignment?

19 Sep 2022 12:53 UTC
37 points
7 comments · 6 min read · LW link
(forum.effectivealtruism.org)

Solving the Mechanistic Interpretability challenges: EIS VII Challenge 2

25 May 2023 15:37 UTC
71 points
1 comment · 13 min read · LW link

How-to Transformer Mechanistic Interpretability—in 50 lines of code or less!

StefanHex, 24 Jan 2023 18:45 UTC
47 points
5 comments · 13 min read · LW link

Robustness of Model-Graded Evaluations and Automated Interpretability

15 Jul 2023 19:12 UTC
47 points
5 comments · 9 min read · LW link

Superposition and Dropout

Edoardo Pona, 16 May 2023 7:24 UTC
21 points
5 comments · 6 min read · LW link

Results from the AI testing hackathon

Esben Kran, 2 Jan 2023 15:46 UTC
13 points
0 comments · 1 min read · LW link

Towards AI Safety Infrastructure: Talk & Outline

Paul Bricman, 7 Jan 2024 9:31 UTC
11 points
0 comments · 2 min read · LW link
(www.youtube.com)

College technical AI safety hackathon retrospective—Georgia Tech

yix, 15 Nov 2024 0:22 UTC
39 points
2 comments · 5 min read · LW link
(open.substack.com)

Identifying semantic neurons, mechanistic circuits & interpretability web apps

13 Apr 2023 11:59 UTC
18 points
0 comments · 8 min read · LW link

Early Experiments in Reward Model Interpretation Using Sparse Autoencoders

3 Oct 2023 7:45 UTC
17 points
0 comments · 5 min read · LW link

Join the interpretability research hackathon

Esben Kran, 28 Oct 2022 16:26 UTC
15 points
0 comments · 1 min read · LW link

[Book] Interpretable Machine Learning: A Guide for Making Black Box Models Explainable

Esben Kran, 31 Oct 2022 11:38 UTC
20 points
1 comment · 1 min read · LW link
(christophm.github.io)

NeurIPS Safety & ChatGPT. MLAISU W48

2 Dec 2022 15:50 UTC
3 points
0 comments · 4 min read · LW link
(newsletter.apartresearch.com)

ML Safety at NeurIPS & Paradigmatic AI Safety? MLAISU W49

9 Dec 2022 10:38 UTC
19 points
0 comments · 4 min read · LW link
(newsletter.apartresearch.com)

Join the AI Testing Hackathon this Friday

Esben Kran, 12 Dec 2022 14:24 UTC
10 points
0 comments · 1 min read · LW link

Will Machines Ever Rule the World? MLAISU W50

Esben Kran, 16 Dec 2022 11:03 UTC
12 points
7 comments · 4 min read · LW link
(newsletter.apartresearch.com)

AI improving AI [MLAISU W01!]

Esben Kran, 6 Jan 2023 11:13 UTC
5 points
0 comments · 4 min read · LW link
(newsletter.apartresearch.com)

Robustness & Evolution [MLAISU W02]

Esben Kran, 13 Jan 2023 15:47 UTC
10 points
0 comments · 3 min read · LW link
(newsletter.apartresearch.com)

Generalizability & Hope for AI [MLAISU W03]

Esben Kran, 20 Jan 2023 10:06 UTC
5 points
2 comments · 2 min read · LW link
(newsletter.apartresearch.com)

Automated Sandwiching & Quantifying Human-LLM Cooperation: ScaleOversight hackathon results

23 Feb 2023 10:48 UTC
8 points
0 comments · 6 min read · LW link

Finding Deception in Language Models

20 Aug 2024 9:42 UTC
18 points
4 comments · 4 min read · LW link

Hackathon and Staying Up-to-Date in AI

jacobhaimes, 8 Jan 2024 17:10 UTC
11 points
0 comments · 1 min read · LW link
(into-ai-safety.github.io)

Demonstrate and evaluate risks from AI to society at the AI x Democracy research hackathon

Esben Kran, 19 Apr 2024 14:46 UTC
5 points
0 comments · 1 min read · LW link
(www.apartresearch.com)

Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities

5 Nov 2024 1:01 UTC
8 points
0 comments · 6 min read · LW link
(www.apartresearch.com)

Results from the AI x Democracy Research Sprint

14 Jun 2024 16:40 UTC
13 points
0 comments · 6 min read · LW link

Can startups be impactful in AI safety?

13 Sep 2024 19:00 UTC
12 points
0 comments · 6 min read · LW link

Computational Mechanics Hackathon (June 1 & 2)

Adam Shai, 24 May 2024 22:18 UTC
34 points
5 comments · 1 min read · LW link

Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1

9 May 2023 19:41 UTC
119 points
1 comment · 10 min read · LW link

Results from the interpretability hackathon

17 Nov 2022 14:51 UTC
81 points
0 comments · 6 min read · LW link
(alignmentjam.com)