Apart Research

Last edit: Jul 18, 2024, 2:34 PM by Esben Kran

Apart Research is an AI safety research lab. It hosts the Apart Sprints, large-scale international events for research experimentation. This tag includes posts written by Apart researchers and content about Apart Research.

Newsletter for Alignment Research: The ML Safety Updates

Esben Kran · Oct 22, 2022, 4:17 PM
25 points
0 comments · 1 min read · LW link

Black Box Investigation Research Hackathon

Sep 12, 2022, 7:20 AM
9 points
4 comments · 2 min read · LW link

We Found An Neuron in GPT-2

Feb 11, 2023, 6:27 PM
143 points
23 comments · 7 min read · LW link
(clementneo.com)

Analysing Adversarial Attacks with Linear Probing

Jun 17, 2024, 2:16 PM
9 points
0 comments · 8 min read · LW link

Results from the language model hackathon

Esben Kran · Oct 10, 2022, 8:29 AM
22 points
1 comment · 4 min read · LW link

Deceptive agents can collude to hide dangerous features in SAEs

Jul 15, 2024, 5:07 PM
33 points
2 comments · 7 min read · LW link

AI Safety Ideas: An Open AI Safety Research Platform

Esben Kran · Oct 17, 2022, 5:01 PM
24 points
0 comments · 1 min read · LW link

Safety timelines: How long will it take to solve alignment?

Sep 19, 2022, 12:53 PM
37 points
7 comments · 6 min read · LW link
(forum.effectivealtruism.org)

Solving the Mechanistic Interpretability challenges: EIS VII Challenge 2

May 25, 2023, 3:37 PM
71 points
1 comment · 13 min read · LW link

How-to Transformer Mechanistic Interpretability—in 50 lines of code or less!

StefanHex · Jan 24, 2023, 6:45 PM
47 points
5 comments · 13 min read · LW link

Robustness of Model-Graded Evaluations and Automated Interpretability

Jul 15, 2023, 7:12 PM
47 points
5 comments · 9 min read · LW link

Superposition and Dropout

Edoardo Pona · May 16, 2023, 7:24 AM
21 points
5 comments · 6 min read · LW link

Results from the AI testing hackathon

Esben Kran · Jan 2, 2023, 3:46 PM
13 points
0 comments · 1 min read · LW link

Towards AI Safety Infrastructure: Talk & Outline

Paul Bricman · Jan 7, 2024, 9:31 AM
11 points
0 comments · 2 min read · LW link
(www.youtube.com)

College technical AI safety hackathon retrospective—Georgia Tech

yix · Nov 15, 2024, 12:22 AM
41 points
2 comments · 5 min read · LW link
(open.substack.com)

Latent Adversarial Training (LAT) Improves the Representation of Refusal

Jan 6, 2025, 10:24 AM
20 points
6 comments · 10 min read · LW link

Identifying semantic neurons, mechanistic circuits & interpretability web apps

Apr 13, 2023, 11:59 AM
18 points
0 comments · 8 min read · LW link

Early Experiments in Reward Model Interpretation Using Sparse Autoencoders

Oct 3, 2023, 7:45 AM
17 points
0 comments · 5 min read · LW link

Join the interpretability research hackathon

Esben Kran · Oct 28, 2022, 4:26 PM
15 points
0 comments · 1 min read · LW link

[Book] Interpretable Machine Learning: A Guide for Making Black Box Models Explainable

Esben Kran · Oct 31, 2022, 11:38 AM
20 points
1 comment · 1 min read · LW link
(christophm.github.io)

NeurIPS Safety & ChatGPT. MLAISU W48

Dec 2, 2022, 3:50 PM
3 points
0 comments · 4 min read · LW link
(newsletter.apartresearch.com)

ML Safety at NeurIPS & Paradigmatic AI Safety? MLAISU W49

Dec 9, 2022, 10:38 AM
19 points
0 comments · 4 min read · LW link
(newsletter.apartresearch.com)

Join the AI Testing Hackathon this Friday

Esben Kran · Dec 12, 2022, 2:24 PM
10 points
0 comments · 1 min read · LW link

Will Machines Ever Rule the World? MLAISU W50

Esben Kran · Dec 16, 2022, 11:03 AM
12 points
7 comments · 4 min read · LW link
(newsletter.apartresearch.com)

AI improving AI [MLAISU W01!]

Esben Kran · Jan 6, 2023, 11:13 AM
5 points
0 comments · 4 min read · LW link
(newsletter.apartresearch.com)

Robustness & Evolution [MLAISU W02]

Esben Kran · Jan 13, 2023, 3:47 PM
10 points
0 comments · 3 min read · LW link
(newsletter.apartresearch.com)

Generalizability & Hope for AI [MLAISU W03]

Esben Kran · Jan 20, 2023, 10:06 AM
5 points
2 comments · 2 min read · LW link
(newsletter.apartresearch.com)

Automated Sandwiching & Quantifying Human-LLM Cooperation: ScaleOversight hackathon results

Feb 23, 2023, 10:48 AM
8 points
0 comments · 6 min read · LW link

Finding Deception in Language Models

Aug 20, 2024, 9:42 AM
18 points
4 comments · 4 min read · LW link

Hackathon and Staying Up-to-Date in AI

jacobhaimes · Jan 8, 2024, 5:10 PM
11 points
0 comments · 1 min read · LW link
(into-ai-safety.github.io)

Demonstrate and evaluate risks from AI to society at the AI x Democracy research hackathon

Esben Kran · Apr 19, 2024, 2:46 PM
5 points
0 comments · 1 min read · LW link
(www.apartresearch.com)

Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities

Nov 5, 2024, 1:01 AM
8 points
0 comments · 6 min read · LW link
(www.apartresearch.com)

Results from the AI x Democracy Research Sprint

Jun 14, 2024, 4:40 PM
13 points
0 comments · 6 min read · LW link

Can startups be impactful in AI safety?

Sep 13, 2024, 7:00 PM
15 points
0 comments · 6 min read · LW link

Computational Mechanics Hackathon (June 1 & 2)

Adam Shai · May 24, 2024, 10:18 PM
34 points
5 comments · 1 min read · LW link

Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1

May 9, 2023, 7:41 PM
119 points
1 comment · 10 min read · LW link

Results from the interpretability hackathon

Nov 17, 2022, 2:51 PM
81 points
0 comments · 6 min read · LW link
(alignmentjam.com)