Apart Research

Last edit: Jul 18, 2024, 2:34 PM by Esben Kran

Apart Research is an AI safety research lab. It hosts the Apart Sprints, large-scale international events for research experimentation. This tag includes posts written by Apart researchers and content about Apart Research.

Newsletter for Alignment Research: The ML Safety Updates

Esben Kran · Oct 22, 2022, 4:17 PM
25 points
0 comments · 1 min read · LW link

Black Box Investigation Research Hackathon

Sep 12, 2022, 7:20 AM
9 points
4 comments · 2 min read · LW link

We Found An Neuron in GPT-2

Feb 11, 2023, 6:27 PM
143 points
23 comments · 7 min read · LW link
(clementneo.com)

Analysing Adversarial Attacks with Linear Probing

Jun 17, 2024, 2:16 PM
9 points
0 comments · 8 min read · LW link

Results from the language model hackathon

Esben Kran · Oct 10, 2022, 8:29 AM
22 points
1 comment · 4 min read · LW link

Deceptive agents can collude to hide dangerous features in SAEs

Jul 15, 2024, 5:07 PM
33 points
2 comments · 7 min read · LW link

AI Safety Ideas: An Open AI Safety Research Platform

Esben Kran · Oct 17, 2022, 5:01 PM
24 points
0 comments · 1 min read · LW link

Safety timelines: How long will it take to solve alignment?

Sep 19, 2022, 12:53 PM
37 points
7 comments · 6 min read · LW link
(forum.effectivealtruism.org)

Solving the Mechanistic Interpretability challenges: EIS VII Challenge 2

May 25, 2023, 3:37 PM
71 points
1 comment · 13 min read · LW link

How-to Transformer Mechanistic Interpretability—in 50 lines of code or less!

StefanHex · Jan 24, 2023, 6:45 PM
47 points
5 comments · 13 min read · LW link

Robustness of Model-Graded Evaluations and Automated Interpretability

Jul 15, 2023, 7:12 PM
47 points
5 comments · 9 min read · LW link

Superposition and Dropout

Edoardo Pona · May 16, 2023, 7:24 AM
21 points
5 comments · 6 min read · LW link

Results from the AI testing hackathon

Esben Kran · Jan 2, 2023, 3:46 PM
13 points
0 comments · 1 min read · LW link

Towards AI Safety Infrastructure: Talk & Outline

Paul Bricman · Jan 7, 2024, 9:31 AM
11 points
0 comments · 2 min read · LW link
(www.youtube.com)

College technical AI safety hackathon retrospective—Georgia Tech

yix · Nov 15, 2024, 12:22 AM
41 points
2 comments · 5 min read · LW link
(open.substack.com)

Latent Adversarial Training (LAT) Improves the Representation of Refusal

Jan 6, 2025, 10:24 AM
20 points
6 comments · 10 min read · LW link

Identifying semantic neurons, mechanistic circuits & interpretability web apps

Apr 13, 2023, 11:59 AM
18 points
0 comments · 8 min read · LW link

Early Experiments in Reward Model Interpretation Using Sparse Autoencoders

Oct 3, 2023, 7:45 AM
17 points
0 comments · 5 min read · LW link

Join the interpretability research hackathon

Esben Kran · Oct 28, 2022, 4:26 PM
15 points
0 comments · 1 min read · LW link

[Book] Interpretable Machine Learning: A Guide for Making Black Box Models Explainable

Esben Kran · Oct 31, 2022, 11:38 AM
20 points
1 comment · 1 min read · LW link
(christophm.github.io)

NeurIPS Safety & ChatGPT. MLAISU W48

Dec 2, 2022, 3:50 PM
3 points
0 comments · 4 min read · LW link
(newsletter.apartresearch.com)

ML Safety at NeurIPS & Paradigmatic AI Safety? MLAISU W49

Dec 9, 2022, 10:38 AM
19 points
0 comments · 4 min read · LW link
(newsletter.apartresearch.com)

Join the AI Testing Hackathon this Friday

Esben Kran · Dec 12, 2022, 2:24 PM
10 points
0 comments · 1 min read · LW link

Will Machines Ever Rule the World? MLAISU W50

Esben Kran · Dec 16, 2022, 11:03 AM
12 points
7 comments · 4 min read · LW link
(newsletter.apartresearch.com)

AI improving AI [MLAISU W01!]

Esben Kran · Jan 6, 2023, 11:13 AM
5 points
0 comments · 4 min read · LW link
(newsletter.apartresearch.com)

Robustness & Evolution [MLAISU W02]

Esben Kran · Jan 13, 2023, 3:47 PM
10 points
0 comments · 3 min read · LW link
(newsletter.apartresearch.com)

Generalizability & Hope for AI [MLAISU W03]

Esben Kran · Jan 20, 2023, 10:06 AM
5 points
2 comments · 2 min read · LW link
(newsletter.apartresearch.com)

Automated Sandwiching & Quantifying Human-LLM Cooperation: ScaleOversight hackathon results

Feb 23, 2023, 10:48 AM
8 points
0 comments · 6 min read · LW link

Finding Deception in Language Models

Aug 20, 2024, 9:42 AM
18 points
4 comments · 4 min read · LW link

Hackathon and Staying Up-to-Date in AI

jacobhaimes · Jan 8, 2024, 5:10 PM
11 points
0 comments · 1 min read · LW link
(into-ai-safety.github.io)

Demonstrate and evaluate risks from AI to society at the AI x Democracy research hackathon

Esben Kran · Apr 19, 2024, 2:46 PM
5 points
0 comments · 1 min read · LW link
(www.apartresearch.com)

Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities

Nov 5, 2024, 1:01 AM
8 points
0 comments · 6 min read · LW link
(www.apartresearch.com)

Results from the AI x Democracy Research Sprint

Jun 14, 2024, 4:40 PM
13 points
0 comments · 6 min read · LW link

Can startups be impactful in AI safety?

Sep 13, 2024, 7:00 PM
15 points
0 comments · 6 min read · LW link

Computational Mechanics Hackathon (June 1 & 2)

Adam Shai · May 24, 2024, 10:18 PM
34 points
5 comments · 1 min read · LW link

Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1

May 9, 2023, 7:41 PM
119 points
1 comment · 10 min read · LW link

Results from the interpretability hackathon

Nov 17, 2022, 2:51 PM
81 points
0 comments · 6 min read · LW link
(alignmentjam.com)