
Alignment Jam

Last edit: May 16, 2023, 8:25 AM by Esben Kran

This page lists posts from the Alignment Jam hackathons.

Computational Mechanics Hackathon (June 1 & 2)

Adam Shai · May 24, 2024, 10:18 PM
34 points
5 comments · 1 min read · LW link

Towards AI Safety Infrastructure: Talk & Outline

Paul Bricman · Jan 7, 2024, 9:31 AM
11 points
0 comments · 2 min read · LW link
(www.youtube.com)

Finding Deception in Language Models

Aug 20, 2024, 9:42 AM
18 points
4 comments · 4 min read · LW link

Demonstrate and evaluate risks from AI to society at the AI x Democracy research hackathon

Esben Kran · Apr 19, 2024, 2:46 PM
5 points
0 comments · 1 min read · LW link
(www.apartresearch.com)

Results from the AI testing hackathon

Esben Kran · Jan 2, 2023, 3:46 PM
13 points
0 comments · 1 min read · LW link

Superposition and Dropout

Edoardo Pona · May 16, 2023, 7:24 AM
21 points
5 comments · 6 min read · LW link

Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1

May 9, 2023, 7:41 PM
119 points
1 comment · 10 min read · LW link

Results from the interpretability hackathon

Nov 17, 2022, 2:51 PM
81 points
0 comments · 6 min read · LW link
(alignmentjam.com)

Identifying semantic neurons, mechanistic circuits & interpretability web apps

Apr 13, 2023, 11:59 AM
18 points
0 comments · 8 min read · LW link

Solving the Mechanistic Interpretability challenges: EIS VII Challenge 2

May 25, 2023, 3:37 PM
71 points
1 comment · 13 min read · LW link

We Found An Neuron in GPT-2

Feb 11, 2023, 6:27 PM
143 points
23 comments · 7 min read · LW link
(clementneo.com)

How-to Transformer Mechanistic Interpretability—in 50 lines of code or less!

StefanHex · Jan 24, 2023, 6:45 PM
47 points
5 comments · 13 min read · LW link

Robustness of Model-Graded Evaluations and Automated Interpretability

Jul 15, 2023, 7:12 PM
47 points
5 comments · 9 min read · LW link