
AI Safety Mentors and Mentees Program

Last edit: 10 May 2023 10:55 UTC by Magdalena Wache

The AI Safety Mentors and Mentees program aims to facilitate mentoring in AI safety. It does so by …

Announcing AI safety Mentors and Mentees

Marius Hobbhahn · 23 Nov 2022 15:21 UTC
62 points · 7 comments · 10 min read · LW link

AISC Project: Modelling Trajectories of Language Models

NickyP · 13 Nov 2023 14:33 UTC
27 points · 0 comments · 12 min read · LW link

What Discovering Latent Knowledge Did and Did Not Find

Fabien Roger · 13 Mar 2023 19:29 UTC
166 points · 17 comments · 11 min read · LW link

The Translucent Thoughts Hypotheses and Their Implications

Fabien Roger · 9 Mar 2023 16:30 UTC
141 points · 7 comments · 19 min read · LW link

If Wentworth is right about natural abstractions, it would be bad for alignment

Wuschel Schulz · 8 Dec 2022 15:19 UTC
29 points · 5 comments · 4 min read · LW link

[Hebbian Natural Abstractions] Introduction

21 Nov 2022 20:34 UTC
34 points · 3 comments · 4 min read · LW link
(www.snellessen.com)

[Hebbian Natural Abstractions] Mathematical Foundations

25 Dec 2022 20:58 UTC
15 points · 2 comments · 6 min read · LW link
(www.snellessen.com)

How Do Induction Heads Actually Work in Transformers With Finite Capacity?

Fabien Roger · 23 Mar 2023 9:09 UTC
27 points · 0 comments · 5 min read · LW link

I made an AI safety fellowship. What I wish I knew.

Ruben Castaing · 8 Jun 2024 15:23 UTC
12 points · 0 comments · 2 min read · LW link

The Inter-Agent Facet of AI Alignment

Michael Oesterle · 18 Sep 2022 20:39 UTC
12 points · 1 comment · 5 min read · LW link