Charbel-Raphaël

Karma: 1,665

Charbel-Raphael Segerie

https://crsegerie.github.io/

Living in Paris

AI Safety Strategies Landscape

Charbel-Raphaël · 9 May 2024 17:33 UTC
29 points
1 comment · 42 min read · LW link

Constructability: Plainly-coded AGIs may be feasible in the near future

27 Apr 2024 16:04 UTC
66 points
12 comments · 13 min read · LW link

[Question] What convincing warning shot could help prevent extinction from AI?

13 Apr 2024 18:09 UTC
106 points
18 comments · 2 min read · LW link

My intellectual journey to (dis)solve the hard problem of consciousness

Charbel-Raphaël · 6 Apr 2024 9:32 UTC
41 points
41 comments · 30 min read · LW link

AI Safety 101: Capabilities – Human Level AI, What? How? and When?

7 Mar 2024 17:29 UTC
46 points
8 comments · 49 min read · LW link

The case for training frontier AIs on Sumerian-only corpus

15 Jan 2024 16:40 UTC
127 points
14 comments · 3 min read · LW link

aisafety.info, the Table of Content

Charbel-Raphaël · 31 Dec 2023 13:57 UTC
23 points
1 comment · 11 min read · LW link

Results from the Turing Seminar hackathon

7 Dec 2023 14:50 UTC
29 points
1 comment · 6 min read · LW link

AI Safety 101 - Chapter 5.2 - Unrestricted Adversarial Training

Charbel-Raphaël · 31 Oct 2023 14:34 UTC
17 points
0 comments · 19 min read · LW link

AI Safety 101 - Chapter 5.1 - Debate

Charbel-Raphaël · 31 Oct 2023 14:29 UTC
14 points
0 comments · 13 min read · LW link

Charbel-Raphaël and Lucius discuss Interpretability

30 Oct 2023 5:50 UTC
104 points
7 comments · 21 min read · LW link

Against Almost Every Theory of Impact of Interpretability

Charbel-Raphaël · 17 Aug 2023 18:44 UTC
315 points
83 comments · 26 min read · LW link

AI Safety 101: Introduction to Vision Interpretability

28 Jul 2023 17:32 UTC
41 points
0 comments · 1 min read · LW link
(github.com)

AIS 101: Task decomposition for scalable oversight

Charbel-Raphaël · 25 Jul 2023 13:34 UTC
27 points
0 comments · 19 min read · LW link
(docs.google.com)

An Overview of AI risks – the Flyer

17 Jul 2023 12:03 UTC
20 points
0 comments · 1 min read · LW link
(docs.google.com)

Introducing EffiSciences’ AI Safety Unit

30 Jun 2023 7:44 UTC
65 points
0 comments · 12 min read · LW link

Improvement on MIRI’s Corrigibility

9 Jun 2023 16:10 UTC
54 points
8 comments · 13 min read · LW link

Thriving in the Weird Times: Preparing for the 100X Economy

8 May 2023 13:44 UTC
23 points
16 comments · 2 min read · LW link

Davidad’s Bold Plan for Alignment: An In-Depth Explanation

19 Apr 2023 16:09 UTC
154 points
30 comments · 21 min read · LW link

New Hackathon: Robustness to distribution changes and ambiguity

Charbel-Raphaël · 31 Jan 2023 12:50 UTC
11 points
3 comments · 1 min read · LW link