
joshc

Karma: 571

joshuaclymer.com

New report: Safety Cases for AI

joshc, 20 Mar 2024 16:45 UTC
89 points
14 comments, 1 min read
(twitter.com)

List of strategies for mitigating deceptive alignment

joshc, 2 Dec 2023 5:56 UTC
35 points
2 comments, 6 min read

New paper shows truthfulness & instruction-following don’t generalize by default

joshc, 19 Nov 2023 19:27 UTC
59 points
0 comments, 4 min read

Testbed evals: evaluating AI safety even when it can’t be directly measured

joshc, 15 Nov 2023 19:00 UTC
71 points
2 comments, 4 min read

Red teaming: challenges and research directions

joshc, 10 May 2023 1:40 UTC
31 points
1 comment, 10 min read

Safety standards: a framework for AI regulation

joshc, 1 May 2023 0:56 UTC
19 points
0 comments, 8 min read

Are short timelines actually bad?

joshc, 5 Feb 2023 21:21 UTC
61 points
7 comments, 3 min read

[MLSN #7]: an example of an emergent internal optimizer

9 Jan 2023 19:39 UTC
28 points
0 comments, 6 min read