
joshc

Karma: 1,183

joshuaclymer.com

New report: Safety Cases for AI

joshc · Mar 20, 2024, 4:45 PM
89 points
14 comments · 1 min read · LW link
(twitter.com)

List of strategies for mitigating deceptive alignment

joshc · Dec 2, 2023, 5:56 AM
36 points
2 comments · 6 min read · LW link

New paper shows truthfulness & instruction-following don’t generalize by default

joshc · Nov 19, 2023, 7:27 PM
60 points
0 comments · 4 min read · LW link

Testbed evals: evaluating AI safety even when it can’t be directly measured

joshc · Nov 15, 2023, 7:00 PM
71 points
2 comments · 4 min read · LW link

Red teaming: challenges and research directions

joshc · May 10, 2023, 1:40 AM
31 points
1 comment · 10 min read · LW link

Safety standards: a framework for AI regulation

joshc · May 1, 2023, 12:56 AM
19 points
0 comments · 8 min read · LW link

Are short timelines actually bad?

joshc · Feb 5, 2023, 9:21 PM
61 points
7 comments · 3 min read · LW link

[MLSN #7]: an example of an emergent internal optimizer

Jan 9, 2023, 7:39 PM
28 points
0 comments · 6 min read · LW link