joshc

Karma: 1,620

Planning for Extreme AI Risks

joshc · Jan 29, 2025, 6:33 PM
139 points
5 comments · 16 min read · LW link

When does capability elicitation bound risk?

joshc · Jan 22, 2025, 3:42 AM
25 points
0 comments · 17 min read · LW link
(redwoodresearch.substack.com)

Extending control evaluations to non-scheming threats

joshc · Jan 12, 2025, 1:42 AM
30 points
1 comment · 12 min read · LW link

New report: Safety Cases for AI

joshc · Mar 20, 2024, 4:45 PM
89 points
14 comments · 1 min read · LW link
(twitter.com)

List of strategies for mitigating deceptive alignment

joshc · Dec 2, 2023, 5:56 AM
38 points
2 comments · 6 min read · LW link

New paper shows truthfulness & instruction-following don’t generalize by default

joshc · Nov 19, 2023, 7:27 PM
60 points
0 comments · 4 min read · LW link

Testbed evals: evaluating AI safety even when it can’t be directly measured

joshc · Nov 15, 2023, 7:00 PM
71 points
2 comments · 4 min read · LW link

Red teaming: challenges and research directions

joshc · May 10, 2023, 1:40 AM
31 points
1 comment · 10 min read · LW link