rgorman

Karma: 274

Go home GPT-4o, you’re drunk: emergent misalignment as lowered inhibitions

Mar 18, 2025, 2:48 PM
77 points
12 comments, 5 min read, LW link

Using Prompt Evaluation to Combat Bio-Weapon Research

Feb 19, 2025, 12:39 PM
11 points
2 comments, 3 min read, LW link

Defense Against the Dark Prompts: Mitigating Best-of-N Jailbreaking with Prompt Evaluation

Jan 31, 2025, 3:36 PM
16 points
2 comments, 2 min read, LW link

Concept extrapolation for hypothesis generation

Dec 12, 2022, 10:09 PM
20 points
2 comments, 3 min read, LW link

Using GPT-Eliezer against ChatGPT Jailbreaking

Dec 6, 2022, 7:54 PM
170 points
85 comments, 9 min read, LW link