
rgorman

Karma: 206

Using Prompt Evaluation to Combat Bio-Weapon Research

19 Feb 2025 12:39 UTC
11 points
2 comments, 3 min read

Defense Against the Dark Prompts: Mitigating Best-of-N Jailbreaking with Prompt Evaluation

31 Jan 2025 15:36 UTC
16 points
2 comments, 2 min read

Concept extrapolation for hypothesis generation

12 Dec 2022 22:09 UTC
20 points
2 comments, 3 min read

Using GPT-Eliezer against ChatGPT Jailbreaking

6 Dec 2022 19:54 UTC
170 points
85 comments, 9 min read