Simon Lermen

Karma: 792

Twitter: @SimonLermenAI

Human study on AI spear phishing campaigns

Jan 3, 2025, 3:11 PM
79 points
8 comments · 5 min read · LW link

Current safety training techniques do not fully transfer to the agent setting

Nov 3, 2024, 7:24 PM
158 points
9 comments · 5 min read · LW link

Deceptive agents can collude to hide dangerous features in SAEs

Jul 15, 2024, 5:07 PM
33 points
2 comments · 7 min read · LW link

Applying refusal-vector ablation to a Llama 3 70B agent

May 11, 2024, 12:08 AM
51 points
14 comments · 7 min read · LW link