
Jeffrey Ladish

Karma: 1,982

Bounty for Evidence on Some of Palisade Research’s Beliefs

Sep 23, 2024, 8:01 PM
46 points
4 comments · 2 min read

Take SCIFs, it’s dangerous to go alone

May 1, 2024, 8:02 AM
42 points
1 comment · 3 min read

Palisade is hiring Research Engineers

Nov 11, 2023, 3:09 AM
23 points
0 comments · 3 min read

unRLHF—Efficiently undoing LLM safeguards

Oct 12, 2023, 7:58 PM
117 points
15 comments · 20 min read

LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B

Oct 12, 2023, 7:58 PM
151 points
29 comments · 14 min read

The Agency Overhang

Jeffrey Ladish · Apr 21, 2023, 7:47 AM
85 points
6 comments · 6 min read

Donation offsets for ChatGPT Plus subscriptions

Jeffrey Ladish · Mar 16, 2023, 11:29 PM
53 points
3 comments · 3 min read

To determine alignment difficulty, we need to know the absolute difficulty of alignment generalization

Jeffrey Ladish · Mar 14, 2023, 3:52 AM
12 points
3 comments · 2 min read

Thoughts on the OpenAI alignment plan: will AI research assistants be net-positive for AI existential risk?

Jeffrey Ladish · Mar 10, 2023, 8:21 AM
58 points
3 comments · 9 min read