Florian_Dietz

Karma: 286

Edge Cases in AI Alignment

Florian_Dietz · Mar 24, 2025, 9:27 AM
19 points
3 comments · 4 min read · LW link

Split Personality Training: Revealing Latent Knowledge Through Personality-Shift Tokens

Florian_Dietz · Mar 10, 2025, 4:07 PM
35 points
3 comments · 9 min read · LW link

Do we want alignment faking?

Florian_Dietz · Feb 28, 2025, 9:50 PM
7 points
4 comments · 1 min read · LW link

Revealing alignment faking with a single prompt

Florian_Dietz · Jan 29, 2025, 9:01 PM
9 points
5 comments · 4 min read · LW link