Owain_Evans

Karma: 3,417

https://owainevans.github.io/

Backdoor awareness and misaligned personas in reasoning models

Jun 20, 2025, 11:38 PM
30 points
8 comments, 6 min read

Thought Crime: Backdoors & Emergent Misalignment in Reasoning Models

Jun 16, 2025, 4:43 PM
66 points
2 comments, 8 min read

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Feb 25, 2025, 5:39 PM
329 points
91 comments, 4 min read

Tell me about yourself: LLMs are aware of their learned behaviors

Jan 22, 2025, 12:47 AM
132 points
5 comments, 6 min read

New, improved multiple-choice TruthfulQA

Jan 15, 2025, 11:32 PM UTC
72 points
0 comments, 3 min read