Owain_Evans

Karma: 3,324

https://owainevans.github.io/

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Feb 25, 2025, 5:39 PM
321 points
88 comments, 4 min read

Tell me about yourself: LLMs are aware of their learned behaviors

Jan 22, 2025, 12:47 AM
129 points
5 comments, 6 min read

New, improved multiple-choice TruthfulQA

Jan 15, 2025, 11:32 PM
72 points
0 comments, 3 min read

Inference-Time-Compute: More Faithful? A Research Note

Jan 15, 2025, 4:43 AM
69 points
10 comments, 11 min read

Tips On Empirical Research Slides

Jan 8, 2025, 5:06 AM
89 points
4 comments, 6 min read