
kaivu

Karma: 176

Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs

Jul 8, 2024, 10:24 PM
109 points
37 comments · 5 min read · LW link

Takeaways from a Mechanistic Interpretability project on “Forbidden Facts”

Dec 15, 2023, 11:05 AM
33 points
8 comments · 10 min read · LW link

Update on Harvard AI Safety Team and MIT AI Alignment

Dec 2, 2022, 12:56 AM
60 points
4 comments · 8 min read · LW link