StefanHex

Karma: 1,271

Stefan Heimersheim. Research Scientist at Apollo Research, Mechanistic Interpretability. The opinions expressed here are my own and do not necessarily reflect the views of my employer.

You can remove GPT2’s LayerNorm by fine-tuning for an hour

StefanHex, posted Aug 8, 2024, 6:33 PM
161 points
11 comments, 8 min read

A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team

Jul 18, 2024, 2:15 PM
118 points
18 comments, 18 min read

[Interim research report] Activation plateaus & sensitive directions in GPT2

Jul 5, 2024, 5:05 PM
65 points
2 comments, 5 min read

StefanHex’s Shortform

StefanHex, posted Jul 5, 2024, 2:31 PM
5 points
22 comments, 1 min read