Bogdan Ionut Cirstea comments on Bogdan Ionut Cirstea’s Shortform

Bogdan Ionut Cirstea 25 Dec 2023 10:32 UTC
6 points
Quick take: "Against Almost Every Theory of Impact of Interpretability" should be required reading for ~anyone starting in AI safety (e.g. it should be in the AGISF curriculum), especially if they’re considering any model internals work (and of course even more so if they’re specifically considering mech interp).