I’ve seen a little bit of this, but nowhere near as much as I think the topic merits. I agree that systematic studies on where and how the reconstruction errors make their effects known might be quite informative.
Basically, whenever people train SAEs, or use some other approximate model decomposition that degrades performance, I think they should ideally spend some time after just playing with the degraded model and talking to it. Figure out in what ways it is worse.
I’ve seen a little bit of this, but nowhere near as much as I think the topic merits. I agree that systematic studies on where and how the reconstruction errors make their effects known might be quite informative.
Basically, whenever people train SAEs, or use some other approximate model decomposition that degrades performance, I think they should ideally spend some time after just playing with the degraded model and talking to it. Figure out in what ways it is worse.
Hmmm ok maybe I’ll take a look at this :)