Some of my mentees are working on something related to this right now! (One of them brought this comment to my attention). It’s definitely very preliminary work—observing what results in different contexts when you do something like tune a model using some extracted representations of model internals as a signal—but as you say we don’t really have very much empirics on this, so I’m pretty excited by it.
Some of my mentees are working on something related to this right now! (One of them brought this comment to my attention). It’s definitely very preliminary work—observing what results in different contexts when you do something like tune a model using some extracted representations of model internals as a signal—but as you say we don’t really have very much empirics on this, so I’m pretty excited by it.