Thanks for the insightful response! Agree it’s just suggestive for now. Though it’s more so than it would be with image models (where I’d expect lenses to transfer really badly, but I don’t know). Perhaps it being a residual network is the key thing: since effective path lengths are short, most of the information is “carried along” unchanged, meaning the same probe keeps working for other layers. Idk