Thanks for the insight! Wouldn't you expect the same to hold for Gemma 2B base (pre-trained) vs. Gemma 2B Instruct, though? Gemma-2b-Python-codes is just a full finetune on top of the Instruct model (probably trained for a relatively small number of update steps), and previous work on Instruct models indicated that SAEs don’t transfer to Gemma 2B Instruct either.