Thanks for the question -- f is calculated over an entire batch of inputs, not a single x. Figure 1 shows how the Switch SAE processes a single residual stream activation x.
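To make the batch-level point concrete, here is a minimal sketch. It assumes f denotes the fraction of activations in a batch that the router sends to each expert, as in Switch-style load-balancing losses. The function name `routing_fractions` and the example batch are hypothetical and only for illustration.

```python
from collections import Counter


def routing_fractions(expert_ids, num_experts):
    """Return, for each expert, the fraction of the batch routed to it.

    Assumption: this is what f refers to. The quantity is only defined
    over a collection of inputs; a single x yields a one-hot vector.
    """
    if not expert_ids:
        raise ValueError("batch must contain at least one activation")
    counts = Counter(expert_ids)
    total = len(expert_ids)
    return [counts.get(i, 0) / total for i in range(num_experts)]


# Hypothetical batch: the expert index the router picked for each of 8 activations.
batch_expert_ids = [0, 2, 2, 1, 2, 0, 3, 2]
f = routing_fractions(batch_expert_ids, num_experts=4)
print(f)  # [0.25, 0.125, 0.5, 0.125]
```

With a single activation the same function returns a one-hot vector, which is why f is only informative when computed over a batch.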
Thank you for the answer, that makes more sense.