Can you say more about what you mean by “steering mechanisms”? Is it something like “outer loop of an optimization algorithm”?
How complex of a computation do you expect to need in order to find an example where cut activations express something that’s hard to find in layer activations?
Can you say more about what you mean by “steering mechanisms”? Is it something like “outer loop of an optimization algorithm”?
How complex of a computation do you expect to need in order to find an example where cut activations express something that’s hard to find in layer activations?