I don’t really have any coherent hypotheses for why this might be the case (not that I’ve tried to come up with one for any fixed amount of time by the clock). I do, however, have a couple of vague suggestions for how one might go about gaining slightly more information that might lead to a hypothesis, if you’re interested.
The main one involves looking at the local nonlinearities of the few layers after the intervention layer at various inputs. By this I mean examining

diff(t) = f(input + t*top_right_vec) - f(input)

as a function of t (for small values of t in particular), where f = nn.Sequential({the n layers after the intervention layer}), for various small integers n.
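In case it’s useful, here’s roughly the kind of computation I have in mind (just a sketch assuming a PyTorch-style setup; names like layers, intervention_idx, hidden, and top_right_vec are placeholders I’m making up, with hidden playing the role of "input" above, i.e. the activation at the intervention layer):

import torch

# Sketch only: `layers` is the list of modules making up the network,
# `intervention_idx` is the index of the intervention layer, `hidden` is the
# activation at that layer for some fixed maze input, and `top_right_vec` is
# the steering vector. All of these names are placeholders.
def local_diff(hidden, top_right_vec, layers, intervention_idx, n, ts):
    # f = the n layers immediately after the intervention layer
    f = torch.nn.Sequential(*layers[intervention_idx + 1 : intervention_idx + 1 + n])
    with torch.no_grad():
        base = f(hidden)
        # diff(t) = f(hidden + t*top_right_vec) - f(hidden) for each t
        return {t: f(hidden + t * top_right_vec) - base for t in ts}

# e.g. sweep small coefficients on both sides of t = 0:
# diffs = local_diff(hidden, top_right_vec, layers, intervention_idx, n=2,
#                    ts=[-0.2, -0.1, -0.05, 0.05, 0.1, 0.2])

Plotting something like ||diff(t)||/|t| against t, or comparing diff(t) with -diff(-t), would then show directly where the local linearity breaks down.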
One of the motivations for this is that it feels more confusing that [adding works and subtracting doesn’t] than that [increasing the coefficient strength does different things in different regimes, i.e. for different coefficient strengths], but if you think about it, both of those are just us being surprised/confused that the function I described above is locally nonlinear for various values of t (if diff were exactly linear in t, then diff(-t) = -diff(t), and scaling the coefficient would just scale the effect).[1] It seems possible, then, that examining the nonlinearities in the subsequent few layers could shed some light on a slightly more general phenomenon that’ll also explain why adding works but subtracting doesn’t.
It’s also possible, of course, that all the relevant nonlinearities kick in much further down the line, which would render this pretty useless. If this turns out to be the case, one might try finding “cheese vectors” or “top-right vectors” in as late a layer as possible[2], and then re-attempt this.
[1] We only care more about the former confusion (that adding works and subtracting doesn’t) because we’re privileging t=0. That isn’t unreasonable, but perhaps zooming out just a bit will help, idk.
[2] I’m under the impression that the current layer wasn’t chosen for much of a particular reason, so it might be a simple matter to just choose a later layer that performs nearly as well?
I’m under the impression that the current layer wasn’t chosen for much of a particular reason, so it might be a simple matter to just choose a later layer that performs nearly as well?
The current layer was chosen because I looked at all the layers for the cheese vector, and the current layer is the only one (IIRC) which produced interesting/good results. I think the cheese vector doesn’t really work at other layers, but haven’t checked recently.
In the framework of the comment above regarding the add/subtract thing, I’d also be interested in examining the function

diff(s,t) = f(input + t*top_right_vec + s*cheese_vec) - f(input)

The composition claim here is saying something like

diff(s,t) = diff(s,0) + diff(0,t)

I’d be interested to see when this is true. It seems like your current claim is that this (approximately) holds when s<0 and t>0 and neither is too large, but maybe it holds in more or fewer scenarios. In particular, I’m surprised at the weird hard boundaries at s=0 and t=0.
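Concretely, the check I have in mind is something like the following (again just a sketch with made-up names; f is the post-intervention layers as above, hidden plays the role of "input", and cheese_vec/top_right_vec are the two steering vectors):

import torch

# Sketch only: compare diff(s,t) against diff(s,0) + diff(0,t) on a grid of
# coefficients. `f`, `hidden`, `cheese_vec`, and `top_right_vec` are placeholders.
def composition_error(f, hidden, cheese_vec, top_right_vec, ss, ts):
    with torch.no_grad():
        base = f(hidden)
        diff = lambda s, t: f(hidden + s * cheese_vec + t * top_right_vec) - base
        errors = {}
        for s in ss:
            for t in ts:
                joint = diff(s, t)
                additive = diff(s, 0) + diff(0, t)
                # relative size of the non-additive part
                errors[(s, t)] = ((joint - additive).norm() / (additive.norm() + 1e-8)).item()
        return errors

# e.g. a small grid straddling the s=0 and t=0 boundaries:
# errs = composition_error(f, hidden, cheese_vec, top_right_vec,
#                          ss=[-1.0, -0.5, 0.0, 0.5, 1.0],
#                          ts=[-1.0, -0.5, 0.0, 0.5, 1.0])

A heatmap of these errors over the (s, t) grid would make it fairly obvious whether the additivity really does break down sharply at s=0 and t=0, or degrades more gradually.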
Same.