abhatt349 comments on Maze-solving agents: Add a top-right vector, make the agent go to the top-right

abhatt349 Apr 7, 2023, 7:35 AM
I guess I’m imagining that the {presence of a representation of a path}, to the extent that it’s represented in the model at all, is used primarily to compute some sort of “top-right affinity” heuristic. So even if it’s true that subtracting the {representation of a path}-vector should do nothing when there’s no representation of a path, I think that subtracting the “top-right affinity” vector that’s downstream of this path representation should still do something, whether or not a path representation is currently present.
So I guess the disagreement in our intuitions (or the intuitions suggested by our respective hypotheses) maybe just boils down to “is the thing we’re editing closer to a {path representation} or a {top-right affinity heuristic}?” This might weakly imply that the effect would weaken or disappear if you tried to do your AVE at a later layer (as I suggest at the end of this comment), since a later layer might be more likely to represent a {top-right affinity heuristic} than a {path representation}?
It’s possible, however, that I’m misunderstanding your point. To help clarify, can I ask what you mean by “representation of a path” on a slightly more mechanistic level?
- Do you mean you can find some set of activations (after the edited layer) from which you can faithfully reconstruct the path to the top right?
- Or do you perhaps mean something weaker, like being able to find some activation that strongly and robustly correlates with “top-right-path-existence” or “top-right-path-length”, or something like that?^[1]
- Or maybe you didn’t mean anything specific and were just trying to draw a comparison to other reasoning processes? If this is the case, I don’t quite buy that the comparison is likely to be informative about the maze model’s internal cognition without further justification.
- Or maybe you meant something else entirely!!! I’m sure I’ve left out many very reasonable possibilities, so please do correct me when I’m wrong!
1. ^ Btw, it seems like a cheap and relatively informative experiment to just try computing neural correlates with variables like “distance to top-right-most reachable point” or “how close top-right-most reachable point is to the top-right”. This might be worth doing even if this isn’t what you meant by “representations of a path”, since it could shed light on which channels/layers are most important, or best to perform AVE on.
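
To make the neural-correlates idea in the footnote a bit more concrete, here is a rough sketch of what I have in mind. Everything in it is an assumption for illustration: `channel_acts` stands in for per-maze, per-channel activations pooled to a single number (however you choose to extract and pool them), `target` stands in for a scalar maze variable such as “distance to top-right-most reachable point”, and the data below is synthetic, not taken from the actual maze model.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def channel_correlations(channel_acts, target):
    """Correlate each channel with a scalar maze variable.

    channel_acts[i][c]: pooled activation of channel c on maze i (hypothetical layout).
    target[i]: scalar variable for maze i, e.g. a distance measure.
    Returns a list with one correlation coefficient per channel.
    """
    n_channels = len(channel_acts[0])
    return [
        pearson([row[c] for row in channel_acts], target)
        for c in range(n_channels)
    ]


# Synthetic stand-in data (NOT real model activations): channel 1 is built
# to track the target, channels 0 and 2 are pure noise.
rng = random.Random(0)
target = [rng.uniform(0, 20) for _ in range(200)]
channel_acts = [
    [rng.gauss(0, 1), 0.5 * t + rng.gauss(0, 1), rng.gauss(0, 1)]
    for t in target
]

corrs = channel_correlations(channel_acts, target)
# Rank channels by absolute correlation, strongest first.
ranked = sorted(range(len(corrs)), key=lambda c: abs(corrs[c]), reverse=True)
print("channels ranked by |r|:", ranked)
print("correlations:", [round(r, 2) for r in corrs])
```

On the real model you would swap the synthetic lists for activations recorded at the layer of interest, and repeat across layers to see where (if anywhere) the correlation peaks. A plain Pearson correlation only picks up linear relationships, so a null result here wouldn’t rule out a nonlinear encoding.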