Any idea why “cheese Euclidean distance to top-right corner” is so important? It’s surprising to me because the convolutional layers should apply the same filter everywhere.
My naive guess is that the other relationships are nonlinear, and this is the best way to approximate those relationships out of just linear relationships of the variables the regressor had access to.
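For concreteness, here's a toy sketch of that guess (simulated data, not the real rollouts — the grid size, decay rate, and variable names are all made up): when the ground-truth success probability is a nonlinear function of cheese position, a precomputed distance-to-corner feature can be the single best *linear* proxy for it, beating the raw coordinates the regressor also sees.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the behavioral data: cheese positions (x, y)
# on a 25x25 grid with the top-right corner at (24, 24). The "true"
# success probability is deliberately nonlinear in position; none of
# these names or numbers come from the actual dataset.
n = 5_000
x = rng.uniform(0, 24, n)
y = rng.uniform(0, 24, n)
dist = np.hypot(24 - x, 24 - y)  # Euclidean distance to top-right corner
success = (rng.random(n) < np.exp(-dist / 10)).astype(float)

# Among the features a linear regressor sees, the distance feature is
# the best linear proxy for the nonlinear ground truth: it correlates
# with success more strongly than either raw coordinate does.
c_dist = np.corrcoef(dist, success)[0, 1]
c_x = np.corrcoef(x, success)[0, 1]
c_y = np.corrcoef(y, success)[0, 1]
print(c_dist, c_x, c_y)
```

So a large regression weight on the distance feature is consistent with "the network does something nonlinear and distance is the best linear summary of it," without the network literally computing a Euclidean distance.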
Hm, what do you mean by “other relationships”? Is your guess that “cheese Euclidean distance to top-right” is a statistical artifact, or something else?
If so—I’m quite confident that relationship isn’t an artifact (although I don’t strongly believe that the network is literally modulating its decisions on the basis of this exact formalization). For example, see footnote 4. I’d also be happy to generate additional vector field visualizations in support of this claim.
Is the dataset you used for the regression available? Might be easier to generate the graphs that I’m thinking of than describe them.
[EDIT: I was confused when I wrote the earlier comment; I thought Vivek was talking about the decision square’s distance to the top-right 5x5 corner, which I do think my naive guess is plausible for. I don’t have the same guess about cheese Euclidean distance to the top-right corner.]
Here’s a colab notebook (it takes a while to load the data, be warned). We’ll have a post out later.
Yeah, we’ll put up additional notebooks/resources/datasets soon.
Thanks for the good thoughts and questions on this! We’re taking a closer look at the behavioral statistics modeling, and here are some heatmaps that visualize the “cheese Euclidean distance to top-right corner” metric’s relationship with the chance of successful cheese-finding.
These plots show the frequency of cheese-finding over 10k random mazes (sampled from the “maze has a decision square” distribution) against the x/y offset from the top-right corner to the cheese location. The raw data is shown, plus a version binned into 5x5 patches to get more samples in each bin; the bin counts are also plotted for reference. (The unequal sampling is expected: all maze sizes can have small cheese-corner offsets, but only large mazes can have large offsets. The smallest 5x5 bin by count has 35 data points.)
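The binning itself is straightforward; here's a minimal sketch of the computation behind the heatmaps, using simulated offsets rather than the real rollout data (the offsets, success rule, and names here are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for the 10k-maze behavioral data: per-episode
# x/y offsets from the top-right corner to the cheese, plus a success
# flag. The real offsets and outcomes come from the rollout dataset;
# these are simulated.
n = 10_000
dx = rng.integers(0, 25, n)
dy = rng.integers(0, 25, n)
success = rng.random(n) < np.exp(-np.hypot(dx, dy) / 10)

# Bin the offsets into 5x5 patches and compute the per-bin
# cheese-finding frequency and sample counts, as in the heatmaps.
bx, by = dx // 5, dy // 5
counts = np.zeros((5, 5))
wins = np.zeros((5, 5))
np.add.at(counts, (by, bx), 1)
np.add.at(wins, (by, bx), success.astype(float))
freq = np.divide(wins, counts, out=np.zeros_like(wins), where=counts > 0)
```

`freq` is the per-patch success frequency and `counts` the sample count per patch; plotting both side by side gives the pair of heatmaps described above.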
We can see a pretty clear relationship between cheese-corner offset and probability of finding the cheese, with the expected perfect performance in the top-right 5x5 patch that was the only allowed cheese location during the training of this particular agent. But the relationship is non-linear, and of course doesn’t provide direct evidence of causality.
I’m also lightly surprised by the strength of the relationship, but not because of the convolutional layers. It seems like if “convolutional layers apply the same filter everywhere” makes me surprised by the cheese-distance influence, it should also make me surprised by “the mouse behaves differently in a dead-end versus a long corridor” or “the mouse tends to go to the top-right.”
(I have some sense of “maybe I’m not grappling with Vivek’s reasons for being surprised”, so feel free to tell me if so!)