Nice project and writeup. I particularly liked the walkthrough of thought processes throughout the project.
Decision square’s Euclidean distance to the top-right 5×5 corner, positive (+1.326).
We are confused and don’t fully understand which logical interactions produce this positive regression coefficient.
I’d be wary of interpreting the regression coefficients of features that are correlated (see Multicollinearity). Even the sign may be misleading.
It might be worth making a cross-correlation plot of the features. This won’t give you new coefficients to put faith in, but it might help you decide how much to trust the ones you have. It can also be useful to look at how unstable the coefficients are during training (or, e.g., when trained on a different dataset).
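As a rough illustration of the cross-correlation check suggested above, here is a minimal sketch in Python. The data are synthetic stand-ins, not the maze features, and the helper name `correlation_matrix` is just for illustration.

```python
import numpy as np

def correlation_matrix(X):
    """Pearson correlations between the feature columns of X (n_samples, n_features)."""
    return np.corrcoef(X, rowvar=False)

# Synthetic stand-in data (assumption: NOT the maze features).
# x1 is nearly a copy of x0; x2 is independent of both.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = x0 + 0.1 * rng.normal(size=500)
x2 = rng.normal(size=500)
X = np.column_stack([x0, x1, x2])

corr = correlation_matrix(X)
print(np.round(corr, 2))
```

Large off-diagonal entries flag pairs of features whose individual coefficients deserve extra suspicion. The matrix can be passed to any heatmap plotting routine to get the plot itself.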
TL;DR You raise a reasonable worry, but the three key variables[1] have stable signs and seem like legit decision-making factors. The variable you quote indeed seems to be a statistical artifact, as we speculated.[2]
There is indeed a strong correlation between two[3] of our highly predictive variables:
We computed the variance inflation factors (VIF) for the three predictive variables. VIF measures how much collinearity increases the variance of the regression coefficients. A score exceeding 4 is considered to be a warning sign of multicollinearity.
| Attribute | VIF |
| --- | --- |
| Euclidean distance between cheese and top-right square | 1.05 |
| Steps between cheese and decision-square | 4.64 |
| Euclidean distance between cheese and decision-square | 4.66 |
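For readers unfamiliar with VIF, here is a minimal sketch of how it can be computed, using the standard definition VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing feature j on the remaining features. The function name and the synthetic data are assumptions for illustration; this is not necessarily the code used to produce the numbers above.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return VIF_j = 1 / (1 - R_j^2) for each column j of X.

    R_j^2 is the coefficient of determination from an ordinary
    least-squares regression (with intercept) of column j on the others.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - resid.var() / y.var()
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Synthetic stand-in data (assumption: NOT the maze features).
rng = np.random.default_rng(0)
x0 = rng.normal(size=1000)
x1 = x0 + 0.5 * rng.normal(size=1000)  # correlated with x0
x2 = rng.normal(size=1000)             # independent
vifs = variance_inflation_factors(np.column_stack([x0, x1, x2]))
print(np.round(vifs, 2))
```

In this toy setup the two correlated columns get inflated VIFs while the independent column stays close to 1, which mirrors the pattern in the table.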
So we’re at risk here. However, we re-isolated these three variables as both:
Predictively useful on their own, and
Showing no (or extremely rare) sign flips when regressing upon randomly selected subsets of variables.
Considering a range of regressions on a range of train/validation splits, these variables have stable regression coefficient signs and somewhat stable coefficient magnitudes. (Although we don’t mean for our analysis to be predicated on the magnitudes themselves; we know these are unreliable and contingent quantities!)
Furthermore, we regressed upon 200 random subsets of our larger set of variables, and the cheese/decision-square distance regression coefficients never experienced a sign flip. The cheese/top-right Euclidean distance had a few sign flips. The other variables sign-flip frequently.
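The random-subset check described above can be sketched roughly as follows. This is a hedged illustration only: it uses ordinary least squares on synthetic data, whereas the actual analysis may use a different regression model, and the function name `sign_flip_rates` and its parameters are hypothetical.

```python
import numpy as np

def sign_flip_rates(X, y, n_trials=200, seed=0):
    """For each feature, return the fraction of regressions (among those
    that include it) in which its coefficient took its minority sign.
    0.0 means the sign never flipped; 0.5 means the sign is a coin toss.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    positive = np.zeros(p)  # times the coefficient was > 0
    seen = np.zeros(p)      # times the feature was included
    for _ in range(n_trials):
        k = rng.integers(1, p + 1)                  # subset size in 1..p
        idx = rng.choice(p, size=k, replace=False)  # random feature subset
        A = np.column_stack([np.ones(n), X[:, idx]])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        seen[idx] += 1
        positive[idx] += beta[1:] > 0               # skip the intercept
    frac_pos = positive / np.maximum(seen, 1)
    return np.minimum(frac_pos, 1.0 - frac_pos)

# Synthetic stand-in data (assumption: NOT the maze features).
rng = np.random.default_rng(1)
n = 1000
x0 = rng.normal(size=n)
x1 = x0 + 0.5 * rng.normal(size=n)  # correlated with x0, no true effect
x2 = rng.normal(size=n)             # independent, no true effect
X = np.column_stack([x0, x1, x2])
y = 2.0 * x0 + rng.normal(size=n)   # only x0 truly matters

rates = sign_flip_rates(X, y)
print(np.round(rates, 3))
```

A feature with a genuine effect should keep a flip rate of zero regardless of which other features are in the regression, which is the property claimed for the cheese/decision-square distances.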
We reran this analysis on a second dataset of 10,000 trajectories, and the results were the same, with the exception of d_Euclidean(decision-square, cheese) failing to be predictive in certain regressions in the second dataset. Not sure what’s up with that.
So overall I’m not worried about the signs of these variables.
We just posted Behavioral statistics for a maze-solving agent.
The three key variables are the Euclidean and path distances from decision square to cheese, and the Euclidean distance from cheese to top-right corner.
Dark blue is +1 correlation, dark red is −1:
Thanks for sharing that analysis, it is indeed reassuring!