This reminds me a little bit of the posts on anti-memes. There’s a way in which people are constantly updating their worldviews based on personal experience that
is useless in discussion because people tend not to update on other people’s personal experience over their own
is personally risky in adversarial contexts because personal information facilitates manipulation
is socially costly because the personal experience that people tend to update on is usually the kind of emotionally intense stuff that is viewed as inappropriate in ordinary conversation
And this means that there are a lot of ideas and worldviews produced by The Statistics which are never discussed or directly addressed in polite society. Instead, these emerge indirectly through particular beliefs which rely on arguments that obfuscate the reality.
Not only is this hard to avoid on a civilizational level; it’s hard to avoid on a personal level: rational agents will reach inaccurate conclusions in adversarial (ie unlucky) environments.
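To make that last point concrete, here’s a toy sketch (the coin-flip setup and the specific “unlucky” sample are mine, purely for illustration): a textbook Bayesian updater, shown an unrepresentative run of evidence, ends up confidently wrong about a fair coin.

```python
import numpy as np

# A fair coin (true P(heads) = 0.5), but the agent happens to observe an
# "unlucky" sample in which heads comes up far more often than tails.
unlucky_flips = np.array([1] * 16 + [0] * 4)  # 16 heads, 4 tails
n_heads = unlucky_flips.sum()
n_tails = len(unlucky_flips) - n_heads

# Textbook Bayesian update over a grid of candidate biases, uniform prior.
biases = np.linspace(0.01, 0.99, 99)
prior = np.ones_like(biases) / len(biases)
posterior = prior * biases**n_heads * (1 - biases)**n_tails
posterior /= posterior.sum()

# The agent did everything right and still lands far from the truth,
# with most of its probability mass on "this coin is biased toward heads".
print("posterior mean of P(heads):", round(float((biases * posterior).sum()), 3))
print("posterior P(bias > 0.6):", round(float(posterior[biases > 0.6].sum()), 3))
```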
When I first looked at these plots, I thought “ahhh, the top one has two valleys and the bottom one has two peaks. So, once you account for one reflecting error and the other reflecting accuracy, they capture the same behavior.” But this isn’t really what’s happening.
Comparing these plots is a little tricky. For instance, the double-descent graph shows two curves: “train error” (which can be interpreted as lack of confidence in model performance) and “test error” (which can be interpreted as lack of actual performance, ie lack of wisdom). Analogizing the double-descent curve to Dunning-Kruger might be easier if one just plots “test error” on the y-axis and “train error” on the x-axis. Or better yet, 1 - error on both axes.
But actually trying to dig into the plots in this way is confusing. In the underfitted regime, there’s a pretty high level of knowledge (ie test error near the minimum value) with pretty low confidence (ie train error far from zero). In the overfitted regime, we then get the second descent into a higher level of knowledge (ie test error at the minimum) but now with extremely high confidence. Maybe we can tentatively interpret these minima as the “valley of despair” and “slope of enlightenment” but
In both cases, our train error is lower than our test error—implying a disproportionate amount of confidence all the time. This is not consistent with the Dunning-Kruger effect
The “slope of enlightenment” especially has way more unjustified confidence (ie train error near zero) despite still having some objectively pretty high test error (around 0.3). This is not consistent with the Dunning-Kruger effect
We see the same test error associated with both a high train error (in the underfit regime) and with a low train error (in the overfit regime). The Dunning-Kruger effect doesn’t capture the potential for different levels of confidence at the same level of wisdom
To me, the above deviations from Dunning-Kruger make sense. My mechanistic understanding of the effect is that it appears in fields of knowledge that are vast, but whose vastness can only be explored by those with enough introductory knowledge. So what happens is
You start out learning something new and you’re not confident
You master the introductory material and feel confident that you get things
You now realize that your introductory understanding gives you a glimpse into the vast frontier of the subject
Exposure to this vast frontier reduces your confidence
But as you explore it, both your understanding and confidence rise again
And this process can’t really be captured in a set-up with a fixed train and test set. Maybe it could show up in reinforcement learning, though, since exploration is possible there.