Well, there’s Goldberg, Lewis R. “Five models of clinical judgment: An empirical comparison between linear and nonlinear representations of the human inference process.” Organizational Behavior and Human Performance 6.4 (1971): 458-479.
The main thing is that these old papers seem to still be considered valid; see, e.g., Shanteau, James. “How much information does an expert use? Is it relevant?” Acta Psychologica 81.1 (1992): 75-86.
(It would be nice if you would link the fulltext instead of just providing citations; if you don’t have access to the fulltext, it’s a bad idea to cite it, and if you do, you should provide it for other people who are trying to evaluate your claims and judge whether the paper is relevant or wrong.)
I’ve put up the first paper at https://dl.dropboxusercontent.com/u/85192141/1971-goldberg.pdf / https://pdf.yt/d/Ux7RZXbo0n374dUU. I don’t think it’s particularly relevant: it only shows that two very specific equations (pg. 4, #3 & #4) did not outperform the linear model on a particular dataset. Too bad for Einhorn 1971.
Your second paper doesn’t support the claims:
A third possibility is that incorrect methods were used to measure the amount of information in experts’ judgments; use of the “correct” measurement method might support the Information-Use Hypothesis. In the studies reported here, four techniques were used to measure information use: protocol analysis, multiple regression analysis, analysis of variance, and self-ratings by judges. Despite differences in measurement methods, comparable results were reported. Other methodological issues might be raised, but the studies seem varied enough to rule out any artifactual explanation.
These aren’t very good methods for extracting the full measure of information.
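To make that concern concrete, here is a minimal simulated illustration (mine, not from either paper, with made-up cues and numbers): a judge who uses one cue only through an interaction gets a near-zero weight on it from multiple regression, even though a nonlinear model loses a large chunk of accuracy when that cue is withheld.

```python
# Illustrative simulation only: a "judge" who uses cue 3 purely through an
# interaction with cue 2, so regression weights understate how much of the
# judge's information actually comes from cue 3.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 5000
cues = rng.normal(size=(n, 3))  # cue 1, cue 2, cue 3
judgments = cues[:, 0] + cues[:, 1] * cues[:, 2] + rng.normal(scale=0.1, size=n)

# Regression view: cues 2 and 3 get ~zero weights even though their interaction
# carries about half the variance of the judgments.
print("regression weights:", LinearRegression().fit(cues, judgments).coef_.round(2))

# Nonlinear view: cross-validated R^2 of a random forest with and without cue 3.
rf = RandomForestRegressor(n_estimators=200, random_state=0)
with_cue3 = cross_val_score(rf, cues, judgments, cv=5, scoring="r2").mean()
without_cue3 = cross_val_score(rf, cues[:, :2], judgments, cv=5, scoring="r2").mean()
print(f"forest R^2 with cue 3: {with_cue3:.2f}; without cue 3: {without_cue3:.2f}")
```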
So to summarize: reality isn’t entirely linear, so nonlinear methods frequently excel now that modern developments in regularization keep them from overfitting (we can see this in the low prevalence of linear methods in demanding AI tasks like image recognition, or more generally in competitions like Kaggle across all sorts of domains); to the extent that humans are good predictors and classifiers too of reality, their predictions/classifications will be better mimicked by nonlinear methods; research showing the contrary typically does not compare against very good methods, and much more recent methods may do much better (for example, parole/recidivism predictions by parole boards may be bad and easily improved on by linear models, but does that mean more powerful algorithms can’t do even better?); and to the extent linear methods succeed, it may reflect a lack of relevant data or the inherent randomness of results for a particular cherrypicked task.
To show your original claim (“in many fields, linear models (even poor ones) are the best we’re going to get, with more complex models losing to overfitting”), I would want to see linear models steadily beat all comers, from random forests to deep neural networks to ensembles of all of the above, on a wide variety of large datasets. I don’t think you can show that.
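For concreteness, the comparison I have in mind would look roughly like the sketch below (a template only: the single convenient scikit-learn dataset, the particular models, and the default settings are placeholders, not a serious benchmark), repeated across many large datasets.

```python
# Template for the kind of head-to-head the claim requires: a regularized linear
# model versus standard nonlinear learners and a simple averaging ensemble,
# scored by cross-validation. One toy dataset is used only to make it runnable.
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, VotingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = fetch_california_housing(return_X_y=True)

models = {
    "ridge": make_pipeline(StandardScaler(), RidgeCV()),
    "random_forest": RandomForestRegressor(n_estimators=200, n_jobs=-1, random_state=0),
    "gbrt": GradientBoostingRegressor(random_state=0),
}
models["ensemble"] = VotingRegressor(list(models.items()))  # average of the three above

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name:>13}: R^2 = {scores.mean():.3f} (+/- {scores.std():.3f})")
```

On any single dataset the ranking could go either way; the claim is about what happens when this comparison is repeated over many domains.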
I tend to agree with you about models, once overfitting is sorted.
to the extent that humans are good predictors and classifiers too of reality, their predictions/classifications will be better mimicked by nonlinear methods
This I’ve still seen no evidence for.
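For what it’s worth, here is a sketch (an editorial illustration, not from either commenter) of how such evidence could be gathered: fit a linear and a nonlinear model directly to a judge’s ratings of many cue profiles and compare how well each reproduces the judge on held-out profiles. The placeholder data below simulates a purely linear judge and would have to be replaced by real judgment data.

```python
# Sketch: compare how well linear vs nonlinear models reproduce a judge's ratings
# on held-out cue profiles. X is a (cases x cues) matrix, y the judge's ratings.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

def compare_models_of_judge(X, y, cv=10):
    """Cross-validated R^2 of a linear and a nonlinear model of the judge."""
    linear = RidgeCV()
    nonlinear = RandomForestRegressor(n_estimators=300, random_state=0)
    return {
        "linear": cross_val_score(linear, X, y, cv=cv, scoring="r2").mean(),
        "nonlinear": cross_val_score(nonlinear, X, y, cv=cv, scoring="r2").mean(),
    }

# Placeholder: a simulated judge who really is linear; on data like this the
# nonlinear model should show no advantage, the Goldberg-style null result.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=500)
print(compare_models_of_judge(X, y))
```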