then the irrelevant variables will be randomly assigned +1 or −1 weights and will on average cancel out, leaving the signal from the relevant variables, which do not cancel each other out.
This will seriously degrade the signal. Normally there are only a few key variables, so adding more random ones with similar weights will increase the number of spurious results, i.e. make the model worse.
Adding more irrelevant variables does change things quantitatively, lowering power through increased variance and so requiring more data, but I don’t see how this leads to any qualitative transition from working to not working that might explain why they work.
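A minimal simulation sketch of the disagreement above, under assumptions that are not in the thread: all predictors are independent standard normals, the outcome is the plain sum of 5 relevant predictors plus unit-variance noise, and each irrelevant predictor gets a random +1 or −1 weight in the score. The function `unit_weight_validity` and its parameters are hypothetical names chosen for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def unit_weight_validity(n_relevant, n_irrelevant, n_samples=5000, noise_sd=1.0, seed=0):
    """Correlation between a unit-weighted score and the outcome (hypothetical setup).

    Relevant variables enter both the outcome and the score with weight +1;
    irrelevant variables do not affect the outcome but enter the score with a
    randomly chosen +1 or -1 weight.
    """
    rng = random.Random(seed)
    weights = [1] * n_relevant + [rng.choice((-1, 1)) for _ in range(n_irrelevant)]
    scores, outcomes = [], []
    for _ in range(n_samples):
        x = [rng.gauss(0, 1) for _ in range(n_relevant + n_irrelevant)]
        outcomes.append(sum(x[:n_relevant]) + rng.gauss(0, noise_sd))
        scores.append(sum(w * xi for w, xi in zip(weights, x)))
    return pearson(scores, outcomes)

for k in (0, 5, 20, 80):
    print(k, round(unit_weight_validity(5, k), 3))
```

Under these assumptions the irrelevant terms contribute nothing to the covariance with the outcome (they do cancel in expectation), but each one adds variance to the score, so the expected correlation is 5/√(6(5+k)): about 0.91 with no irrelevant variables, 0.41 with 20 and 0.22 with 80. The degradation is smooth rather than a sudden switch from working to not working, and how serious it is depends on how many irrelevant variables get in.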
I don’t think this is true. All the useful weights are set to +1 or −1 by expert assessment, and the non-useful weights are just noise. Why would more data be required?
Yes, but again, where is the qualitative difference? In what sense does this explain the performance of improper linear models versus human experts? Why does the subtle difference between a model based on an ‘enriched’ set of variables and one based on a non-enriched-but-slightly-worse set ‘explain’ how they perform better than humans?
? I’m not sure what you’re asking for. The basic points are: a) experts are bad at integrating information, b) experts are good at selecting important variables of roughly equal importance, and c) these variables are often highly correlated.
a) explains why experts are bad (as in worse than proper linear models); b) and c) explain why improper linear models might perform not much worse than proper linear models (and hence better than experts).
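Points b) and c) can be illustrated with a small simulation sketch. Everything here is a hypothetical setup, not taken from the thread: four predictors with pairwise correlation 0.5, "true" weights of roughly equal size (`TRUE_W`), Gaussian noise, and an ordinary least squares fit on only 50 training cases, compared on a large test set against a unit-weight score (every predictor weighted +1, no fitting) and the true-weight score.

```python
import math
import random

TRUE_W = [1.0, 0.8, 1.2, 0.9]  # hypothetical weights of roughly equal importance
RHO = 0.5                       # hypothetical pairwise correlation between predictors
NOISE_SD = 2.0                  # hypothetical outcome noise

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, ys):
    """Least squares coefficients [intercept, b1, ..., bp] via the normal equations."""
    X = [[1.0] + row for row in rows]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(p)]
    return solve(xtx, xty)

def sample(n, rng):
    """Draw n cases of equicorrelated predictors (shared factor plus unique part)."""
    rows, ys = [], []
    for _ in range(n):
        common = rng.gauss(0, 1)
        x = [math.sqrt(RHO) * common + math.sqrt(1 - RHO) * rng.gauss(0, 1) for _ in TRUE_W]
        rows.append(x)
        ys.append(sum(w * xi for w, xi in zip(TRUE_W, x)) + rng.gauss(0, NOISE_SD))
    return rows, ys

def compare(n_train=50, n_test=20000, seed=1):
    rng = random.Random(seed)
    train_x, train_y = sample(n_train, rng)
    test_x, test_y = sample(n_test, rng)
    beta = ols(train_x, train_y)
    ols_pred = [beta[0] + sum(b * xi for b, xi in zip(beta[1:], x)) for x in test_x]
    unit_pred = [sum(x) for x in test_x]
    true_pred = [sum(w * xi for w, xi in zip(TRUE_W, x)) for x in test_x]
    return {
        "true weights": pearson(true_pred, test_y),
        "OLS (fitted)": pearson(ols_pred, test_y),
        "unit weights": pearson(unit_pred, test_y),
    }

for name, r in compare().items():
    print(f"{name}: {r:.3f}")
```

In this setup the population correlations can be worked out directly: the true-weight score correlates about 0.840 with the outcome and the unit-weight score about 0.838, because with correlated predictors of similar importance the exact weights barely matter. The fitted OLS model has to estimate its weights from 50 cases, so out of sample it typically lands near or somewhat below the unit-weight score. Changing `RHO`, the spread of `TRUE_W`, or `n_train` shows when that stops being true.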