I don’t want to get into the whole CW thing around this topic, *but*:
1. Since you so offhandedly decided not to use p-values, why do you:
a) Use linear models for the analysis given such low R² scores?
b) Use R² at all? Does it seem meaningful for this case? If your whole shtick is being intuitive, why not use MAE or even some percentage-based error?
c) Are you overfitting those regression models instead of doing cross-validation?
d) If the answer to c is no, then: provide the number of folds and the variation of the coefficients across the folds. This is a great measure for judging how confident to be that a given coefficient isn't spurious (e.g. if a coefficient swings between 0.001 and 0.1 across folds, it's probably just fitting noise).
e) If the answer to c is yes, why? Cross-validation is basically required for this kind of analysis; if you're just overfitting your whole dataset, that basically invalidates the rest of your analysis, since you're just finding noise that can be approximated by a sum of linear terms.
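To make point d concrete, here's a rough sketch of the fold-stability check I mean, on synthetic data with plain numpy least squares (everything here is illustrative, not your actual analysis): fit the same linear model on each training split and look at how much each coefficient moves across folds.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k_folds = 200, 5
x = rng.normal(size=(n, 2))
# y depends strongly on x[:, 0]; x[:, 1] is pure noise
y = 3.0 * x[:, 0] + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x])  # add an intercept column
idx = rng.permutation(n)
folds = np.array_split(idx, k_folds)

coefs = []
for fold in folds:
    train = np.setdiff1d(idx, fold)  # train on everything outside this fold
    beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    coefs.append(beta)
coefs = np.array(coefs)

# Spread of each coefficient across folds: one whose across-fold std is
# comparable to its mean magnitude is probably just fitting noise.
mean, std = coefs.mean(axis=0), coefs.std(axis=0)
for name, m, s in zip(["intercept", "x0", "x1"], mean, std):
    print(f"{name}: mean={m:+.3f}, std across folds={s:.3f}")
```

The real coefficient (x0) should come out stable near its true value, while the noise column's coefficient hovers near zero with a std of the same order as its mean, which is the "spurious" signature I'm asking about.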
Also, given the small effect sizes you found, why consider the data relevant at all?
If anything, this analysis shows that all the metrics you care about depend mostly on some hidden variable that neither you nor the pseudoscientists you're responding to have found.
I may be missing something here though; it's 3:30am where I am, so do let me know if I'm being uncharitable or underspecifying some of my questions/contention points.