Thanks for your reply!
You are right, of course: the argument is more about building up evidence.
But after thinking about it some more, I now see that the evidence gathered from any single known feature of priorless models (like the one I mentioned) would be so minuscule (approaching 0[1]) that you would need to combine an implausible number of features[2]. You would end up in (a worse version of) an arms race akin to the 'science it' example/counterexample scenario mentioned in the report, and so it is a dead end. By extension, all priorless models are out, with or without a good training regime[3], a good loss function[4], or some form of regularization[4].
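To make the "implausible number of features" point concrete, here is a toy Bayesian back-of-the-envelope sketch. All numbers are illustrative assumptions (the prior odds, per-feature likelihood ratio, and target odds are mine, not from the report), and footnote [4] suggests the features would not even be independent, so this count is if anything optimistic:

```python
import math

# Toy calculation: how many independent features would it take to push the
# posterior odds toward the desired reporter (e.g. the direct translator)?
# All numbers below are illustrative assumptions.

prior_odds = 1e-9        # assumed prior odds of having sampled the desired reporter
likelihood_ratio = 1.01  # assumed evidence per feature: each feature is only
                         # 1% more likely under the desired hypothesis
target_odds = 100.0      # odds at which we'd call the evidence conclusive

# Independent features multiply the odds, so we need the smallest n with:
#   prior_odds * likelihood_ratio ** n >= target_odds
n = math.log(target_odds / prior_odds) / math.log(likelihood_ratio)
print(f"independent features needed: {n:.0f}")  # ~2,500 under these assumptions
```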
So, I think it's an interesting dead end. It narrows the space of possible ELK solutions to those that shrink the ridge of good solutions via a prior imposed by (or on) the architecture or the statistical features of the model. In other words, it requires either non-overparameterized models or a way to shrink the ridge of good solutions in overparameterized models. For the latter I have only seen good descriptions[5] but no solutions[6] (but do let me know if I missed something).
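To illustrate what I mean by the "ridge of good solutions", here is a minimal sketch, assuming a linear model stands in for the overparameterized network and an L2 penalty stands in for an imposed prior (both are my choices for illustration):

```python
import numpy as np

# Overparameterized linear regression: 2 data points, 5 parameters.
# Every w with X @ w == y sits on a "ridge" of zero-training-loss solutions.
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 5))  # more parameters than data points
y = rng.normal(size=2)

# Two distinct interpolating solutions: the minimum-norm one, and the
# minimum-norm one shifted by any vector from the null space of X.
w_min_norm = np.linalg.pinv(X) @ y
null_basis = np.linalg.svd(X)[2][2:]  # right singular vectors spanning X's null space
w_other = w_min_norm + 10.0 * null_basis[0]

print(np.allclose(X @ w_min_norm, y))  # True: zero training loss
print(np.allclose(X @ w_other, y))     # True: also zero training loss

# An explicit prior (here an assumed L2 penalty, i.e. ridge regression)
# collapses the ridge to a single preferred solution.
lam = 1e-3
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)
```

The training loss alone cannot distinguish `w_min_norm` from `w_other`; only the added penalty picks one point on the ridge, which is exactly the role a prior would have to play.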
Do you agree?
[1] Assuming a black-box, overparameterized model.
[2] Like finding a single needle in an infinite haystack: even if each piece of evidence halves the haystack, you'll be hacking away for a long time.
[3] In the case of the ELK contest, due to not being able to sample outside the human-understandable regime.
[4] Because they would depend on the same (assumption of) evidence/feature.
[5] Like this one.
[6] I know you can see the initialization of parameters as a prior, but I haven't seen a meaningful prior.
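On footnote [6], a small sketch of what I mean: random initialization does induce a prior over functions, but only a crude one. The two-layer tanh net below is a hypothetical example of mine; changing the initialization scale changes the induced distribution over outputs, yet says nothing about which of the zero-loss solutions (e.g. direct translator vs. human simulator) training will reach:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_net_output(x, width=256, init_scale=1.0):
    """Output of a freshly initialized two-layer tanh net at input x."""
    w1 = rng.normal(scale=init_scale, size=(width, x.shape[0]))
    w2 = rng.normal(scale=init_scale / np.sqrt(width), size=width)
    return w2 @ np.tanh(w1 @ x)

# The init scale shapes the prior over outputs, but not in a way that
# meaningfully favors one kind of reporter over another.
x = np.ones(4)
for scale in (0.1, 1.0, 3.0):
    outs = [sample_net_output(x, init_scale=scale) for _ in range(1000)]
    print(f"init scale {scale}: output std ≈ {np.std(outs):.3f}")
```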