It sounds like you want some formalisation of theory/model selection in science, i.e., a formalisation of epistemology.
“It’s very unclear what it would look like to be able to prove that you understand your model”: the word “prove” here raises a flag for me, because since Popper we have known that science offers only falsifications, never proofs.
A formalisation of science cannot escape also “solving” cognitive science, because science is something scientists do. And here, I think, there is no way to escape model subjectivity: whatever framework you apply to understand the AI model, you should also apply, reflectively, to yourself. This is the “meta-theoretical move”. The (inverse) RL framework makes it, and so does the Free Energy Principle (see “On the Map-Territory Fallacy Fallacy”). There may be formal ways to map between RL and FEP using category theory, but such a mapping isn’t guaranteed a priori for an arbitrary theory of cognition.
The FEP literature asserts that FEP is a “canonical” approach to modelling intelligence because it recovers the maximum-entropy posterior expectation of behaviour. By construction, however, this is a story of inductivism and instrumentalism, so it doesn’t deal with out-of-distribution generalisation (or, one could say, it deals with OoD generalisation in an average-case rather than a worst-case way).
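To unpack the “maximum-entropy posterior expectation” claim, here is a minimal sketch using the standard active inference notation (where $G(\pi)$ is the expected free energy of policy $\pi$ and $\gamma$ is a precision parameter; these symbols are mine, not from the text I’m replying to). The maximum-entropy distribution over policies subject to a constraint on expected free energy is the usual Gibbs/softmax form:

$$q(\pi) \;=\; \arg\max_{q}\, H[q] \quad \text{s.t.} \quad \mathbb{E}_{q}[G(\pi)] = \text{const} \;\;\Longrightarrow\;\; q(\pi) \;\propto\; e^{-\gamma\, G(\pi)}.$$

Since $G(\pi)$ is itself an expectation under the agent’s own generative model, behaviour is scored by in-distribution averages; nothing in this construction bounds performance on inputs the model assigns negligible mass to, which is why I characterise it as average-case rather than worst-case.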