The fixed point problem is worse than you think. Take the Hungarian astrology example, with an initial easy set that has both a length limitation (e.g. < 100k characters) and a simplicity limitation.
Now I propose a very simple improvement scheme: if the article ends in a whitespace character, then try to classify the shortened article with the last character removed.
This gives you an infinite sequence of better and better decision boundaries (each time, a few new cases are solved: the ones of length 100k + $N$ that end in at least $N$ whitespace characters and are in the easy set once the whitespace has been stripped). This nicely converges to the classifier that trims all trailing whitespace and then asks the initial classifier.
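To make the scheme concrete, here is a minimal Python sketch; the names (`base_classifier`, `improve`, `limit_classifier`) and the stub decision rule are mine, purely for illustration:

```python
from typing import Callable, Optional

MAX_LEN = 100_000  # length limitation on the initial easy set

def base_classifier(article: str) -> Optional[bool]:
    """Stand-in for the initial easy-set classifier: defined only on short,
    simple articles; returns None (undefined) everywhere else."""
    if len(article) >= MAX_LEN:
        return None
    # ... a simplicity check and the real decision would go here ...
    return "astrology" in article.lower()

def improve(clf: Callable[[str], Optional[bool]]) -> Callable[[str], Optional[bool]]:
    """One step of the proposed scheme: if the article is still undecided and
    ends in a whitespace character, retry with the last character removed."""
    def improved(article: str) -> Optional[bool]:
        verdict = clf(article)
        if verdict is None and article and article[-1].isspace():
            return clf(article[:-1])
        return verdict
    return improved

# Iterating the scheme decides more and more cases: articles just over the
# length limit whose trailing whitespace, once stripped, puts them back in
# the easy set.
clf_n = base_classifier
for _ in range(5):
    clf_n = improve(clf_n)

# The limit of the sequence: trim all trailing whitespace, then ask the
# initial classifier. No finite iterate equals it, and it can itself still
# be improved further (leading whitespace, other padding, ...), so the limit
# need not be a fixed point of the improvement oracle.
def limit_classifier(article: str) -> Optional[bool]:
    return base_classifier(article.rstrip())
```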
What I’m trying to say here is: The space of cases to consider can be large in many dimensions. The countable limit of a sequence of extensions need not be a fixed point of the magical improvement oracle.
Generally, I’d go in a different direction: Instead of arguing about iterated improvement, argue that of course you cannot correctly extrapolate all decision problems from a limited number of labeled easy cases and limited context. The style of counter-example is to construct two settings (“models” in the lingo of logic) A and B with the same labeled easy set (and the same context made available to the classifier), where the correct answer for some datapoint x differs between the two settings. Hence, safe extrapolation must always conservatively answer NO to x, and cannot be expected to answer all queries correctly from limited training data (the typical YES / NO / MAYBE split).
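For illustration, here is a toy instance of that construction in Python; the concrete predicates `truth_in_A` / `truth_in_B` and the tiny easy set are invented for the example and carry no meaning beyond it:

```python
from enum import Enum

class Verdict(Enum):
    YES = "yes"
    NO = "no"
    MAYBE = "maybe"  # the conservative answer when the easy set underdetermines the case

# Two settings ("models") that agree on every labeled easy case ...
def truth_in_A(x: str) -> bool:
    return x.startswith("astrology")

def truth_in_B(x: str) -> bool:
    return x.startswith("astrology") and len(x) < 100_000

easy_set = {"astrology, briefly": True, "weather report": False}
assert all(truth_in_A(case) == label == truth_in_B(case) for case, label in easy_set.items())

# ... but disagree on a datapoint x outside the easy set:
x = "astrology" + " " * 100_000
assert truth_in_A(x) != truth_in_B(x)

# So an extrapolator that only ever saw the easy set cannot tell A from B,
# and the safe answer on x is MAYBE (or a conservative NO), not YES:
def safe_extrapolate(datapoint: str) -> Verdict:
    if datapoint in easy_set:
        return Verdict.YES if easy_set[datapoint] else Verdict.NO
    return Verdict.MAYBE
```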
I think the discussion about the fixed point, or the limit of iterative improvement, does not lead to the actually relevant argument, namely that extrapolation cannot conjure information out of nowhere.
You could cut it out completely without weakening the argument that certain types of automated ontology identification are impossible.
The space of cases to consider can be large in many dimensions. The countable limit of a sequence of extensions need not be a fixed point of the magical improvement oracle.
Indeed. We may need to put a measure on the set of cases and make a generalization guarantee that refers to solving X% of the remaining cases. That would be a much stronger guarantee.
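As a rough sketch of what such a guarantee could refer to (the case distribution `sample_case` is a hypothetical ingredient, not something specified here): fix a measure over the remaining cases and estimate the fraction of them the classifier actually resolves.

```python
from typing import Callable, Optional

def estimated_coverage(classifier: Callable[[str], Optional[bool]],
                       sample_case: Callable[[], str],
                       n_samples: int = 10_000) -> float:
    """Monte Carlo estimate, under the measure induced by sample_case, of the
    fraction of cases the classifier resolves (i.e. does not leave undecided)."""
    resolved = sum(classifier(sample_case()) is not None for _ in range(n_samples))
    return resolved / n_samples

# The guarantee would then be a statement of the form
#   estimated_coverage(improved_classifier, sample_case) >= X / 100,
# i.e. the improved classifier solves at least X% of the remaining cases,
# weighted by that measure.
```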
The style of counter-example is to construct two settings (“models” in the lingo of logic) A and B with the same labeled easy set (and the same context made available to the classifier), where the correct answer for some datapoint x differs between the two settings.
I appreciate the suggestion but I think that line of argument would also conclude that statistical learning is impossible, no? When I give a classifier a set of labelled cat and dog images and ask it to classify which are cats and which are dogs, it’s always possible that I was really asking some question that was not exactly about cats versus dogs, but in practice it’s not like that.
Also, humans do communicate about concepts with one another, and they eventually “get it” with respect to each other’s concept boundaries, and it’s possible to see that someone “got it” and trust that they now have the same concept that I do. So it seems possible to learn concepts in a trustworthy way from very small datasets, though it’s not a very “black box” kind of phenomenon.