Well just so you know, the point of the write-up is that iteration makes no sense. We are saying “hey suppose you have an automated ontology identifier with a safety guarantee and a generalization guarantee, then uh oh it looks like this really counter-intuitive iteration thing becomes possible”.
However, it’s not quite as simple as ruling out iteration by appealing to conservation of expected evidence, because it’s not clear exactly how much evidence is in the training data. Perhaps there is enough information in the training data to extrapolate all the way to C. In this case the iteration scheme would just be a series of computational steps that implement a single Bayes update. Yet for the reasons discussed under “implications” I don’t think this is reasonable.
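(To spell out the identity in question: conservation of expected evidence says the prior must equal the expectation of the posterior over the possible observations, P(H) = Σ_e P(E = e) · P(H | E = e), so no update scheme can be expected in advance to move beliefs in a particular direction.)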
Well just so you know, the point of the write-up is that iteration makes no sense.
True, not sure what I was thinking when I wrote the last sentence of my comment.
“hey suppose you have an automated ontology identifier with a safety guarantee and a generalization guarantee, then uh oh it looks like this really counter-intuitive iteration thing becomes possible”
For an automated ontology identifier whose safety guarantee is only probabilistic (say, 99.9% certainty), I don’t share your intuition that iteration could work significantly better than just making predictions from the original training set. Iteration simply doesn’t seem promising to me, but maybe I’m overlooking something.
If your intuition that iteration might work doesn’t come from the sense that the newly predicted training examples are basically certain (as I described in the top-level comment of that thread), then where does it come from? (I do still think that you are probably confused for the reason I described, but maybe I’m wrong and there is another reason.)
Perhaps there is enough information in the training data to extrapolate all the way to C. In this case the iteration scheme would just be a series of computational steps that implement a single Bayes update.
Actually, in the case where the training data includes enough information to extrapolate all the way to C (which I think is rare for most applications), it does seem plausible to me that the iteration approach finds the perfect decision boundary. But in that case it also seems plausible to me that a normal classifier, which only extrapolates from the training set, finds the perfect boundary as well.
I don’t see a reason why a normal classifier should perform much worse than an optimal Bayes update from the training set. Do you think it does, and if so, why? (If we don’t think it performs much worse than optimal, then it follows quite trivially that the iteration approach cannot be much better, since it cannot do better than the optimal Bayes error either.)
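To make the comparison concrete, here is a minimal sketch of what I have in mind by “iteration” versus a normal classifier: a self-training loop that pseudo-labels its own high-confidence predictions and folds them back into the training set, next to a classifier fit once on the original labels. The toy data, the logistic-regression model, and the 0.999 confidence threshold are illustrative assumptions, not anything from the write-up; on a problem this simple, where the training data already pins down the boundary, both approaches recover essentially the same boundary.

```python
# Minimal sketch (illustrative assumptions only): compare a classifier fit once
# on the original labels against a self-training loop that repeatedly folds its
# own high-confidence predictions back into the training set.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy 1-D problem: the true boundary is at x = 0. Labelled data only covers a
# narrow region; unlabelled and test data extend further out, so the classifier
# has to extrapolate.
X_train = rng.uniform(-1, 1, size=(200, 1))
y_train = (X_train[:, 0] > 0).astype(int)
X_unlabelled = rng.uniform(-5, 5, size=(2000, 1))
X_test = rng.uniform(-5, 5, size=(2000, 1))
y_test = (X_test[:, 0] > 0).astype(int)

# "Normal classifier": fit once on the original training set.
baseline = LogisticRegression().fit(X_train, y_train)
baseline_acc = (baseline.predict(X_test) == y_test).mean()

# "Iteration": repeatedly pseudo-label the most confident unlabelled points
# and add them to the training set before refitting.
X_cur, y_cur = X_train.copy(), y_train.copy()
pool = X_unlabelled.copy()
model = baseline
for _ in range(10):
    model = LogisticRegression().fit(X_cur, y_cur)
    if len(pool) == 0:
        break
    proba = model.predict_proba(pool)
    confident = proba.max(axis=1) > 0.999  # e.g. a 99.9%-certainty threshold
    if not confident.any():
        break
    X_cur = np.vstack([X_cur, pool[confident]])
    y_cur = np.concatenate([y_cur, proba[confident].argmax(axis=1)])
    pool = pool[~confident]

iterated_acc = (model.predict(X_test) == y_test).mean()
print(f"trained once: {baseline_acc:.3f}  iterated: {iterated_acc:.3f}")
```

In this kind of setup the iterated model’s pseudo-labels carry no information beyond what the original training set already implied, so the two accuracies come out essentially the same, which is what I would expect in general unless the new examples add genuinely new evidence.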