Suppose I have some active learning setup, where I decide which new points to investigate based on expected uncertainty reduction, or expected update to the model weights, or something like that. Then the internals of the model would be an example of these diagnostic prediction logs being relevant without humans having to look at them. And then it seems like there might be competition among subnetworks to have the new training examples be places where they'll do particularly well, or to somehow avoid areas where they'll do poorly.
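(To make the worry concrete, here's a minimal sketch of the kind of loop I have in mind. Everything here is a toy assumption of mine: a small ensemble of linear predictors stands in for "competing subnetworks", and per-point prediction variance stands in for "expected uncertainty reduction".)

```python
import numpy as np

rng = np.random.default_rng(0)

def ensemble_predict(weights, X):
    """Predictions of each ensemble member; shape (n_members, n_points)."""
    return weights @ X.T

def acquire(weights, pool):
    """Pick the pool point where the members disagree most."""
    preds = ensemble_predict(weights, pool)   # (n_members, n_points)
    disagreement = preds.var(axis=0)          # per-point predictive variance
    return int(disagreement.argmax())

# Toy unlabeled pool: 1-D inputs plus a bias feature.
pool = np.column_stack([np.linspace(-1, 1, 50), np.ones(50)])
weights = rng.normal(size=(5, 2))             # 5 "subnetworks", 2 features

idx = acquire(weights, pool)
print("next point to label:", pool[idx, 0])

# The thing to notice: each member's predictions enter `disagreement`
# directly, so a member that skews its outputs over some region can pull
# the next labeled example toward (or away from) that region -- the
# "competition among subnetworks" worry, in miniature.
```
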
I have a hard time making this story one where this is a bug instead of a feature, though; in order for a subnetwork to do particularly well on the new examples, it has to know something about the real data-generating distribution that the rest of the model doesn't. This only looks pathological if the thing it knows is somehow manufactured by the model itself. (Like, if I can write a fictional story and then win trivia contests about my fictional story, I can hack points.)