Distillation vs. Instillation
My main point here is that distillation does two things: it transfers knowledge (from training data to a learned representation), and then it compresses that knowledge.[1] The compression arguably isn’t always particularly important; the transfer is the main element. If a team of forecasters learned a signal but did so in a very uncompressed way (say, by writing a bunch of books about that signal), and were still somewhat cost-effective, I think that would be fine.
On “Proliferation” vs. “Scaling”: I’d be curious whether there are better words out there. I definitely considered “scaling”, but it sounds less concrete and less specific. To “proliferate” means “to generate more of”, but to “scale” could mean “to make look bigger, even if nothing is really being done.”
My cynical guess is that “instillation/proliferation” won’t catch on because the words are too uncommon, and that “distillation” won’t catch on either because it feels like a stretch from the ML use case. I could use more feedback here.
[1] Interestingly, according to Naftali Tishby’s claims, there seem to be two distinct stages in deep learning that map to these two things.