Instillation, Proliferation, Amplification
Paul Christiano and Ought use the terminology of Distillation and Amplification to describe a high-level algorithm of one type of AI reasoning.
I’ve wanted to come up with an analogous framing for forecasting systems. I previously named a related concept Prediction-Augmented Evaluation Systems, which Jacobjacob somewhat renamed to “Amplification” in this post.
I think one thing that’s going on is that “distillation” doesn’t have an exact equivalent in forecasting setups. The term “distillation” carries two assumptions:
1. The “distilled” information is compressed.
2. Once something is distilled, it’s trivial to execute.
I believe that (1) isn’t really necessary, and (2) doesn’t apply in other contexts.
A different proposal: Instillation, Proliferation, Amplification
In this proposal, we split the “distillation” step into “instillation” and “proliferation”. Instillation refers to system B learning what system A knows. Proliferation refers to system B applying this learning to many things in a straightforward manner. Amplification refers to the ability of either system A or system B to spend marginal resources to marginally improve a specific estimate or body of knowledge.
For instance, in a Prediction-Augmented Evaluation System, imagine that “Evaluation Procedure A” rates movies on a 1-10 scale.
Instillation
Some acquisition process helps “Forecasting Team B” learn how “Evaluation Procedure A” does its evaluations.
Proliferation
“Forecasting Team B” now applies their understanding of the evaluations of “Evaluation Procedure A” to evaluate 10,000 movies.
Amplification
If there are movies that are particularly important to evaluate well, then there are specific methods available to do so.
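The three phases above can be sketched in code. This is only a toy illustration under stated assumptions: the function and team names are hypothetical, “Evaluation Procedure A” is a made-up numeric rating function of two invented movie features, and Team B’s “learning” is stand-in nearest-neighbor memorization rather than any real forecasting process.

```python
import random

random.seed(0)

# "Evaluation Procedure A" (hypothetical): an expensive ground-truth rating
# on a 1-10 scale, here a toy function of two made-up movie features.
def procedure_a(features):
    quality, popularity = features
    return max(1, min(10, round(5 + 3 * quality + 2 * popularity)))

# Instillation: "Forecasting Team B" learns Procedure A from a sample of
# already-evaluated movies. Here "learning" is just memorizing examples.
def instill(examples):
    return list(examples)

# Team B's cheap estimate: the rating of the nearest memorized example.
def predict(model, features):
    nearest = min(model, key=lambda ex: (ex[0][0] - features[0]) ** 2
                                      + (ex[0][1] - features[1]) ** 2)
    return nearest[1]

# Proliferation: apply the learned procedure cheaply across a large catalog.
def proliferate(model, catalog):
    return {name: predict(model, feats) for name, feats in catalog.items()}

# Amplification: for particularly important movies, spend marginal resources
# (here, falling back to the expensive Procedure A itself).
def amplify(model, features, important):
    return procedure_a(features) if important else predict(model, features)

sample = [(random.random(), random.random()) for _ in range(50)]
model = instill([(f, procedure_a(f)) for f in sample])
catalog = {f"movie_{i}": (random.random(), random.random()) for i in range(10000)}
ratings = proliferate(model, catalog)
```

The point of the sketch is the cost structure: instillation happens once, proliferation is cheap per item, and amplification spends extra resources only where the stakes justify it.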
I think this is a more complex but more generic pattern. Instillation seems strictly more generic than distillation, and proliferation seems like an important aspect that will sometimes be quite expensive.
Back to forecasting: instillation and proliferation are two different things and perhaps should eventually be studied separately. Instillation asks “can a group of forecasters learn & replicate an evaluation procedure?”, and proliferation asks “can this group do that cost-effectively?”
Is there not a distillation phase in forecasting? One model of the forecasting process is that person A builds up their model and distills a complicated question into a high-information, highly compressed datum, which can then be used by others. In my mind it’s:
Model → Distill → “Amplify” (not sure if that’s actually the right word)
I prefer the term “scalable” to “proliferation” for “can this group do it cost-effectively”, as it’s similar to the concept in CS.
Distillation vs. Instillation
My main point here is that distillation is doing two things: transferring knowledge (from training data to a learned representation), and then compressing that knowledge.[1] The fact that it’s compressed arguably isn’t always particularly important; the fact that it’s transferred is the main element. If a team of forecasters basically learned a signal, but did so in a very uncompressed way (say, they wrote a bunch of books about said signal), yet were still somewhat cost-effective, I think that would be fine.
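A toy way to see the transfer/compression distinction (all names here are made up): the same signal can be transferred verbatim and uncompressed, like the “books” above, or in a compact distilled form, and both transfers carry the knowledge equally well.

```python
# A hypothetical "teacher" whose knowledge is a large table of ratings.
teacher = {x: (x % 7) + 1 for x in range(1000)}

# Uncompressed transfer ("writing a bunch of books"): copy every entry.
books = dict(teacher)

# Compressed transfer (distillation proper): keep only the generating rule.
def student(x):
    return (x % 7) + 1

# Both forms carry the same signal; only their sizes differ.
assert all(books[x] == teacher[x] == student(x) for x in teacher)
```

If cost-effectiveness is the real constraint, the uncompressed copy can be perfectly acceptable; compression is an optimization, not the essence of the transfer.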
On “Proliferation” vs. “Scaling”: I’d be curious if there are better words out there. I definitely considered “scaling,” but it sounds less concrete and less specific. To “proliferate” means “to generate more of,” while to “scale” could mean “to make look bigger, even if nothing is really being done.”
My cynical guess is that “instillation/proliferation” won’t catch on because the terms are too uncommon, but also that “distillation” won’t catch on because it feels like a stretch from the ML use case. I could use more feedback here.
[1] Interestingly, there seem to be two distinct stages in Deep Learning that map to these two different things, according to Naftali Tishby’s claims.