I started off by writing distillations, as was recommended, and I wrote one on the first post of “Risks From Learned Optimization” by Evan Hubinger et al.
FWIW, I consider Risks From Learned Optimization to itself be a distillation—it distills a cluster of mental models which were already used by many people in alignment at the time. I also consider it one of the highest-value-add distillations in alignment to date, and a central example of what a really good distillation can achieve.
That’s fascinating, as someone new to the landscape. When I first read the sequence, it felt particularly dense, with every sentence carrying a lot of weight. Months later, it’s much easier on the eyes. Do you think it could use any further distillation to help get it out to a wider audience? (Also, out of curiosity, do you have any examples of other good existing distillations?)