Some notes on the loss function in unsupervised learning:
> Since an unsupervised learner is generally just optimized for predictive power
I think it’s worthwhile to distinguish the loss function that’s being optimized during unsupervised learning, vs what the practitioner is optimizing for. Yes, the loss function being optimized in an unsupervised learning system is frequently minimization of reconstruction error or similar. But when I search for “unsupervised learning review” on Google Scholar, I find this highly cited paper by Bengio et al. The abstract talks a lot about learning useful representations and says nothing about predictive power. In other words, learning “natural abstractions” appears to be pretty much the entire game from a practitioner perspective.
And just as supervised learning has dials, such as regularization, that let us control the complexity of our model, unsupervised learning has dials of its own.
For clustering, we could achieve 0 reconstruction error (or equivalently, explain all the variation in the data) by putting every data point in its own cluster, but that would completely defeat the point. The elbow method is a well-known heuristic for figuring out what the “right” number of clusters in a dataset is.
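To make the elbow heuristic concrete, here’s a minimal sketch using scikit-learn’s KMeans (the synthetic blobs are just a placeholder for whatever dataset we’re actually clustering):

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: three well-separated blobs standing in for a real dataset.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2)) for c in (0, 5, 10)])

# Within-cluster sum of squares (inertia) for a range of candidate k.
inertias = {}
for k in range(1, 11):
    inertias[k] = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_

# The "elbow" is the k where inertia stops dropping sharply; here we just
# print the curve and eyeball it (k=3 for this toy data).
for k, w in inertias.items():
    print(f"k={k:2d}  inertia={w:.1f}")
```

Inertia always decreases as k grows, which is exactly why we need a heuristic like the elbow rather than simply minimizing it.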
Similarly, we could achieve 0 reconstruction error with an autoencoder by making the number of dimensions in the bottleneck equal to the number of dimensions in the original input, but again, that would completely defeat the point. Someone on Stats Stack Exchange says that there is no standard way to select the number of dimensions for an autoencoder. (For reference, the standard way to select the regularization parameter which controls complexity in supervised learning would obviously be through cross-validation.) However, I suspect this is a tractable research problem.
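To make the cross-validation analogy concrete, here’s a minimal sketch of the obvious thing to try: sweep the bottleneck width and measure held-out reconstruction error. I’m using PCA as a stand-in for a linear autoencoder, with synthetic data; note that this doesn’t by itself solve the selection problem, because held-out error keeps shrinking as the bottleneck widens, so we’re back to looking for a point of diminishing returns:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

# Toy data: 50-dimensional observations that mostly live on a 5-dimensional subspace.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 5))
mixing = rng.normal(size=(5, 50))
X = latent @ mixing + 0.1 * rng.normal(size=(1000, 50))

X_train, X_val = train_test_split(X, test_size=0.2, random_state=0)

# Sweep the "bottleneck" width of a linear autoencoder (PCA) and measure
# reconstruction error on held-out data.
for d in (1, 2, 5, 10, 20, 50):
    pca = PCA(n_components=d).fit(X_train)
    X_val_hat = pca.inverse_transform(pca.transform(X_val))
    mse = np.mean((X_val - X_val_hat) ** 2)
    print(f"bottleneck={d:2d}  held-out reconstruction MSE={mse:.4f}")
```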
It was interesting that you mentioned the noise of air molecules, because one unsupervised learning trick is to deliberately introduce noise into the input to see if the system has learned “natural” representations which allow it to reconstruct the original noise-free input. See denoising autoencoder. This is the kind of technique which might allow an autoencoder to learn natural representations even if the number of dimensions in the bottleneck is equal to the number of dimensions in the original input.

BTW, here’s an interesting-looking (pessimistic) paper I found while researching this comment: Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations.
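Coming back to the denoising idea: here’s a minimal sketch in PyTorch (the data tensor is a placeholder, and I’ve deliberately made the hidden layer as wide as the input, to illustrate that the injected noise, rather than a narrow bottleneck, is what forces the model to learn structure):

```python
import torch
from torch import nn

input_dim, hidden_dim, noise_std = 50, 50, 0.3  # hidden layer as wide as the input

# Denoising autoencoder: sees a corrupted input, is trained to reconstruct the clean one.
model = nn.Sequential(
    nn.Linear(input_dim, hidden_dim), nn.ReLU(),
    nn.Linear(hidden_dim, input_dim),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

X = torch.randn(1024, input_dim)  # placeholder for real data

for step in range(1000):
    noisy = X + noise_std * torch.randn_like(X)  # corrupt the input...
    recon = model(noisy)
    loss = ((recon - X) ** 2).mean()             # ...but score against the clean input
    opt.zero_grad()
    loss.backward()
    opt.step()
```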
You brought up microscope AI. I think a promising research direction here may be to formulate a notion of “ease of interpretability” which can be added as an additional term to an unsupervised loss function (the same way we might, for example, add a term to a clustering algorithm’s loss function so that in addition to minimizing reconstruction error, it also seeks to minimize the number of clusters).
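As a toy illustration of the “extra term” idea (the clustering version, not a proposal for the interpretability term itself): pick k by minimizing reconstruction error plus a penalty on the number of clusters. The weight lam below is an arbitrary knob I made up, playing the role an “ease of interpretability” weight would play:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2)) for c in (0, 5, 10)])

lam = 20.0  # how much we "charge" for each additional cluster
scores = {}
for k in range(1, 11):
    inertia = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    scores[k] = inertia + lam * k  # reconstruction error + complexity penalty

best_k = min(scores, key=scores.get)
print(f"best k under the penalized objective: {best_k}")
```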
Hardcoding “human values” by hand is hopeless, but hardcoding “ease of human interpretability” by hand seems much more promising, since ease of human interpretability is likely to correspond to easily formalizable notions such as simplicity. Also, if your hardcoded notion of “ease of human interpretability” turns out to be slightly wrong, that’s not a catastrophe: you just get an ML model which is a bit harder to interpret than you might like.
Another option is to learn a notion of what constitutes an interpretable model by e.g. collecting “ease of interpretability” data from human microscope users.
Of course, one needs to be careful that any interpretability term does not get too much weight in the loss function, because if it does, we may stop learning the “natural” abstractions that we desire (assuming a worst-case scenario where human interpretability is anticorrelated with “naturalness”). The best approach may be to learn two models, one of which was optimized for interpretability and one of which wasn’t, and only allow our system to take action when the two models agree. I guess mesa-optimizers in the non-interpretable model are still a worry though.
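A rough sketch of the agreement gate (everything here — interpretable_model, opaque_model, the tolerance — is a hypothetical placeholder, and “agreement” is reduced to closeness of numeric outputs):

```python
import numpy as np

def gated_action(interpretable_model, opaque_model, x, tol=1e-3):
    """Act only when the two models agree; otherwise abstain / escalate to a human.

    Both models are assumed to be callables returning a proposed action as a
    numeric vector for input x.
    """
    a1 = np.asarray(interpretable_model(x))
    a2 = np.asarray(opaque_model(x))
    if np.max(np.abs(a1 - a2)) <= tol:
        return a1    # models agree; use the interpretable model's proposal
    return None      # disagreement: take no action
```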
This comment definitely wins the award for best comment on the post so far. Great ideas, highly relevant links.
I especially like the deliberate noise idea. That plays really nicely with natural abstractions as information-relevant-far-away: we can intentionally insert noise along particular dimensions, and see how that messes with prediction far away (either via causal propagation or via loss of information directly). As long as most of the noise inserted is not along the dimensions relevant to the high-level abstraction, denoising should be possible. So it’s very plausible that denoising autoencoders are fairly-directly incentivized to learn natural abstractions. That’ll definitely be an interesting path to pursue further.
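A rough sketch of what that probe might look like (predict, X, and dims are all hypothetical stand-ins for whatever low-level model and data we’d actually be working with):

```python
import numpy as np

def sensitivity_to_noise(predict, X, dims, noise_std=1.0, n_samples=20, seed=0):
    """How much does a 'far away' prediction change when we inject noise
    along a chosen subset of input dimensions?"""
    rng = np.random.default_rng(seed)
    base = predict(X)
    deltas = []
    for _ in range(n_samples):
        X_noisy = X.copy()
        X_noisy[:, dims] += noise_std * rng.normal(size=(X.shape[0], len(dims)))
        deltas.append(np.mean(np.abs(predict(X_noisy) - base)))
    return float(np.mean(deltas))
```

Dimensions whose perturbation barely moves the downstream prediction are exactly the ones a natural abstraction should be free to throw away.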
Assuming that the denoising autoencoder objective more-or-less-directly incentivizes natural abstractions, further refinements on that setup could very plausibly turn into a useful “ease of interpretability” objective.
Thanks!
I don’t consider myself an expert on the unsupervised learning literature, by the way; I expect there is more cool stuff to be found.