I strongly suspect this is the case too!
In fact, we might be able to speed up the learning of common features even further:
Pierre Peigné at SERIMATS has done some interesting work on initialization schemes that speed up learning. If you initialize the autoencoder's weights with a sample of datapoints from the MLP activations dataset, each of which we assume to contain a linear combination of only a few of the ground-truth features, then the initial phase of feature recovery is much faster*. We haven't had time to check, but this is presumably biased toward recovering the most common features first, since they're the most likely to appear in any given datapoint.
*The ground-truth feature recovery metric (mean max cosine similarity, MMCS) starts higher at the beginning of autoencoder training, but converges to full recovery at about the same time.
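To sketch what this might look like concretely (a toy illustration in PyTorch, not Pierre's actual code; the function names and exact conventions are assumptions of mine): initialize the dictionary rows with randomly sampled activation vectors, normalized to unit norm, and track recovery with MMCS, i.e. for each ground-truth feature take its best cosine similarity with any learned dictionary feature, then average:

```python
import torch
import torch.nn.functional as F

def mmcs(ground_truth: torch.Tensor, learned: torch.Tensor) -> torch.Tensor:
    """Mean max cosine similarity: for each ground-truth feature, take its
    highest cosine similarity with any learned dictionary feature, then
    average. 1.0 means every ground-truth feature is perfectly recovered."""
    gt = F.normalize(ground_truth, dim=1)   # (n_gt, d_activation)
    ld = F.normalize(learned, dim=1)        # (n_learned, d_activation)
    sims = gt @ ld.T                        # pairwise cosine similarities
    return sims.max(dim=1).values.mean()

def data_init_dictionary(activations: torch.Tensor, n_features: int) -> torch.Tensor:
    """Initialize dictionary directions from a random sample of activation
    vectors, each assumed to be a sparse linear combination of a few
    ground-truth features. Rows are normalized to unit norm, as learned
    dictionary features usually are."""
    idx = torch.randperm(activations.shape[0])[:n_features]
    return F.normalize(activations[idx].clone(), dim=1)

# Toy usage: 10k activation vectors of dimension 256, dictionary of 512 features.
acts = torch.randn(10_000, 256)
decoder_init = data_init_dictionary(acts, n_features=512)
```

Since each sampled datapoint is (by assumption) already a sparse mixture of ground-truth features, the initial dictionary starts with nonzero overlap with the true feature directions, which is why MMCS starts higher.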