I agree, but the point I’m making is that you had to know the labels in order to know where to detach the gradient. So it’s kind of like making something interpretable by imposing your interpretation on it, which I feel is tautological
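(For concreteness, by "detach the gradient based on labels" I mean something like the following PyTorch-style sketch, where one latent dimension is reserved per class. The names are mine, not the post's:)

```python
import torch
import torch.nn.functional as F

def route_gradients(z: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Pass z through unchanged on the forward pass, but only let
    gradients flow through each example's labeled latent dimension.

    z:      (batch, num_classes) latent codes
    labels: (batch,) integer class labels
    """
    # One-hot mask picking out the latent dimension matching each label.
    mask = F.one_hot(labels, num_classes=z.shape[1]).to(z.dtype)
    # Forward value is identical to z; backward is blocked where mask == 0.
    return mask * z + (1.0 - mask) * z.detach()
```

The mask is built directly from the label, which is exactly why the resulting "interpretable" structure feels imposed rather than discovered.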
For the record I’m excited by gradient routing, and I don’t want to come across as a downer, but this application doesn’t compel me
Edit: Here’s an intuition pump. Would you be similarly excited by training 10 different autoencoders, each reconstructing a single digit, and then stitching them together into a single global autoencoder? Because conceptually that seems like what you’re doing.
I disagree that this is the same as just stitching together different autoencoders. Presumably the encoder does some shared computation before specializing at the encoding level. I also don’t see how you could use 10 different autoencoders to classify an image from the encodings. I guess you could compare reconstruction losses and take the label of whichever autoencoder scores lowest as the prediction, but that seems different from what I’m doing.
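(To spell out that baseline, it would be something like the sketch below, where `autoencoders` is a hypothetical list of 10 per-digit models, not anything from my post:)

```python
import torch

@torch.no_grad()
def classify_by_reconstruction(x: torch.Tensor, autoencoders) -> torch.Tensor:
    """Predict digits for a batch x by picking, per example, the
    autoencoder with the lowest reconstruction error."""
    losses = []
    for ae in autoencoders:
        recon = ae(x)
        # Per-example mean squared reconstruction error.
        losses.append(((recon - x) ** 2).flatten(start_dim=1).mean(dim=1))
    # Stack to shape (batch, 10); the argmin index is the predicted label.
    return torch.stack(losses, dim=1).argmin(dim=1)
```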
However, I agree that this application is not useful. I shared it because I (and others) thought it was cool. It’s not really practical at all.
Hope this addresses your question :)
I see. If you disagree with that characterization, then I’ve likely misunderstood what you were doing in this post, in which case I no longer endorse the above statements. Thanks for clarifying!