Follow-up on tied vs. untied weights: it looks like untied makes a small improvement over tied, primarily in layers 2-4, which already have the most classifiers. It's still missing the middle ring features, though.
Next steps are using the Li et al. model and training the SAE on more data.
Ah sorry, I skipped over that derivation! Here’s how we’d approach this from first principles: to solve f=Df, we know we want to use the 1/(1-x)=1+x+x^2+… trick, but now know that we need x=I instead of x=D. That’s why we want to switch to an integral equation: apply I (integration from 0 to x) to both sides, and we get
f=Df
If=IDf = f-f(0)
where the final equality is the fundamental theorem of calculus. Then we rearrange:
f-If=f(0)
(1-I)f=f(0)
and solve from there using the 1/(1-I)=1+I+I^2+… trick, i.e. f=(1+I+I^2+…)f(0)! What’s nice about this is it shows exactly how the initial condition of the DE shows up.
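
If it helps to see the series actually do its thing, here's a rough sketch that applies 1+I+I^2+… to the constant f(0) term by term. It assumes polynomials stored as coefficient lists and I meaning integration from 0 to x; the helper names `integrate` and `neumann_partial_sum` are made up for this example, not from any library.

```python
from fractions import Fraction

def integrate(coeffs):
    """Apply I: (If)(x) = integral of f from 0 to x.

    coeffs[k] is the coefficient of x^k; the result has zero constant term.
    """
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(coeffs)]

def neumann_partial_sum(f0, n_terms):
    """Coefficients of (1 + I + ... + I^(n_terms-1)) applied to the constant f0."""
    term = [Fraction(f0)]              # I^0 f(0) = f(0)
    total = [Fraction(0)] * n_terms
    for _ in range(n_terms):
        # Accumulate the current term, then integrate to get the next one.
        for k, c in enumerate(term):
            total[k] += c
        term = integrate(term)
    return total

coeffs = neumann_partial_sum(f0=3, n_terms=6)
print(coeffs)
```

Each I^n f(0) comes out as f(0) x^n/n!, so the partial sums are exactly the Taylor polynomial of f(0) e^x, which is the solution of f=Df with that initial condition.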