Another aesthetic similarity which my brain noted is between your concept of ‘information loss’ on inputs for layers-which-discriminate and layers-which-don’t and the concept of sufficient statistics.
A sufficient statistic ϕ is one for which the posterior over y is independent of the data x, given the value of the statistic ϕ(x):
P(y | x = x0) = P(y | ϕ(x) = ϕ(x0))
which has the same flavour as
f(x,θa,θb)=g(a(x),θb)
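In the respective cases, ϕ and a are ‘sufficient’ and induce an equivalence relation on the xs: two inputs are equivalent whenever they map to the same value of ϕ (respectively a), and everything downstream treats them identically.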
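To make the sufficient-statistic side concrete, here is a small sketch of my own (not something from the post). It assumes a coin-flip model with a uniform prior over a discrete grid of biases; the names GRID, phi, posterior_from_data and posterior_from_statistic are all hypothetical. The point is only that two different datasets with the same ϕ(x) give exactly the same posterior, which is the same shape as f(x,θa,θb)=g(a(x),θb).

```python
from fractions import Fraction

# Hypothetical setup: y is the coin's bias, restricted to a small grid,
# with a uniform prior. x is a list of 0/1 flips.
GRID = [Fraction(k, 10) for k in range(1, 10)]

def posterior_from_data(xs):
    """Posterior over GRID computed from the raw flips, one flip at a time."""
    weights = []
    for p in GRID:
        w = Fraction(1)
        for x in xs:
            w *= p if x == 1 else 1 - p
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]

def phi(xs):
    """Candidate sufficient statistic: (number of flips, number of heads)."""
    return (len(xs), sum(xs))

def posterior_from_statistic(stat):
    """Posterior over GRID computed only from phi(x), never from x itself."""
    n, heads = stat
    weights = [p ** heads * (1 - p) ** (n - heads) for p in GRID]
    total = sum(weights)
    return [w / total for w in weights]

x0 = [1, 1, 0, 0, 1]
x1 = [0, 1, 1, 1, 0]  # different flips, same phi
x2 = [1, 1, 1, 1, 1]  # different phi

same_class = posterior_from_data(x0) == posterior_from_data(x1)
factors_through_phi = posterior_from_data(x0) == posterior_from_statistic(phi(x0))
different_class = posterior_from_data(x0) != posterior_from_data(x2)
```

Here x0 and x1 sit in the same equivalence class under phi, so the order of the flips is the ‘lost’ information. Exact fractions are used so the equality checks are not muddied by floating point.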
Yup, seems correct.