Yeah, I’m familiar with privileged bases. Once we generalize to a whole privileged coordinate system, ReLUs are no longer enough.
Isotropy of the initialization distribution still applies, but the key point is that we only get to pick one rotation for the parameters, and that same rotation has to be used for all data points. That constraint is baked into the framing when thinking about privileged bases, but it has to be derived when thinking about privileged coordinate systems.
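To make the "one rotation for the parameters, applied to all data points" constraint concrete on the privileged-basis side, here is a minimal sketch under assumptions of my own: a toy two-layer net with a 2-dimensional hidden layer and isotropic Gaussian initialization. All function names are hypothetical and only for illustration. It applies a single orthogonal matrix R to the hidden layer (W1 → R·W1, W2 → W2·Rᵀ) and measures the largest change in output over many inputs. Without a nonlinearity, any rotation leaves the function unchanged. With ReLU, a generic rotation changes it, while a permutation does not, which is why ReLU picks out a basis (up to permutation) but says nothing about a larger coordinate-system structure.

```python
import math
import random

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(M):
    return [list(r) for r in zip(*M)]

def relu(v):
    return [max(0.0, x) for x in v]

def identity(v):
    return list(v)

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def two_layer(W1, W2, x, act):
    # hypothetical toy network: output = W2 · act(W1 · x)
    return matvec(W2, act(matvec(W1, x)))

def max_gap(act, R, n_points=100, seed=0):
    """Largest output change over n_points inputs when ONE orthogonal
    matrix R is applied to the hidden layer for every data point."""
    rng = random.Random(seed)
    # isotropic Gaussian initialization (an assumption for this sketch)
    W1 = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(2)]  # hidden x input
    W2 = [[rng.gauss(0, 1) for _ in range(2)]]                     # output x hidden
    W1r = matmul(R, W1)             # rotate hidden units in
    W2r = matmul(W2, transpose(R))  # and undo the rotation on the way out
    gap = 0.0
    for _ in range(n_points):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        y = two_layer(W1, W2, x, act)[0]
        yr = two_layer(W1r, W2r, x, act)[0]
        gap = max(gap, abs(y - yr))
    return gap

R = rotation(0.7)
P = [[0.0, 1.0], [1.0, 0.0]]  # permutation of the two hidden units

print(max_gap(identity, R))  # tiny (float round-off): linear nets don't care
print(max_gap(relu, R))      # clearly nonzero: ReLU breaks rotational symmetry
print(max_gap(relu, P))      # 0.0: ReLU commutes with permutations
```

The single R shared across all inputs is the point: for a linear hidden layer it cancels exactly for every x, whereas ReLU only lets it cancel when R maps the basis onto itself.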