Take the example of the Laplace approximation. If there's a local continuous symmetry in weight space, i.e., some direction you can walk in without changing the probability density, then your density isn't locally Gaussian: the Hessian has a zero eigenvalue along the symmetry direction, so there's no well-defined Gaussian for the Laplace approximation to fit.
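As a minimal sketch of what goes wrong (my own toy example, not from the post): a two-parameter loss that depends only on w1*w2 has the scaling symmetry (w1, w2) -> (a*w1, w2/a), and its Hessian at any minimum has a zero eigenvalue exactly along the orbit direction.

```python
import numpy as np

# Toy loss with a continuous scaling symmetry: it depends only on w1*w2,
# so (w1, w2) -> (a*w1, w2/a) leaves the loss (and hence the posterior) unchanged.
def loss(w):
    w1, w2 = w
    return (w1 * w2 - 1.0) ** 2

def hessian(f, w, eps=1e-5):
    """Central finite-difference Hessian (for illustration only)."""
    w = np.asarray(w, dtype=float)
    n = len(w)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i, e_j = np.eye(n)[i] * eps, np.eye(n)[j] * eps
            H[i, j] = (f(w + e_i + e_j) - f(w + e_i - e_j)
                       - f(w - e_i + e_j) + f(w - e_i - e_j)) / (4 * eps ** 2)
    return H

w_star = np.array([1.0, 1.0])      # one point on the minimum manifold w1*w2 = 1
H = hessian(loss, w_star)
eigvals, eigvecs = np.linalg.eigh(H)
print(eigvals)                     # ~[0, 4]: one zero eigenvalue
print(eigvecs[:, 0])               # direction ∝ (1, -1), the tangent to the symmetry orbit

# The Laplace approximation needs H to be invertible (it uses H^{-1} as the covariance,
# up to scaling); the zero eigenvalue means no Gaussian fits along the flat direction.
```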
Haven’t finished the post, but doesn’t this assume that ϕ(w1)=ϕ(w2) whenever w1 and w2 induce the same function? This isn’t obvious to me; e.g., under the prior induced by weight decay / L2 regularization, we often have ϕ(w1)≠ϕ(w2) for weights that induce the same function.
Yes, so an example of this would be the ReLU scaling symmetry discussed in “Neural networks are freaks of symmetries.” You’re right that regularization often breaks this kind of symmetry.
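Continuing the toy example above (again my own sketch, with an arbitrary penalty strength lam=0.1): adding an L2 term breaks the scaling symmetry, the minimum becomes isolated, and the formerly flat direction picks up curvature.

```python
import numpy as np

# Same toy loss as before, plus weight decay. The L2 term prefers |w1| = |w2|,
# so moving along the old symmetry orbit (a*w1, w2/a) now changes the loss.
lam = 0.1

def reg_loss(w):
    w1, w2 = w
    return (w1 * w2 - 1.0) ** 2 + lam * (w1 ** 2 + w2 ** 2)

def hessian(f, w, eps=1e-5):
    """Central finite-difference Hessian (same helper as in the sketch above)."""
    n = len(w)
    H, E = np.zeros((n, n)), np.eye(n) * eps
    for i in range(n):
        for j in range(n):
            H[i, j] = (f(w + E[i] + E[j]) - f(w + E[i] - E[j])
                       - f(w - E[i] + E[j]) + f(w - E[i] - E[j])) / (4 * eps ** 2)
    return H

# With the penalty, the stationarity condition along w1 = w2 gives w^2 = 1 - lam,
# so the minimum is an isolated point rather than a one-dimensional manifold.
w_star = np.full(2, np.sqrt(1 - lam))
print(np.linalg.eigvalsh(hessian(reg_loss, w_star)))  # ~[0.4, 3.6]: no zero eigenvalue left
```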
But even when there are no local symmetries, having other points with the same posterior density (e.g., from discrete symmetries like permuting hidden neurons) means this assumption of asymptotic normality doesn’t hold.
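Here's a toy numerical illustration of the kind of failure I mean (my own example; the loss and the value of n are made up): a one-parameter loss with two equally good minima at w = ±1 has no continuous symmetry, but a Laplace approximation around either single mode still misses half the posterior mass, and the posterior is not asymptotically a single Gaussian.

```python
import numpy as np
from scipy.integrate import quad

# Toy posterior with no continuous symmetry but two weights giving the same function:
# L(w) = (w^2 - 1)^2 has equally good minima at w = +1 and w = -1 (a discrete symmetry).
n = 100                                   # "sample size" scaling the log-posterior
L = lambda w: (w ** 2 - 1) ** 2

# Normalizing constant of exp(-n * L(w)), integrated numerically over a range
# that comfortably contains both modes (the tails beyond |w| = 3 are negligible).
Z_true, _ = quad(lambda w: np.exp(-n * L(w)), -3, 3, points=[-1.0, 1.0])

# Laplace approximation around the single mode w = +1:
# L''(w) = 12 w^2 - 4, so L''(1) = 8 and Z_laplace = exp(-n*L(1)) * sqrt(2*pi / (n*8)).
Z_laplace = np.sqrt(2 * np.pi / (n * 8))

print(Z_true / Z_laplace)                 # ~2: a single Gaussian misses half the mass

# No single Gaussian is even centered correctly: by symmetry the posterior mean is 0,
# sitting between the two modes at +/-1, so the posterior converges to a mixture of
# two Gaussians rather than to one Gaussian around a single point.
```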