I’m looking forward to both this series and the workshop!
I think I (and probably many other people) would find it helpful if there were an entry in this sequence that told purely the classical story, in language that makes its deficiencies clear and the contrasts with Watanabe’s version easy to point out. (Maybe a −1 entry, since 0 is already used?)
The way I’ve structured the sequence means these points are interspersed throughout the broader narrative, but it’s a great question, so I’ll provide a brief summary here and link to the relevant sections in this comment as they are released.
In regular model classes, the set of true parameters $W_0$ that minimise the loss $K(w)$ is a single point. In singular model classes, it can be significantly more than a single point: generally, it is a higher-dimensional structure. See here in DSLT1.
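To make this concrete, here is a minimal sketch with two toy choices of $K(w)$ on a two-dimensional parameter space (both are assumptions for illustration): $w_1^2 + w_2^2$, which behaves like a regular model, and $(w_1 w_2)^2$, which is singular (up to a constant factor, it is the $K$ of a regression model $y = w_1 w_2 x + \varepsilon$ with Gaussian noise whose true coefficient is $0$). Counting the grid points where $K$ vanishes gives a single true parameter in the first case and two whole axes of them in the second.

```python
# Hypothetical toy choices of K(w) on a 2D parameter space w = (w1, w2).


def k_regular(w1: float, w2: float) -> float:
    """Regular-style example: K vanishes only at the origin."""
    return w1**2 + w2**2


def k_singular(w1: float, w2: float) -> float:
    """Singular example: K vanishes on both coordinate axes."""
    return (w1 * w2) ** 2


def true_parameters(k, grid, tol=1e-12):
    """Discretised W0: the grid points where K(w) is (numerically) zero."""
    return [(a, b) for a in grid for b in grid if k(a, b) <= tol]


grid = [i / 2 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0 (9 values per axis)
print(len(true_parameters(k_regular, grid)))   # 1: only (0, 0)
print(len(true_parameters(k_singular, grid)))  # 17: every point with w1 == 0 or w2 == 0
```

On a finer grid the singular count keeps growing with the number of grid points per axis, reflecting that $W_0$ there is one-dimensional rather than a point.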
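In regular model classes, $K(w)$ can be approximated by a quadratic form near the true parameter, because its Hessian there (the Fisher information matrix) is positive definite. In singular model classes this is not true in general, since the Fisher information matrix can be degenerate at true parameters. Thus asymptotic normality of regular models (i.e. every regular posterior is just a Gaussian as $n \to \infty$) breaks down. See here in DSLT1, or here in DSLT2.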
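The quadratic approximation in the regular case comes from the Hessian of $K$ at the true parameter. A minimal sketch, using the hypothetical toy choices $K(w) = w_1^2 + w_2^2$ and $K(w) = (w_1 w_2)^2$ and a central finite-difference Hessian (the step size `h` is an arbitrary choice): the regular Hessian has a non-zero determinant, while the singular one is degenerate at each true parameter checked, so a quadratic form cannot describe $K$ in every direction there.

```python
# Hypothetical toy K functions: one regular-style, one singular.
def k_regular(w1, w2):
    return w1**2 + w2**2


def k_singular(w1, w2):
    return (w1 * w2) ** 2


def hessian_2d(f, w, h=1e-3):
    """Central finite-difference Hessian of f: R^2 -> R at the point w = (w1, w2)."""

    def f_shifted(i, j, si, sj):
        # Evaluate f after nudging coordinate i by si*h and coordinate j by sj*h.
        p = list(w)
        p[i] += si * h
        p[j] += sj * h
        return f(*p)

    return [
        [
            (f_shifted(i, j, 1, 1) - f_shifted(i, j, 1, -1)
             - f_shifted(i, j, -1, 1) + f_shifted(i, j, -1, -1)) / (4 * h * h)
            for j in range(2)
        ]
        for i in range(2)
    ]


def det2(m):
    """Determinant of a 2x2 matrix; zero means the quadratic form is degenerate."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


print(det2(hessian_2d(k_regular, (0.0, 0.0))))   # approximately 4.0: positive definite
print(det2(hessian_2d(k_singular, (0.0, 0.0))))  # 0.0: degenerate at the origin
print(det2(hessian_2d(k_singular, (1.0, 0.0))))  # 0.0: degenerate elsewhere on W0 too
```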
Relatedly, the Bayesian Information Criterion (BIC) doesn’t hold in singular models, because of the same non-quadraticness. Watanabe generalises the BIC to singular models with the WBIC, which shows that complexity is measured by the RLCT $\lambda \in \mathbb{Q}_{>0}$, not by the dimension $d$ of parameter space. The RLCT satisfies $\lambda \le \frac{d}{2}$ in general, and $\lambda = \frac{d}{2}$ when the model is regular. See here in DSLT2.
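For reference, the two criteria side by side, in the standard notation $L_n(w) = -\frac{1}{n}\sum_{i=1}^{n} \log p(X_i \mid w)$ for the empirical negative log-likelihood, $\hat{w}$ for the maximum-likelihood estimate and $w_0 \in W_0$ for a true parameter (this notation is assumed here, not defined in the comment). $\mathbb{E}_w^{\beta}$ denotes the expectation over the tempered posterior $\propto \varphi(w)\prod_i p(X_i \mid w)^{\beta}$, and the WBIC expansion is Watanabe’s asymptotic result under his assumptions (e.g. realisability), shown only to leading order:

```latex
\begin{aligned}
\mathrm{BIC}  &= n L_n(\hat{w}) + \frac{d}{2}\log n, \\
\mathrm{WBIC} &= \mathbb{E}_w^{\beta}\!\left[n L_n(w)\right], \qquad \beta = \frac{1}{\log n}, \\
              &= n L_n(w_0) + \lambda \log n + O_p\!\left(\sqrt{\log n}\right).
\end{aligned}
```

The leading terms differ only in the coefficient of $\log n$: $\frac{d}{2}$ for the BIC versus $\lambda \le \frac{d}{2}$ for the WBIC.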
In regular model classes, every parameter has the same complexity $\frac{d}{2}$. In singular model classes, different parameters $w$ can have different complexities $\lambda_w$, given by their local RLCT. See here in DSLT2.
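Computing $\lambda_w$ in general requires resolution of singularities, but when $K$ is already a monomial $K(w) = \prod_j w_j^{2k_j}$ (normal crossing form) and the prior is smooth and positive near $w$, the local RLCT is the minimum of $\frac{1}{2k_j}$ over the coordinates that vanish at $w$. A minimal sketch under exactly those assumptions (the function name and interface are hypothetical), showing different true parameters of the same $K$ with different complexities:

```python
from fractions import Fraction


def local_rlct_monomial(exponents, point):
    """Local RLCT (and multiplicity) of K(w) = prod_j w_j**(2*k_j) at a true parameter.

    Assumes K is already in normal crossing (monomial) form and the prior is
    smooth and strictly positive near `point`. Each coordinate that vanishes at
    `point` contributes a candidate 1/(2*k_j); the RLCT is the smallest candidate
    and the multiplicity is how many coordinates attain it.
    """
    candidates = [Fraction(1, 2 * k) for k, p in zip(exponents, point) if p == 0 and k > 0]
    if not candidates:
        raise ValueError("K(point) != 0, so point is not a true parameter")
    lam = min(candidates)
    return lam, candidates.count(lam)


# K(w) = w1^2 * w2^4 in d = 2 dimensions; a regular model would have lambda = d/2 = 1.
exponents = (1, 2)
print(local_rlct_monomial(exponents, (0.0, 1.0)))  # (Fraction(1, 2), 1)
print(local_rlct_monomial(exponents, (1.0, 0.0)))  # (Fraction(1, 4), 1)
print(local_rlct_monomial(exponents, (0.0, 0.0)))  # (Fraction(1, 4), 1)
```

Every value is at most $\frac{d}{2} = 1$, and the point $(1, 0)$ is "simpler" than $(0, 1)$ even though both are true parameters of the same $K$.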
If you have a fixed model class, the BIC can only be minimised by optimising the accuracy, since every parameter pays the same complexity penalty (see here). But in a singular model class, the WBIC can be minimised through an accuracy-complexity tradeoff. So “simpler” models exist on singular loss landscapes, whereas in regular models every model is equally complex. See here in DSLT2.
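A minimal numerical sketch of this tradeoff, with made-up numbers for two hypothetical candidate parameters: one slightly less accurate but simpler (smaller $\lambda_w$), and one slightly more accurate but as complex as a regular parameter ($\lambda_w = \frac{d}{2}$). It compares a BIC-style score, where every parameter pays $\frac{d}{2}\log n$, with the leading-order singular free energy $nL_n(w) + \lambda_w \log n$ (dropping lower-order terms is an assumption of the sketch):

```python
import math

# Made-up illustrative numbers: average loss L_n(w) and local RLCT lambda_w per candidate.
CANDIDATES = {
    "simple": {"loss": 0.50, "rlct": 1.0},
    "accurate": {"loss": 0.49, "rlct": 3.0},
}
D = 6  # hypothetical parameter-space dimension, so d/2 = 3.0 bounds every lambda_w


def bic_score(name, n):
    """BIC-style score: every parameter pays the same penalty (d/2) log n."""
    return n * CANDIDATES[name]["loss"] + (D / 2) * math.log(n)


def singular_score(name, n):
    """Leading-order free energy: n L_n(w) + lambda_w log n."""
    c = CANDIDATES[name]
    return n * c["loss"] + c["rlct"] * math.log(n)


n = 1000
print(min(CANDIDATES, key=lambda name: bic_score(name, n)))       # accurate
print(min(CANDIDATES, key=lambda name: singular_score(name, n)))  # simple
```

Because the BIC penalty is identical for both candidates, it can only ever prefer the lower loss; with $\lambda_w$ in the penalty, the simpler parameter wins at this $n$ even though it fits slightly worse.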
With this latter point in mind, phase transitions are anticipated in singular models because the free energy is composed of accuracy and complexity, and here the complexity varies across parameter space. In regular models, since complexity is fixed, phase transitions are, in general, far less natural or interesting.
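One way to see such a transition is to track which of two regions of parameter space the posterior favours as $n$ grows, approximating each region’s posterior mass by $e^{-F}$ with the leading-order local free energy $F \approx nL_n + \lambda \log n$. The numbers below are made up for illustration (a simpler, less accurate region A and a more accurate, more complex region B); this is a sketch under that approximation, not a derivation:

```python
import math

# Made-up illustrative regions: (average loss L_n, local RLCT lambda).
LOSS_A, RLCT_A = 0.50, 1.0  # simpler, less accurate
LOSS_B, RLCT_B = 0.49, 3.0  # more accurate, more complex


def local_free_energy(loss, rlct, n):
    """Leading-order local free energy n*L_n + lambda*log(n); lower means more posterior mass."""
    return n * loss + rlct * math.log(n)


def log_odds_b_over_a(n):
    """Approximate log posterior odds of region B versus region A, i.e. F_A - F_B."""
    return local_free_energy(LOSS_A, RLCT_A, n) - local_free_energy(LOSS_B, RLCT_B, n)


def critical_n(n_min=10, n_max=100_000):
    """Smallest n in [n_min, n_max) at which the posterior switches to favouring region B."""
    for n in range(n_min, n_max):
        if log_odds_b_over_a(n) > 0:
            return n
    return None


n_c = critical_n()
print(n_c)
print(log_odds_b_over_a(n_c // 2) < 0 < log_odds_b_over_a(2 * n_c))  # True
```

Below $n_c$ the posterior concentrates on the simpler region, above it on the more accurate one. If both regions paid the same fixed penalty $\frac{d}{2}\log n$, the log odds would be $n(L_A - L_B)$, which never changes sign.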