In dropping the O(1) terms, I think we're discarding all the information about the width of the peak in each dimension. So if the widths differ radically between the models, N would need to be even larger for BIC to be a useful approximation.
The width issue might come up, for example, when an added parameter splits the data into two populations of drastically different sizes: the parameters constrained only by the small population are likely to have a wider peak.
Is that right?
That is exactly correct.
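To make the point concrete, here is a rough sketch of where the widths go, assuming the usual Laplace-approximation route to BIC. The symbols are my own notation rather than anything fixed above: k is the number of parameters, \hat\theta the maximum-likelihood estimate, H the Hessian of the negative log-likelihood at \hat\theta (taken as diagonal for simplicity), \sigma_j the peak width in dimension j, i_j the per-observation information for parameter j, and N_2 the size of the small population.

```latex
% Laplace approximation to the log evidence of model M
\log p(D \mid M) \approx \log p(D \mid \hat\theta) + \log p(\hat\theta \mid M)
  + \frac{k}{2}\log 2\pi - \frac{1}{2}\log\lvert H \rvert

% With a diagonal Hessian, H_{jj} = 1/\sigma_j^2, the last term is just the widths:
-\frac{1}{2}\log\lvert H \rvert = \sum_{j=1}^{k} \log \sigma_j

% If every parameter is constrained by all N observations, \sigma_j^2 = 1/(N i_j):
\sum_{j=1}^{k} \log \sigma_j = -\frac{k}{2}\log N - \frac{1}{2}\sum_{j=1}^{k} \log i_j

% BIC keeps only the terms that grow with N:
\log p(D \mid M) \approx \log p(D \mid \hat\theta) - \frac{k}{2}\log N + O(1)

% A parameter constrained only by the small population has \sigma^2 = 1/(N_2 i), so
\log \sigma = -\frac{1}{2}\log N - \frac{1}{2}\log\frac{N_2}{N} - \frac{1}{2}\log i
```

Under these assumptions, the dropped O(1) part includes the -(1/2) log i_j terms and, in the two-population case, the extra -(1/2) log(N_2/N). That last term does not grow with N at a fixed split fraction, but it becomes large when N_2 is much smaller than N, which is why the approximation would need a larger N before it can be trusted there.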