Your reasoning here is basically correct; this is why Laplace's approximation typically works very well on large datasets. One big catch is that it requires the number of data points to be large relative to the dimension of the variables. The real world is decidedly high-dimensional, so in practice the conditions for Gaussianity usually hold when we pick some small set of “features” to focus on and then get a bunch of data on those (e.g. as is typically done in academic statistics).
There’s another, more subtle catch here: in e.g. a large causal model, once we have a decent number of variables, we often have all the information we’re going to get about some value of interest, and later updates add basically zero information. Depending on how that plays out, it could mess up the convergence to Gaussianity.
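For concreteness, here is a minimal numerical sketch of the first point (the coin-flip/Beta setup, the sample sizes, and the `laplace_approx` helper are purely illustrative choices, not anything from the discussion above): with a handful of observations the exact posterior is visibly skewed, while with many observations it is nearly indistinguishable from the Gaussian the Laplace approximation gives, i.e. the one centered at the posterior mode with variance equal to the inverse curvature of the log-posterior there.

```python
# Minimal sketch (illustrative only): Laplace approximation of a 1-D posterior.
# With few data points the exact Beta posterior is skewed; with many data
# points it is close to the Gaussian centered at the mode, with variance given
# by the inverse curvature of the log-posterior at that mode.
import numpy as np
from scipy import stats

def laplace_approx(heads, tails):
    """Gaussian (mode, sd) from a second-order expansion of the
    Beta(heads + 1, tails + 1) log-density at its mode."""
    a, b = heads + 1, tails + 1
    mode = (a - 1) / (a + b - 2)
    # negative second derivative of the log-density, evaluated at the mode
    curvature = (a - 1) / mode**2 + (b - 1) / (1 - mode)**2
    return mode, 1 / np.sqrt(curvature)

grid = np.linspace(0.01, 0.99, 999)
for n in (10, 10_000):
    heads = int(0.3 * n)
    mode, sd = laplace_approx(heads, n - heads)
    exact = stats.beta.pdf(grid, heads + 1, n - heads + 1)
    approx = stats.norm.pdf(grid, mode, sd)
    # total-variation distance on the grid shrinks as n grows
    tv = 0.5 * np.sum(np.abs(exact - approx)) * (grid[1] - grid[0])
    print(f"n = {n:>6}:  TV(exact posterior, Laplace approximation) ~ {tv:.3f}")
```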
You’ve got a solid talent for math research.
Thank you!
I hadn’t actually heard of Laplace’s approximation—definitely relevant! The catch about the dimension is a good one.
In the large causal model, is the issue just that
1. there is one multiplication per variable,
2. some dependence chains don’t have very many variables in them, and
3. in those few-variable chains, we might not get enough multiplications to converge?
If that is the issue, weird nasty operations occur to me, like breaking variables up into sub and sub-sub variables, to get more multiplications, which might get more Gaussian. (For example, splitting the node “Maxwell finishes writing this comment” into “his computer doesn’t run out of battery” and “the police don’t suddenly bust into his apartment”). Whether or not it’s worth doing, I wonder—would this actually work to make things more Gaussian? Or is there some… conservation of convergence… that makes it so you can’t get closer to Gaussian by splitting variables up? [Don’t feel like you have to answer these—they’re more just me following up on thoughts I got from your comment].
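To make the "not enough multiplications" worry concrete, here is a rough numerical sketch (purely illustrative; the exponential stand-in for each log-factor and the particular chain lengths are my own assumptions, and it does not settle the variable-splitting question): treat each variable in a chain as contributing one skewed log-factor, so the log of the product over the chain is a sum of that many independent terms, and check how far that sum is from its best-fit Gaussian as the chain gets longer.

```python
# Rough sketch (illustrative only): each variable in a chain contributes one
# log-factor, so the log of the product over the chain is a sum of k terms.
# Short chains stay visibly non-Gaussian; longer chains get much closer.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def distance_from_gaussian(samples):
    """Kolmogorov-Smirnov distance to the best-fit normal (a crude measure)."""
    return stats.kstest(samples, "norm",
                        args=(samples.mean(), samples.std())).statistic

for k in (2, 5, 50):
    # k skewed log-factors per chain; exponential is just a stand-in choice
    chain_sums = rng.exponential(size=(100_000, k)).sum(axis=1)
    print(f"chain length {k:>2}: KS distance from Gaussian ~ "
          f"{distance_from_gaussian(chain_sums):.3f}")
```

The Kolmogorov-Smirnov statistic against a fitted normal is a crude closeness measure, but it is enough to show the short chains lagging well behind the long one.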
I accept your affordance, and thank you; this will make me more likely to comment on your posts in the future.