Gelman wants to throw everything he can into his models—and then use multilevel (a.k.a. hierarchical) models to share information between exchangeable (or conditionally exchangeable) batches of parameters. The key concept: multilevel structure turns the “effective number of parameters” into a quantity that is itself inferred from the data. So he can afford to take his “against parsimony” stance (which is really a stance against leaving potentially useful predictors out of his models) because his default model choice will induce parsimony just when the data warrant it.
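To make that concrete, here is a standard illustration (mine, not from the linked posts, though consistent with them): in the normal-normal hierarchical model with group effects $\theta_j \sim N(\mu, \tau^2)$ and observed group means $\bar{y}_j \sim N(\theta_j, \sigma_j^2)$, the conditional posterior mean of each group effect is a precision-weighted compromise between the group's own data and the population mean:

$$
\hat{\theta}_j = \frac{\dfrac{1}{\sigma_j^2}\,\bar{y}_j + \dfrac{1}{\tau^2}\,\mu}{\dfrac{1}{\sigma_j^2} + \dfrac{1}{\tau^2}}.
$$

When the estimated $\tau$ is near zero, the group effects pool almost completely (effective number of parameters near one); when $\tau$ is large, they are left nearly unpooled (effective number near $J$, the number of groups). Because $\tau$ is itself fit from the data, the model slides between those extremes exactly as the data warrant.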
I think one of Gelman’s comments in the first link is helpful:
In principle, models (at least for social-science phenomena) should be ever-expanding flowers that have within them the capacity to handle small data sets (in which case, inferences will be pulled toward prior knowledge) or large data sets (in which case, the model will automatically unfold to allow the data to reveal more about the phenomenon under study). A single model will have zillions of parameters, most of which will barely be “activated” if sample size is not large.
In practice, those of us who rely on regression-type models and estimation procedures can easily lose control of large models when fit to small datasets. So, in practice, we start with simple models that we understand, and then we complexify them as needed. This has sometimes been formalized as a “sieve” of models and is also related to Cantor’s “diagonal” argument from set theory. (In this context, I’m saying that for any finite class of models, there will be a dataset for which these models don’t fit, thus requiring model expansion.)
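For readers who want to see those “barely activated” parameters behave, below is a minimal simulation sketch of the normal-normal setup above; this is my illustration, not code from Gelman, and the `partial_pool` helper and all the numbers are made up for the demo. With tiny groups the estimates collapse toward the population mean; with large groups they track the raw group means almost exactly.

```python
# A minimal sketch (not Gelman's code) of partial pooling in the
# normal-normal model: group estimates are shrunk toward the grand
# mean, more strongly when each group has little data.
import numpy as np

rng = np.random.default_rng(0)

def partial_pool(y_bar, n, sigma, mu, tau):
    """Posterior mean of each group effect given known mu, tau, sigma.

    Precision-weighted average of the group sample mean and the
    population mean: small n pulls estimates toward mu; large n
    leaves them near the raw group means.
    """
    w_data = n / sigma**2          # precision of each group's sample mean
    w_prior = 1.0 / tau**2         # precision contributed by the population prior
    return (w_data * y_bar + w_prior * mu) / (w_data + w_prior)

true_effects = rng.normal(0.0, 1.0, size=8)   # 8 groups, between-group sd tau = 1
sigma = 2.0                                    # within-group sd

for n in (2, 200):                             # small vs. large samples per group
    # Simulate each group's sample mean directly: sd of a mean is sigma/sqrt(n).
    y_bar = np.array([rng.normal(mu_j, sigma / np.sqrt(n))
                      for mu_j in true_effects])
    est = partial_pool(y_bar, n, sigma, mu=0.0, tau=1.0)
    shrink = 1 - np.std(est) / np.std(y_bar)   # how much the spread contracted
    print(f"n={n:3d}: spread of raw means {np.std(y_bar):.2f}, "
          f"pooled {np.std(est):.2f} (shrunk {shrink:.0%})")
```

With n=2 per group, the spread of the estimates contracts by about two thirds (few effective parameters); with n=200, it barely contracts at all (nearly one parameter per group). That is the data-driven parsimony the post describes.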