This seems broadly right to me, but it seems to me that metaheuristics (in the numerical optimization sense) are practical and have a structure like the one you’re describing. Neural architecture search is the name people use for this sort of thing in contemporary ML.
What’s different between them and the sort of thing you describe? Well, for one, the softening is even stronger: rather than a performance-weighted average across all strategies, it’s a performance-weighted sampling strategy that has access to all strategies (but will only actually evaluate a small subset of them). But it seems like the core strategy, doing both object-level cognition and meta-level cognition about how you’re doing object-level cognition, is basically the same.
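To make the averaging-versus-sampling contrast concrete, here is a minimal Python sketch. The toy setup is my own assumption rather than anything from the comment: each “strategy” is just a function, each has a known performance score, and softmax weighting (with a hypothetical `temperature` parameter) turns scores into weights. The idealized version runs every strategy and blends the results; the metaheuristic-style version can draw any strategy but only runs `k` of them, sampled in proportion to weight.

```python
import math
import random

def softmax(scores, temperature=1.0):
    """Turn performance scores into normalized weights (higher score, more weight)."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def full_weighted_average(strategies, scores, x, temperature=1.0):
    """Idealized softening: run EVERY strategy on x and blend outputs by weight.
    Returns (blended output, number of strategy evaluations)."""
    weights = softmax(scores, temperature)
    outputs = [s(x) for s in strategies]
    return sum(w * o for w, o in zip(weights, outputs)), len(strategies)

def weighted_sampling(strategies, scores, x, k, temperature=1.0, seed=0):
    """Metaheuristic-style softening: any strategy CAN be drawn, but only k are
    drawn (in proportion to weight) and actually run. Returns (mean output, k)."""
    rng = random.Random(seed)
    weights = softmax(scores, temperature)
    chosen = rng.choices(range(len(strategies)), weights=weights, k=k)
    outputs = [strategies[i](x) for i in chosen]
    return sum(outputs) / k, k

# Toy example (hypothetical): 100 strategies "multiply by c"; c = 3 scores best.
strategies = [lambda x, c=c: c * x for c in range(1, 101)]
scores = [-abs(c - 3) for c in range(1, 101)]
avg, cost_full = full_weighted_average(strategies, scores, 2.0)
est, cost_sampled = weighted_sampling(strategies, scores, 2.0, k=5)
print(cost_full, cost_sampled)  # 100 5
```

The evaluation count is the point of the comparison: the sampled version’s cost is fixed by `k`, not by how many strategies it has access to.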
It remains unclear to me whether the right way to find these meta-strategies is something like “start at the impractical ideal and rescue what you can” or “start with something that works and build new features”; it seems like modern computational Bayesian methods look more like the former than the latter. When I think about how to describe human epistemology, computationally bounded Bayes seems like a promising approach: probabilities change both through the standard updates among hypotheses that already exist and through new, yet-to-be-formalized operations that add or remove hypotheses. You want to be able to capture “Why didn’t you assign high probability to X?” “Because I didn’t think of it; now that I have, I do.” But of course I’m using judgment that already works to consider adding new features here, rather than having built my way of thinking by rescuing what I can from the impractical ideal of how to think.
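One way to picture computationally bounded Bayes is the toy Python class below. It is only a sketch under assumptions I’ve chosen for illustration: hypotheses are coin-bias models, `BoundedBayes` and its methods are hypothetical names, and a newly added hypothesis is scored against the full evidence history as if it had been present from the start (one of several possible choices for the not-yet-formalized “add a hypothesis” operation).

```python
import math

class BoundedBayes:
    """Toy bounded reasoner (hypothetical): ordinary Bayes over the hypotheses it
    currently holds, plus explicit operations to add or drop hypotheses."""

    def __init__(self):
        self.log_prior = {}  # hypothesis name -> log of (unnormalized) prior weight
        self.models = {}     # hypothesis name -> likelihood function P(obs | h)
        self.history = []    # every observation seen so far

    def add_hypothesis(self, name, likelihood, prior_weight):
        # Assumption: a newly thought-of hypothesis is scored against the evidence
        # already seen, as if it had been in the hypothesis space all along.
        self.log_prior[name] = math.log(prior_weight)
        self.models[name] = likelihood

    def remove_hypothesis(self, name):
        # Dropping a hypothesis renormalizes the rest on the next posterior() call.
        del self.log_prior[name]
        del self.models[name]

    def observe(self, obs):
        # Store the observation; the Bayes update over existing hypotheses is
        # computed lazily in posterior().
        self.history.append(obs)

    def posterior(self):
        log_post = {
            h: lp + sum(math.log(self.models[h](o)) for o in self.history)
            for h, lp in self.log_prior.items()
        }
        m = max(log_post.values())
        z = sum(math.exp(v - m) for v in log_post.values())
        return {h: math.exp(v - m) / z for h, v in log_post.items()}

def coin(p):
    """Likelihood for a coin that lands heads with probability p."""
    return lambda o: p if o == "H" else 1 - p

agent = BoundedBayes()
agent.add_hypothesis("fair", coin(0.5), prior_weight=1.0)
for o in "HHHHHHHHTH":
    agent.observe(o)
before = agent.posterior()
print(before)  # {'fair': 1.0}
agent.add_hypothesis("biased_0.9", coin(0.9), prior_weight=1.0)
after = agent.posterior()
print(after)   # the newly thought-of biased hypothesis now dominates
```

The first posterior is trivially 1.0 because “fair” is the only hypothesis held; the biased hypothesis gains probability only once it is thought of, not because any new evidence arrived.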
But it seems like the core strategy, doing both object-level cognition and meta-level cognition about how you’re doing object-level cognition, is basically the same.
It remains unclear to me whether the right way to find these meta-strategies is something like “start at the impractical ideal and rescue what you can” or “start with something that works and build new features”; it seems like modern computational Bayesian methods look more like the former than the latter.
I’d argue that there’s usually a causal arrow from practical lore to impractical ideals first, even if the ideals also influence practice at a later stage. Occam’s Razor came before Solomonoff; “change your mind when you see surprising new evidence” came before formal Bayes. The “core strategy” you refer to sounds like “do both exploration and exploitation,” which is the sort of idea I’d imagine goes back millennia (albeit not in those exact terms).
One of my goals in writing this post was to formalize a feeling I get when I think about an idealized theory of this kind: that it’s a “redundant step” added on top of something that already does all the work by itself, like taking a decision theory and appending the rule “take the actions this theory says to take.” But rather than being transparently vacuous like that example, such theories are vacuous in a more hidden way, and the redundant steps they add tend to resemble legitimately good ideas familiar from practical experience.
Consider the following (ridiculous) theory of rationality: “do the most rational thing, and also, remember to stay hydrated :)”. In a certain inane sense, most rational behavior “conforms to” this theory, since the theory parasitizes on whatever existing notion of rationality you had, and staying hydrated is generally a good idea and thus does not tend to conflict with rationality. And whenever staying hydrated is a good idea, one could imagine pointing to this theory and saying “see, there’s the hydration theory of rationality at work again.” But, of course, none of this should actually count in the “hydration theory’s” favor: all the real work is hidden in the first step (“do the most rational thing”), and insofar as hydration is rational, there’s no need to specify it explicitly. This doesn’t quite map onto the R/S schema, but it captures the way I think these theories tend to confuse people.
If the more serious ideals we’re talking about are like the “hydration theory,” we’d expect them to have the appearance of explaining existing practical methods, and of retrospectively explaining the success of new methods, while not being very useful for generating any new methods. And this seems generally true to me: there’s a lot of ensemble-like or regularization-like stuff in ML that can be interpreted as Bayesian averaging/updating over some base space of models, but most of the excitement in ML is in these base spaces. We didn’t get neural networks from Bayesian first principles.
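A small Python sketch of the pattern described here, under toy assumptions of my own (uniform prior over models, Gaussian likelihood with a known noise level, models of the form y = w * x, and a hypothetical `bma_predict` helper): the Bayesian-averaging layer is identical in both calls, and the only thing that changes the answer is which base models are on offer.

```python
import math

def bma_predict(base_models, data, x_new, noise_sd=1.0):
    """Bayesian model averaging over a fixed, finite base space of models.
    Assumes a uniform prior and Gaussian likelihood with known noise_sd."""
    log_liks = [
        sum(-0.5 * ((y - f(x)) / noise_sd) ** 2 for x, y in data)
        for f in base_models
    ]
    m = max(log_liks)
    weights = [math.exp(ll - m) for ll in log_liks]  # posterior up to a constant
    z = sum(weights)
    return sum((w / z) * f(x_new) for w, f in zip(weights, base_models))

data = [(x, 2.0 * x) for x in range(1, 6)]  # toy data from y = 2x, no noise
poor_space = [lambda x, w=w: w * x for w in (0.5, 1.0, 1.5)]
good_space = [lambda x, w=w: w * x for w in (1.9, 2.0, 2.1)]
poor_pred = bma_predict(poor_space, data, 10.0)
good_pred = bma_predict(good_space, data, 10.0)
print(poor_pred)  # close to 15: stuck near the best model in a poor space
print(good_pred)  # close to 20
```

Averaging over the poor space can do no better than its best member, which is one way of seeing why the interesting progress happens in the base space rather than in the averaging step.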