I’m using SA closer to its original meaning, as a metaphor for a process in chemistry, rather than in its precise mathematical sense as an algorithm. We start with lots of random trial and error, then dial it down until we just exploit. I do think that this, not just “some explore and some exploit,” is how black-box optimization works in practice. It sounds like we agree that decomposition + explore/exploit is a major component of rationality.
I believe that the steering/search algorithm is also developed and applied via black-box optimization. It sounds like you disagree.
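To be concrete about the “dial it down” picture, here’s a minimal sketch of what I have in mind (taking the annealing metaphor loosely, as above; `score`, `neighbor`, and `x0` are placeholders a caller would supply, not anything from this thread): early on the temperature is high and almost any move is accepted, and as it decays we end up accepting only improvements.

```python
import math
import random

def anneal(score, neighbor, x0, steps=10_000, t0=1.0, t_min=1e-3):
    """Lots of random trial and error at first, dialed down until we just exploit.
    A sketch of the metaphor, not a tuned algorithm."""
    x, best = x0, x0
    for i in range(steps):
        t = t0 * (t_min / t0) ** (i / steps)   # temperature decays toward zero
        candidate = neighbor(x)
        delta = score(candidate) - score(x)
        # High temperature: accept almost anything (explore).
        # Low temperature: accept (almost) only improvements (exploit).
        if delta > 0 or random.random() < math.exp(delta / t):
            x = candidate
        if score(x) > score(best):
            best = x
    return best
```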
Here’s an example of how I think explore/exploit can fully account for success on a high-dimensional optimization problem.
Jack is choosing a hobby (explore). He loves fresh bread and decides to learn to bake it. He tries for a while (exploit), but eventually decides it’s not very fun, so he picks a different hobby (explore). This time, he settles on Indian food, also because it’s delicious. This hobby sticks, and he learns dish after dish, sometimes choosing his favorite dishes, other times picking randomly out of a cookbook (exploit).
The question is whether Jack is doing something fundamentally different from exploring and exploiting when he chooses how to pick which bread to bake or which Indian food to cook. To pick dishes, Jack might take suggestions from his friend Ananya, an experienced Indian cook; cook the dishes he’s enjoyed at restaurants before; or pick randomly out of a cookbook.
But where did these decision procedures come from?
Well, we’ve been trying each of them for millennia. Each depends on exploitation (of friends, restaurants, cookbooks), which in turn is the result of exploration (going to parties, searching on Yelp, looking up reviews), and so on ad infinitum.
I don’t think that any other mysterious search algorithm is needed to explain how Jack optimizes his Indian food cooking hobby. At some level where there is no decision procedure to provide guidance (how does Ananya choose from among the dishes she thinks would be best to suggest to Jack?), we use some arbitrary selection method, such as random choice or a pointless rule like “choose the first thing that comes to mind.”
Can you describe where in this story (or the gaps within it) explore/exploit couldn’t suffice? Alternatively, is this not a high-dimensional optimization problem of the kind you had in mind?
I had a long conversation with my girlfriend at one point about picking hobbies. She used to just try things at random, and I was encouraging her to be more strategic. We started by making a list of things she liked/didn’t like about previous hobbies she’d tried. Then, we turned those into criteria for potential hobbies—things like “should involve exercise”, “should involve producing something”, “should be good to show off to friends”, or “should grow in skill over time”. Then we considered broad classes of activities which addressed specific criteria, like “sports” or “art”. We tried to be systematic about generating the activity-classes—e.g. starting from “what possible kinds of things can we produce?” rather than just generating random ideas and checking them against the constraints.
This sort of reasoning doesn’t really look like uniformly-random exploration followed by exploitation. In particular, the pattern I want to highlight in this case is:
identify constraints (i.e. the criteria)
search by backtracking from the constraints, rather than trying things uniformly-randomly
That’s not incompatible with exploration+exploitation (black-box search can be used in sub-steps), but it is an additional component which is not implied by exploitation/exploration alone. It avoids uniformly-random exploration, and instead steers toward more-likely-useful possibilities. The constraints tell us which-way-to-go in our high-dimensional search space.
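To make the contrast concrete, here’s a toy sketch (the attributes and criteria are made up, loosely echoing the hobby example, and this isn’t anyone’s actual procedure): the uniformly-random searcher samples whole candidates and tests them afterwards, while the backtracking searcher extends partial candidates and prunes a branch as soon as some criterion has already failed, so the constraints steer which parts of the space get visited at all.

```python
import random

# Toy candidate space: every combination of a few attributes.
ATTRIBUTES = {
    "physical": [True, False],
    "produces": ["food", "object", "nothing"],
    "skill_curve": ["shallow", "deep"],
}

# Each criterion lists the attributes it mentions, plus a predicate over them.
CRITERIA = [
    (("physical", "produces"), lambda h: h["physical"] or h["produces"] != "nothing"),
    (("skill_curve",), lambda h: h["skill_curve"] == "deep"),
]

def random_search(n_tries=1000):
    """Uniformly-random explore: sample complete candidates, test afterwards."""
    for _ in range(n_tries):
        h = {k: random.choice(vs) for k, vs in ATTRIBUTES.items()}
        if all(pred(h) for _, pred in CRITERIA):
            return h
    return None

def backtracking_search(partial=None, remaining=None):
    """Constraint-guided search: extend a partial candidate, pruning any branch
    where some fully-assigned criterion already fails."""
    partial = dict(partial or {})
    remaining = list(ATTRIBUTES) if remaining is None else remaining
    for keys, pred in CRITERIA:
        if all(k in partial for k in keys) and not pred(partial):
            return None  # this branch can never satisfy the constraints
    if not remaining:
        return partial
    key, rest = remaining[0], remaining[1:]
    for value in ATTRIBUTES[key]:
        result = backtracking_search({**partial, key: value}, rest)
        if result is not None:
            return result
    return None
```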
This part in particular is interesting:
But where did these decision procedures come from?
Well, we’ve been trying each of them for millennia. Each depends on exploitation (of friends, restaurants, cookbooks), which in turn is the result of exploration (going to parties, searching on Yelp, looking up reviews), and so on ad infinitum.
One of the hobbies my girlfriend eventually picked up was baking, and she recently wrote a post about how baking is not about following opaque decision procedures handed down through the millennia. That post has some great examples of gearsy reasoning; I recommend checking it out.
Her baking post was in the back of my mind when I typed my last comment! I thought it was great. I had the same reaction to her post as I had to yours.
We can and often do use constraints to guide our optimization process. Here are some examples:
When choosing a hobby, we can brainstorm what we want to get out of our hobby before we consider concrete activities that fit the bill.
When solving a math problem, we can invent some rules for how to format our work so that it’ll be legible later.
Yelp implicitly considers factors like price, location, quality, and type of food before giving us options.
When building a bridge, we’re going to use standardized materials, tools, and techniques, not invent our own.
These constraints might be selected through intuition, or through the design of meta-constraints that help us select which object-level constraints to use.
However, the idea behind each constraint had to be invented in the first place, and in any concrete decision, we have to improvise its application.
So how does a constraint get invented in the first place? How do we decide which ones to use for a given decision? And how do we decide how to apply them? Probably with more constraints, but then the same questions arise about those. At some point, we are just picking constraints at random from those that occur to us, choosing randomly from among options that fit them, and seeing how we feel about the result.
We associate good feelings with individual constraints, or combinations of constraints, that have worked out well in the past or that we’ve been taught to use, so they’re more likely to occur to us in the future.
So the process by which we decompose a problem; invent, combine, and apply constraints; or decide how to evaluate a decision is itself a random process. Plan it all out, and you move the randomness one meta-level higher. That’s not to say it’s pointless. Planning and meta-level reasoning are often powerful ways to make decisions. It’s just not fundamentally different from object-level explore/exploit, and it runs into similar problems and ambiguities, just on a more abstract level.
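Here’s a minimal sketch of what I mean by moving the randomness one level up (the dishes, ratings, and procedures are all made up for illustration): the object level picks a dish by epsilon-greedy explore/exploit, and the meta level picks which decision procedure to use by exactly the same kind of epsilon-greedy explore/exploit over procedures.

```python
import random

def epsilon_greedy(options, value, epsilon=0.2):
    """Explore with probability epsilon; otherwise exploit the best-known option."""
    if random.random() < epsilon:
        return random.choice(options)      # explore
    return max(options, key=value)         # exploit

# Object level: pick a dish (toy stand-ins for "ask Ananya", "cookbook", etc.).
dish_ratings = {"dal": 0.8, "saag": 0.6, "biryani": 0.9}
procedures = {
    "favorites": lambda: epsilon_greedy(list(dish_ratings), dish_ratings.get),
    "random_page": lambda: random.choice(list(dish_ratings)),
}

# Meta level: pick which procedure to use with the very same rule, scored by
# how well dishes chosen that way have turned out so far.
procedure_ratings = {"favorites": 0.7, "random_page": 0.5}
procedure = epsilon_greedy(list(procedures), procedure_ratings.get)
dish = procedures[procedure]()
```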
I think your description here is basically right, except for the very last sentence. If we just had one meta-level, then yeah, it would end up basically the same as explore/exploit but at a more abstract level. But by recursively applying the trick, it ends up working well on a much wider class of problems than explore/exploit by itself.
The really neat thing is that recursively applying this particular trick does not require moving up an infinite ladder of meta-levels. There are only two levels, and moving “up” alternates between them.
To see why, consider what it looks like on a math problem where we’re trying to prove some theorem:
A “constraint” would be something like a conditional counterexample—i.e. “for any proof which uses strategy X, here’s a counterexample”
When searching for such counterexamples, our “constraints” are partial proofs—i.e. if we have a proof assuming X, then any counterexample must leverage/assume the falsehood of X.
So when we use constraints to guide our search for proofs, we look for counterexamples. When we use constraints to guide our search for counterexamples, we look for proofs.
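In code, the alternation looks roughly like the loop below (a CEGIS-style sketch; the propose/refute split and the toy usage are my own illustration, not a claim about how mathematicians actually work): the “proof” search is constrained by every counterexample found so far, the “counterexample” search is constrained by the current candidate, and moving “up” just flips which one we’re doing.

```python
def alternate(propose, refute, max_rounds=1000):
    """Alternate between the two levels: candidates constrained by known
    counterexamples, and counterexamples constrained by the current candidate."""
    counterexamples = []
    for _ in range(max_rounds):
        candidate = propose(counterexamples)   # "proof" search, guided by counterexamples
        if candidate is None:
            return None, counterexamples       # nothing survives the constraints
        cex = refute(candidate)                # "counterexample" search, guided by the candidate
        if cex is None:
            return candidate, counterexamples  # candidate survives every attack
        counterexamples.append(cex)
    return None, counterexamples

# Toy usage: find a threshold t such that every x in 0..100 with x >= t has x*x >= 200.
def propose(cexs):
    return max(cexs, default=-1) + 1           # smallest t consistent with all counterexamples

def refute(t):
    return next((x for x in range(t, 101) if x * x < 200), None)

print(alternate(propose, refute)[0])           # finds t = 15
```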
More visually, consider solving a maze. There we can find constraints like a connected chain of walls cutting across the maze, which no solution path can cross.
When we’re searching for that constraint, what are the meta-constraints to that search? Well, they’re paths. A constraint is a path of connected walls; a meta-constraint is a path of connected spaces.
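Here’s a small sketch of the maze version (the grid and the helper names are mine, just to make the picture concrete): one search looks for a path of connected open cells from the left edge to the right edge, and the dual search looks for a chain of connected walls from the top edge to the bottom edge, which is exactly the kind of constraint that would rule out any left-to-right solution.

```python
from collections import deque

# Tiny toy maze: '#' is wall, '.' is open.
GRID = [
    "..#..",
    "..#..",
    "..#..",
    ".....",
    "..#..",
]
ROWS, COLS = len(GRID), len(GRID[0])

def in_bounds(c):
    r, k = c
    return 0 <= r < ROWS and 0 <= k < COLS

def neighbors4(c):  # open paths move through edge-adjacent spaces
    r, k = c
    return [p for p in [(r + 1, k), (r - 1, k), (r, k + 1), (r, k - 1)] if in_bounds(p)]

def neighbors8(c):  # wall chains may also connect diagonally
    r, k = c
    return [(r + dr, k + dk) for dr in (-1, 0, 1) for dk in (-1, 0, 1)
            if (dr, dk) != (0, 0) and in_bounds((r + dr, k + dk))]

def connected(sources, targets, passable, neighbors):
    """Is some target reachable from some source, moving only through passable cells?"""
    queue = deque(c for c in sources if passable(c))
    seen = set(queue)
    while queue:
        cell = queue.popleft()
        if cell in targets:
            return True
        for nxt in neighbors(cell):
            if nxt not in seen and passable(nxt):
                seen.add(nxt)
                queue.append(nxt)
    return False

# Meta-constraint search: a path of connected open cells, left edge to right edge.
open_path = connected({(r, 0) for r in range(ROWS)},
                      {(r, COLS - 1) for r in range(ROWS)},
                      lambda c: GRID[c[0]][c[1]] == ".", neighbors4)

# Constraint search: a chain of connected walls, top edge to bottom edge,
# which would block every left-to-right open path.
wall_chain = connected({(0, k) for k in range(COLS)},
                       {(ROWS - 1, k) for k in range(COLS)},
                       lambda c: GRID[c[0]][c[1]] == "#", neighbors8)

print(open_path, wall_chain)  # with this grid: True False (row 3 is fully open)
```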
In general, we’re switching between solving an optimization problem and solving the dual problem.
Of course, we may also use some random search at any given step before switching.
Mm, I think “the higher process X is produced by SA” is different from “the higher process X is implementing SA”.