I’m sure you understand SA better than I do. I won’t argue the jargon.
And yet the tone of your post, plus the examples you use, make it sound like you’re saying that SA would not be a good choice for designing something like a bridge. The example of SA’s usefulness in designing computer chips seems to contradict that example.
If the intuition you’re trying to defend is “for complex problems, you need to precisely model and plan your design to achieve a workable solution,” then I think SA is a strong counterargument.
If instead, you’re arguing that “for complex problems, only a precise internal model of the problem will achieve an optimal solution,” then I’d say you’re right, but that arriving at that internal model is what an SA-like process is for. As it is written, defining the problem is most of the work. I think that’s what’s involved when “we break up the high-dimensional system into a bunch of low-level components, and reason about their interactions with each other” or make an “expensive investment in understanding the internal gears of the system.”
If I’m completely missing the point of your post, could you give a one-sentence tl;dr of the primary claim you’re making?
Ah, I see what you’re saying. This definitely gets at some interesting points.
First point is efficiency. As you mentioned in the top-level comment, SA “seems to work well if you have the ability to create and evaluate an enormous number of designs, at extremely low cost relative to the value of success, and high accuracy”. So SA would be a bad way for a human (or group of humans) to design a bridge, but if you have a cheap and accurate bridge-simulator, then it’s not unreasonable. There are probably much more efficient algorithms which leverage the problem structure more, but that requires writing complicated code.
Note that there’s nothing particularly special about SA in this regard—most black-box optimization methods work in largely the same cases.
(Ultimately, one of the main areas I want to apply the ideas in the OP to is better methods for human thinking—i.e. things humans can do to learn/problem-solve more quickly/successfully. So that’s largely the use-case I have in mind. Obviously computational optimization is a whole different game with different constraints, and we do use black-box optimizers in many computational problems with considerable success.)
Second point is problem specification. Once we move from talking about humans designing things to computers designing things (via black-box optimization), problem specification becomes the big bottleneck. We need both (a) a specification of the environment/problem-space, e.g. a physics simulator for bridge-design, and (b) specification of the objective, e.g. a function to compute whether a bridge has “failed” in a simulation and how much the bridge “costs”. The challenge is that both of these pieces need to work correctly over the entire (exponentially large) space of parameters which the optimizer could test. Obviously we can’t test correctness of the simulator and objective by checking it on every possible input, so we need some other way to check that it’s correct—which brings us back to gears.
In (certain parts of) chip design, this problem has turned out to be tractable. In that case, the “objective” is specified by humans using gearsy reasoning to design a chip, then black-boxing that chip design and requiring the optimizer to produce something functionally identical to the design (but more efficient by various metrics).
(Here again, gears matter mainly for the part of the thinking-work which the humans need to do.)
As to a one-sentence summary of my main claim: in complex high-dimensional problems, you (a human) need to break the system up into sparsely-coupled low-dimensional subsystems, and leverage that representation, in order to achieve a workable solution.
What I’m pushing on is the nature of “gearsy reasoning.” I think that it operates via SA.
Sometimes SA produces solutions to that provably calculate the global optimum. For example, mathematicians exploring geometry have been able to prove that if you’re trying to build a tank with the maximum volume:surface area ration, make it in the shape of a sphere.
Other times, it just works well enough. We use SA to invent a bunch of rivet and screw designs, concrete mixtures, beam angles. We testing the component solutions on other problems, try them out on model bridges, real bridges, computer simulations. Expert bridge builders spend their time exploring and exploiting heuristics for the details of the problems they routinely face. Expert bridge builder hirers explore and exploit options for who to hire. Somehow, bridges get built and rarely collapse.
So it’s SA all the way down. All reasoning boils down to SA, and it works as well as it possibly can.
It sounds like you might be arguing that there is something fundamentally different going on when we employ “gearsy reasoning.” If so, that is my point of disagreement. The account I’ve just given seems to me like an intuitive and accurate description of how I’ve solved every problem I’ve ever worked on, and how it appears to me that other people work as well.
Even in a case like dating, where the ability to explore is quite limited, I just wind up using a crappy version of SA. I find myself attracted to someone; we date and try to deepen/exploit our relationship; we either break up and try again with someone else, or we keep dating/exploiting indefinitely. It works well enough.
It sounds like you’re using “simulated annealing” as synonymous with something like “mix exploration and exploitation”? I’m not entirely sure what you’re using it to point to, but it definitely sounds like you’re using it as a metaphor for some particular aspect of search algorithms, rather than talking about the actual math of simulated annealing. What are you trying to emphasize?
Anyway, “mix exploration and exploitation” is not a search algorithm by itself. It’s a general trick which can be used in conjunction with a wide variety of search algorithms. This also applies to simulated annealing specifically: it’s a technique used in conjunction with other search algorithms. (E.g. scipy’s implementation takes an argument for which optimizer to use in the minimizer_kwargs.)
It sounds like you might be arguing that there is something fundamentally different going on when we employ “gearsy reasoning.” If so, that is my point of disagreement.
I am definitely arguing that gearsy reasoning does something fundamentally different. It does use black-box optimizers, but it uses them locally, operating on individual gears. Black-box optimization works badly on high-dimensional systems, so we decompose such systems into coupled low-dimensional systems, and then use black-box optimizers on the low-dimensional components. The interesting part of gearsy reasoning is how to account for the coupling, which is where things like causality and propagation come in.
The account I’ve just given seems to me like an intuitive and accurate description of how I’ve solved every problem I’ve ever worked on, and how it appears to me that other people work as well.
The account you’ve given may well be accurate, but it’s an inherently incomplete account. Simulated annealing—or whatever you’re trying to point to with that phrase—is not itself a search algorithm. Even if all problem solving involves some mix of exploration and exploitation, that still doesn’t tell us which direction to go in a thousand-dimensional space when “exploiting”. Likewise, exploration clearly isn’t completely random—so how do we decide which directions to explore? That’s what gears are for.
I’m using SA closer to its original meaning as a metaphor for a process in chemistry rather than its precise mathematics as an algorithm. We start with lots of random trial and error, then dial it down until we just exploit. I do think that this, not just “some explore and some exploit” is how black-box optimization works in practice. It sounds like we agree that decomposition + explore/exploit is a major component of rationality.
I believe that the steering/search algorithm is also developed and applied via black box optimization. It sounds like you disagree.
Here’s an example of my concept of how explore/exploit can fully account for the success of a high-dimensional optimization problem.
Jack is choosing a hobby (explore). He loves fresh bread and decides to learn to bake it. He tries for a while (exploit), but eventually decides it’s not very fun, so he picks a different hobby (explore). This time, he settles on Indian food, also because it’s delicious. This hobby sticks, and he learns dish after dish, sometimes choosing his favorite dishes, other times picking randomly out of a cookbook (exploit).
The question is whether Jack is doing something fundamentally different from exploring and exploiting when he chooses how to pick which bread to bake or which Indian food to cook? To pick dishes, Jack might take suggestions from his friend Ananya, an experienced Indian cook; cook the dishes he’s enjoyed at restaurants before; or pick randomly out of a cookbook.
But where did these decision procedures come from?
Well, we’ve been trying each of them for millennia. Each depend on exploitation (of friends, restaurants, cookbooks), which in turn are the result of exploration (going to parties, searching on Yelp, looking up reviews), and so on ad infinitum.
I don’t think that any other mysterious search algorithm is needed to explain how Jack optimizes his Indian food cooking hobby. At some level where there is no decision procedure to provide guidance (how does Ananya choose from among the dishes she thinks would be best to suggest to Jack?), we use some arbitrary selection method, such as random choice or a pointless rule like “choose the first thing that comes to mind.”
Can you describe where in this story (or the gaps within it) explore/exploit couldn’t suffice? Alternatively, is this not a high-dimensional optimization problem of the kind you had in mind?
I had a long conversation with my girlfriend at one point about picking hobbies. She used to just try things at random, and I was encouraging her to be more strategic. We started by making a list of things she liked/didn’t like about previous hobbies she’d tried. Then, we turned those into criteria for potential hobbies—things like “should involve exercise”, “should involve producing something”, “should be good to show off to friends”, or “should grow in skill over time”. Then we considered broad classes of activities which addressed specific criteria, like “sports” or “art”. We tried to be systematic about generating the activity-classes—e.g. starting from “what possible kinds of things can we produce?” rather than just generating random ideas and checking them against the constraints.
This sort of reasoning doesn’t really look like uniformly-random exploration followed by exploitation. In particular, the pattern I want to highlight in this case is:
identify constraints (i.e. the criteria)
search by backtracking from the constraints, rather than trying things uniformly-randomly
That’s not incompatible with exploration+exploitation (blackbox search can be used in sub-steps), but it is an additional component which is not implied by exploitation/exploration alone. It avoids uniformly-random exploration, and instead steers toward more-likely-useful possibilities. The constraints tell us which-way-to-go in our high-dimensional search space.
This part in particular is interesting:
But where did these decision procedures come from?
Well, we’ve been trying each of them for millennia. Each depend on exploitation (of friends, restaurants, cookbooks), which in turn are the result of exploration (going to parties, searching on Yelp, looking up reviews), and so on ad infinitum.
One of the hobbies my girlfriend eventually picked up was baking, and she recently wrote a post about how baking is not about following opaque decision procedures handed down through the millennia. That post has some great examples of gearsy reasoning; I recommend checking it out.
Her baking post was in the back of my mind when I typed my last comment! I thought it was great. I had the same reaction to her post as I had to yours.
We can and often do use constraints to guide our optimization process. Here are some examples:
When choosing a hobby, we can brainstorm what we want to get out of our hobby before we consider concrete activities that fit the bill.
When solving on a math problem, we can invent some rules for how to format our work so that it’ll be legible later.
Yelp implicitly considers factors like price, location, quality, and type of food before giving us options.
When building a bridge, we’re going to use standardized materials, tools, and techniques, not invent our own.
These constraints might be selected through intuition, or through the design of meta-constraints that help us select which object-level constraints to use.
However, the idea behind each constraint had to be invented in the first place, and in any concrete decision, we have to improvise its application.
So how does a constraint get invented in the first place? How do we decide which ones to use for a given decision? And how do we decide how to apply them? Probably with more constraints, but then the same questions arise about those. At some point, we are just picking constraints at random from those that occur to us, choosing randomly from among options that fit them, and seeing how we feel about the result.
We associate good feelings with individual or combinations of constraints that have worked out well in the past or that we’ve been taught to use, so they’re more likely to occur to us in the future.
So the process by which we decompose a problem; invent, combine, and apply constraints; or decide how to evaluate a decision; is itself a random process. Plan it all out, and you move the randomness one meta-level higher. That’s not to say it’s pointless. Planning and meta-level reasoning is often a powerful way to make decisions. It’s just not fundamentally different from object-level explore/exploit, and it runs into similar problems and ambiguities, just on a more abstract level.
I think your description here is basically right, except for the very last sentence. If we just had one meta-level, then yeah, it would end up basically the same as explore/exploit but at a more abstract level. But by recursively applying the trick, it ends up working well on a much wider class of problems than explore/exploit by itself.
The really neat thing is that recursively applying this particular trick does not require moving up an infinite ladder of meta-levels. There are only two levels, and moving “up” alternates between them.
To see why, consider what it looks like on a math problem where we’re trying to prove some theorem:
A “constraint” would be something like a conditional counterexample—i.e. “for any proof which uses strategy X, here’s a counterexample”
When searching for such counterexamples, our “constraints” are partial proofs—i.e. if we have a proof assuming X, then any counterexample must leverage/assume the falsehood of X.
So when we use constraints to guide our search for proofs, we look for counterexamples. When we use constraints to guide our search for counterexamples, we look for proofs.
More visually, consider solving a maze. We can find constraints like this:
When we’re searching for that constraint, what are the meta-constraints to that search? Well, they’re paths. A constraint is a path of connected walls; a meta-constraint is a path of connected spaces.
In general, we’re switching between solving an optimization problem and solving the dual problem.
Of course, we may also use some random search at any given step before switching.
I’m sure you understand SA better than I do. I won’t argue the jargon.
And yet the tone of your post, plus the examples you use, make it sound like you’re saying that SA would not be a good choice for designing something like a bridge. The example of SA’s usefulness in designing computer chips seems to contradict that example.
If the intuition you’re trying to defend is “for complex problems, you need to precisely model and plan your design to achieve a workable solution,” then I think SA is a strong counterargument.
If instead, you’re arguing that “for complex problems, only a precise internal model of the problem will achieve an optimal solution,” then I’d say you’re right, but that arriving at that internal model is what an SA-like process is for. As it is written, defining the problem is most of the work. I think that’s what’s involved when “we break up the high-dimensional system into a bunch of low-level components, and reason about their interactions with each other” or make an “expensive investment in understanding the internal gears of the system.”
If I’m completely missing the point of your post, could you give a one-sentence tl;dr of the primary claim you’re making?
Ah, I see what you’re saying. This definitely gets at some interesting points.
First point is efficiency. As you mentioned in the top-level comment, SA “seems to work well if you have the ability to create and evaluate an enormous number of designs, at extremely low cost relative to the value of success, and high accuracy”. So SA would be a bad way for a human (or group of humans) to design a bridge, but if you have a cheap and accurate bridge-simulator, then it’s not unreasonable. There are probably much more efficient algorithms which leverage the problem structure more, but that requires writing complicated code.
Note that there’s nothing particularly special about SA in this regard—most black-box optimization methods work in largely the same cases.
(Ultimately, one of the main areas I want to apply the ideas in the OP to is better methods for human thinking—i.e. things humans can do to learn/problem-solve more quickly/successfully. So that’s largely the use-case I have in mind. Obviously computational optimization is a whole different game with different constraints, and we do use black-box optimizers in many computational problems with considerable success.)
Second point is problem specification. Once we move from talking about humans designing things to computers designing things (via black-box optimization), problem specification becomes the big bottleneck. We need both (a) a specification of the environment/problem-space, e.g. a physics simulator for bridge-design, and (b) specification of the objective, e.g. a function to compute whether a bridge has “failed” in a simulation and how much the bridge “costs”. The challenge is that both of these pieces need to work correctly over the entire (exponentially large) space of parameters which the optimizer could test. Obviously we can’t test correctness of the simulator and objective by checking it on every possible input, so we need some other way to check that it’s correct—which brings us back to gears.
In (certain parts of) chip design, this problem has turned out to be tractable. In that case, the “objective” is specified by humans using gearsy reasoning to design a chip, then black-boxing that chip design and requiring the optimizer to produce something functionally identical to the design (but more efficient by various metrics).
(Here again, gears matter mainly for the part of the thinking-work which the humans need to do.)
As to a one-sentence summary of my main claim: in complex high-dimensional problems, you (a human) need to break the system up into sparsely-coupled low-dimensional subsystems, and leverage that representation, in order to achieve a workable solution.
What I’m pushing on is the nature of “gearsy reasoning.” I think that it operates via SA.
Sometimes SA produces solutions to that provably calculate the global optimum. For example, mathematicians exploring geometry have been able to prove that if you’re trying to build a tank with the maximum volume:surface area ration, make it in the shape of a sphere.
Other times, it just works well enough. We use SA to invent a bunch of rivet and screw designs, concrete mixtures, beam angles. We testing the component solutions on other problems, try them out on model bridges, real bridges, computer simulations. Expert bridge builders spend their time exploring and exploiting heuristics for the details of the problems they routinely face. Expert bridge builder hirers explore and exploit options for who to hire. Somehow, bridges get built and rarely collapse.
So it’s SA all the way down. All reasoning boils down to SA, and it works as well as it possibly can.
It sounds like you might be arguing that there is something fundamentally different going on when we employ “gearsy reasoning.” If so, that is my point of disagreement. The account I’ve just given seems to me like an intuitive and accurate description of how I’ve solved every problem I’ve ever worked on, and how it appears to me that other people work as well.
Even in a case like dating, where the ability to explore is quite limited, I just wind up using a crappy version of SA. I find myself attracted to someone; we date and try to deepen/exploit our relationship; we either break up and try again with someone else, or we keep dating/exploiting indefinitely. It works well enough.
It sounds like you’re using “simulated annealing” as synonymous with something like “mix exploration and exploitation”? I’m not entirely sure what you’re using it to point to, but it definitely sounds like you’re using it as a metaphor for some particular aspect of search algorithms, rather than talking about the actual math of simulated annealing. What are you trying to emphasize?
Anyway, “mix exploration and exploitation” is not a search algorithm by itself. It’s a general trick which can be used in conjunction with a wide variety of search algorithms. This also applies to simulated annealing specifically: it’s a technique used in conjunction with other search algorithms. (E.g. scipy’s implementation takes an argument for which optimizer to use in the minimizer_kwargs.)
I am definitely arguing that gearsy reasoning does something fundamentally different. It does use black-box optimizers, but it uses them locally, operating on individual gears. Black-box optimization works badly on high-dimensional systems, so we decompose such systems into coupled low-dimensional systems, and then use black-box optimizers on the low-dimensional components. The interesting part of gearsy reasoning is how to account for the coupling, which is where things like causality and propagation come in.
The account you’ve given may well be accurate, but it’s an inherently incomplete account. Simulated annealing—or whatever you’re trying to point to with that phrase—is not itself a search algorithm. Even if all problem solving involves some mix of exploration and exploitation, that still doesn’t tell us which direction to go in a thousand-dimensional space when “exploiting”. Likewise, exploration clearly isn’t completely random—so how do we decide which directions to explore? That’s what gears are for.
I’m using SA closer to its original meaning as a metaphor for a process in chemistry rather than its precise mathematics as an algorithm. We start with lots of random trial and error, then dial it down until we just exploit. I do think that this, not just “some explore and some exploit” is how black-box optimization works in practice. It sounds like we agree that decomposition + explore/exploit is a major component of rationality.
I believe that the steering/search algorithm is also developed and applied via black box optimization. It sounds like you disagree.
Here’s an example of my concept of how explore/exploit can fully account for the success of a high-dimensional optimization problem.
Jack is choosing a hobby (explore). He loves fresh bread and decides to learn to bake it. He tries for a while (exploit), but eventually decides it’s not very fun, so he picks a different hobby (explore). This time, he settles on Indian food, also because it’s delicious. This hobby sticks, and he learns dish after dish, sometimes choosing his favorite dishes, other times picking randomly out of a cookbook (exploit).
The question is whether Jack is doing something fundamentally different from exploring and exploiting when he chooses how to pick which bread to bake or which Indian food to cook? To pick dishes, Jack might take suggestions from his friend Ananya, an experienced Indian cook; cook the dishes he’s enjoyed at restaurants before; or pick randomly out of a cookbook.
But where did these decision procedures come from?
Well, we’ve been trying each of them for millennia. Each depend on exploitation (of friends, restaurants, cookbooks), which in turn are the result of exploration (going to parties, searching on Yelp, looking up reviews), and so on ad infinitum.
I don’t think that any other mysterious search algorithm is needed to explain how Jack optimizes his Indian food cooking hobby. At some level where there is no decision procedure to provide guidance (how does Ananya choose from among the dishes she thinks would be best to suggest to Jack?), we use some arbitrary selection method, such as random choice or a pointless rule like “choose the first thing that comes to mind.”
Can you describe where in this story (or the gaps within it) explore/exploit couldn’t suffice? Alternatively, is this not a high-dimensional optimization problem of the kind you had in mind?
I had a long conversation with my girlfriend at one point about picking hobbies. She used to just try things at random, and I was encouraging her to be more strategic. We started by making a list of things she liked/didn’t like about previous hobbies she’d tried. Then, we turned those into criteria for potential hobbies—things like “should involve exercise”, “should involve producing something”, “should be good to show off to friends”, or “should grow in skill over time”. Then we considered broad classes of activities which addressed specific criteria, like “sports” or “art”. We tried to be systematic about generating the activity-classes—e.g. starting from “what possible kinds of things can we produce?” rather than just generating random ideas and checking them against the constraints.
This sort of reasoning doesn’t really look like uniformly-random exploration followed by exploitation. In particular, the pattern I want to highlight in this case is:
identify constraints (i.e. the criteria)
search by backtracking from the constraints, rather than trying things uniformly-randomly
That’s not incompatible with exploration+exploitation (blackbox search can be used in sub-steps), but it is an additional component which is not implied by exploitation/exploration alone. It avoids uniformly-random exploration, and instead steers toward more-likely-useful possibilities. The constraints tell us which-way-to-go in our high-dimensional search space.
This part in particular is interesting:
One of the hobbies my girlfriend eventually picked up was baking, and she recently wrote a post about how baking is not about following opaque decision procedures handed down through the millennia. That post has some great examples of gearsy reasoning; I recommend checking it out.
Her baking post was in the back of my mind when I typed my last comment! I thought it was great. I had the same reaction to her post as I had to yours.
We can and often do use constraints to guide our optimization process. Here are some examples:
When choosing a hobby, we can brainstorm what we want to get out of our hobby before we consider concrete activities that fit the bill.
When solving on a math problem, we can invent some rules for how to format our work so that it’ll be legible later.
Yelp implicitly considers factors like price, location, quality, and type of food before giving us options.
When building a bridge, we’re going to use standardized materials, tools, and techniques, not invent our own.
These constraints might be selected through intuition, or through the design of meta-constraints that help us select which object-level constraints to use.
However, the idea behind each constraint had to be invented in the first place, and in any concrete decision, we have to improvise its application.
So how does a constraint get invented in the first place? How do we decide which ones to use for a given decision? And how do we decide how to apply them? Probably with more constraints, but then the same questions arise about those. At some point, we are just picking constraints at random from those that occur to us, choosing randomly from among options that fit them, and seeing how we feel about the result.
We associate good feelings with individual or combinations of constraints that have worked out well in the past or that we’ve been taught to use, so they’re more likely to occur to us in the future.
So the process by which we decompose a problem; invent, combine, and apply constraints; or decide how to evaluate a decision; is itself a random process. Plan it all out, and you move the randomness one meta-level higher. That’s not to say it’s pointless. Planning and meta-level reasoning is often a powerful way to make decisions. It’s just not fundamentally different from object-level explore/exploit, and it runs into similar problems and ambiguities, just on a more abstract level.
I think your description here is basically right, except for the very last sentence. If we just had one meta-level, then yeah, it would end up basically the same as explore/exploit but at a more abstract level. But by recursively applying the trick, it ends up working well on a much wider class of problems than explore/exploit by itself.
The really neat thing is that recursively applying this particular trick does not require moving up an infinite ladder of meta-levels. There are only two levels, and moving “up” alternates between them.
To see why, consider what it looks like on a math problem where we’re trying to prove some theorem:
A “constraint” would be something like a conditional counterexample—i.e. “for any proof which uses strategy X, here’s a counterexample”
When searching for such counterexamples, our “constraints” are partial proofs—i.e. if we have a proof assuming X, then any counterexample must leverage/assume the falsehood of X.
So when we use constraints to guide our search for proofs, we look for counterexamples. When we use constraints to guide our search for counterexamples, we look for proofs.
More visually, consider solving a maze. We can find constraints like this:
When we’re searching for that constraint, what are the meta-constraints to that search? Well, they’re paths. A constraint is a path of connected walls; a meta-constraint is a path of connected spaces.
In general, we’re switching between solving an optimization problem and solving the dual problem.
Of course, we may also use some random search at any given step before switching.
Mm i think “the higher process X is produced by SA” is different drill “the higher process X is implementing SA”