One thing I notice I’m confused about: What causes problems to be factorizable?
To me, there seem to be several different ways you can factor a problem:
1. Symmetry exploitation. Example: binary search. I have an upper and a lower bound for a value, and I want to find the value. I only evaluate whether points between my upper and lower bounds are less than or greater than the value I want to find, and I can ignore all other points, because if I have an upper bound, everything greater than it is also going to be greater than my target value. (A minimal sketch appears after this list.)
2. Bottleneck/constraint enumeration. Example: the maze example from your video.
3. Bottleneck/constraint relaxation. Example: in maze solving, you can ignore the constraining walls to construct heuristics that tell your search algorithm which actions to evaluate the feasibility of next. Dual problems in convex optimization can also be seen as examples of this. (A relaxed-walls heuristic is sketched after this list.)
4. Inferences between local and global structures. Example: if we have a smooth objective, we can do gradient descent and be confident we'll at least get a pretty good value by the end. (Sketched after this list.)
5. Structure exploitation. Example: lots of dynamic programming stuff, as I'm given to understand. Potentially just a collection of heuristics mapping problem properties to the solution symmetries those problems induce. (A small dynamic-programming example appears after this list.)
6. Problem reformulation. Example: if you have a non-convex optimization problem, you can sometimes find an equivalent convex optimization problem if you do the right sorts of manipulations. (A toy reformulation is sketched after this list.)
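To make item 1 concrete, here's a minimal binary-search sketch (names and setup are mine, just for illustration). The sorted-order symmetry is what lets a single midpoint comparison discard an entire half of the remaining range:

```python
def binary_search(sorted_values, target):
    """Return the index of target in sorted_values, or -1 if it's absent."""
    lo, hi = 0, len(sorted_values) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_values[mid] == target:
            return mid
        elif sorted_values[mid] < target:
            # Everything at or below mid is also below target, so discard that half.
            lo = mid + 1
        else:
            # Everything at or above mid is also above target, so discard that half.
            hi = mid - 1
    return -1
```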
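For item 3, a rough sketch of how the relaxed problem becomes the heuristic: if you delete the walls, the shortest-path distance is just grid (Manhattan) distance, and that relaxed distance is what guides which action a search like A* evaluates next. The grid setup and the `passable` callback are my own assumptions, not anything from the video:

```python
import heapq

def manhattan(a, b):
    # Distance in the relaxed maze: pretend there are no walls at all.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def a_star(start, goal, passable):
    """A* on a grid maze; passable(cell) says whether a cell is open."""
    frontier = [(manhattan(start, goal), 0, start)]  # (priority, cost so far, cell)
    best_cost = {start: 0}
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            return cost
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cell[0] + dx, cell[1] + dy)
            if passable(nxt) and cost + 1 < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = cost + 1
                # The relaxed-problem distance decides what to look at next.
                heapq.heappush(frontier, (cost + 1 + manhattan(nxt, goal), cost + 1, nxt))
    return None
```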
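For item 4, the local-to-global move in its plainest form: each step only looks at the local slope, yet on a smooth, well-behaved objective it ends up somewhere good globally. Purely illustrative numbers:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Follow the local gradient downhill; each update uses only local information."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
# x_min ends up very close to 3, the global minimizer.
```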
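For item 5, a small dynamic-programming example (my choice of toy problem, not from the original): the cheapest path into any cell depends only on the cheapest paths into its two predecessors, so the global problem factors into one tiny local decision per cell:

```python
def min_path_sum(grid):
    """Cheapest top-left to bottom-right path, moving only right or down."""
    rows, cols = len(grid), len(grid[0])
    cost = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if r == 0 and c == 0:
                cost[r][c] = grid[r][c]
                continue
            from_above = cost[r - 1][c] if r > 0 else float("inf")
            from_left = cost[r][c - 1] if c > 0 else float("inf")
            # The only structure we need: the best way into a cell comes
            # through one of its two neighbors.
            cost[r][c] = grid[r][c] + min(from_above, from_left)
    return cost[-1][-1]

# min_path_sum([[1, 3, 1], [1, 5, 1], [4, 2, 1]]) == 7
```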
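For item 6, one toy reformulation I'm fairly confident about (a geometric-programming-style change of variables, not a general recipe): minimize x + y subject to x * y = 1 with x, y > 0. The equality constraint carves out a non-convex set, but substituting x = e^u, y = e^v makes the constraint linear and the objective convex:

```python
import math

def original_objective(x, y):
    # Minimize x + y subject to x * y == 1, x, y > 0.
    # The feasible set (a hyperbola branch) is not convex.
    return x + y

def reformulated_objective(u):
    # Substitute x = exp(u), y = exp(v). The constraint becomes u + v = 0,
    # so v = -u, and the problem is an unconstrained convex problem in u.
    return math.exp(u) + math.exp(-u)

# The convex form is minimized at u = 0, i.e. x = y = 1, giving x + y = 2.
```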
1, 4, and 5 seem similar because they all create bounds on where the optimal solution could be from misleadingly few observations. Though they also don't have to be targeted at the bounds of the optimal solution; you can find symmetries in your constraints.
2 and 3 seem similar because they both have to do with bottlenecks.
6 is only doable when the problem reformulation step is itself factorizable (or else when the operations you can do for the reformulation are small, which I guess they potentially can be even when your problem is very big).
These seem like they potentially have some common root which causes them all.
The obvious explanation, to me, mostly lies in number 5. The problem you have is often generated by the universe. Or at least, the solution space of your problem is a state space of the universe. And because our universe has locality and symmetry, our problem space and solution space also have locality and symmetry elements, which can often be exploited.
Symmetry exploitation. The symmetry comes directly from the universe's symmetry.
Bottleneck/constraint enumeration. Due to locality, there are usually fairly few bottlenecks you can target independently.
Bottleneck/constraint relaxation. In the maze example, we can ignore the walls to construct heuristics, both for enumerating constraints and for deciding on next actions. For deciding on next actions: if we had some insane topology with none of the nice symmetries our spatial topologies have (I'm picturing a random assortment of branching topological holes pockmarked throughout the maze), then shortest-path distances would be difficult to calculate, and our heuristic would take the form of a potentially arbitrarily difficult shortest-path optimization problem (I don't actually know whether this is arbitrarily difficult, though I'm pretty sure it is, since you should be able to encode an arbitrary search problem into the shortest path in the topology). So I claim the usefulness of this comes from symmetries.
Inferences between local and global structures. Yeah, this is kinda just what "symmetry" means. Also potentially locality, since lots of things in our universe are continuous, with derivatives easy to calculate, and so we can use SGD a bunch for solving stuff.
Structure exploitation. The inspiration for this hypothesis.
Problem reformulation. This one still feels kinda like a mystery. If I had to hazard a guess: a problem we're given fundamentally has symmetry and locality properties, for the reason given in the previous (non-enumerated) paragraph. So if we get a problem and it seems hard, in a sense that's because some kind of (probably random) processing has been done to it so that these properties take more effort to uncover. But that processing likely wasn't all that computationally intense, and likely wasn't generated by a particularly computationally intense process, which itself wasn't generated by a computationally intense process, etc., and so undoing the processing is doable.