Imagine a computational task that breaks up into solving many instances of problems A and B. Each instance reduces to at most n instances of problem A and at most m instances of problem B. However, these two maxima are never achieved both at once: The sum of the number of instances of A and instances of B is bounded above by some r<n+m. One way to compute this with a circuit is to include n copies of a circuit C0 for computing problem A and m copies of a circuit C1 for computing problem B. Another approach for solving the task is to include r copies of a circuit C2 which, with suitable control inputs, can compute either problem A or problem B. Although this approach requires more complicated control circuitry, if r is significantly less than n+m and the size of C2 is significantly less than the sum of the sizes of C0 and C1 (which may occur if problems A and B have common subproblems X and Y which can use a shared circuit) then this approach will use less logic gates overall.
More generally, consider some complex computational task that breaks down into a heterogeneous set of subproblems which are distributed in different ways depending on the exact instance. Analogous reasoning suggests that the minimal circuit for solving this task will involve a structure akin to emulating a CPU: There are many instances of optimized circuits for low-level tasks, connected by a complex dependency graph. In any particular instance of the problem the relevant data dependencies are only a small subgraph of this graph, with connections decided by some control circuitry. A particular low-level circuit need not have a fixed purpose, but is used in different ways in different instances.
So, our circuit has a dependency tree of low-level tasks optimized for solving our problem in the worst-case. Now, at a starting stage of this hierarchy it has to process information about how a particular instance is separated into subproblems and generate the control information for solving this particular instance. The control information might need to be recomputed as new information about the structure of the instance are made manifest, and sometimes a part of the circuit may perform this recomputation without full access to potentially conflicting control information calculated in other parts.
This may be relevant:
Imagine a computational task that breaks up into solving many instances of problems A and B. Each instance reduces to at most n instances of problem A and at most m instances of problem B. However, these two maxima are never achieved both at once: The sum of the number of instances of A and instances of B is bounded above by some r<n+m. One way to compute this with a circuit is to include n copies of a circuit C0 for computing problem A and m copies of a circuit C1 for computing problem B. Another approach for solving the task is to include r copies of a circuit C2 which, with suitable control inputs, can compute either problem A or problem B. Although this approach requires more complicated control circuitry, if r is significantly less than n+m and the size of C2 is significantly less than the sum of the sizes of C0 and C1 (which may occur if problems A and B have common subproblems X and Y which can use a shared circuit) then this approach will use less logic gates overall.
More generally, consider some complex computational task that breaks down into a heterogeneous set of subproblems which are distributed in different ways depending on the exact instance. Analogous reasoning suggests that the minimal circuit for solving this task will involve a structure akin to emulating a CPU: There are many instances of optimized circuits for low-level tasks, connected by a complex dependency graph. In any particular instance of the problem the relevant data dependencies are only a small subgraph of this graph, with connections decided by some control circuitry. A particular low-level circuit need not have a fixed purpose, but is used in different ways in different instances.
So, our circuit has a dependency tree of low-level tasks optimized for solving our problem in the worst-case. Now, at a starting stage of this hierarchy it has to process information about how a particular instance is separated into subproblems and generate the control information for solving this particular instance. The control information might need to be recomputed as new information about the structure of the instance are made manifest, and sometimes a part of the circuit may perform this recomputation without full access to potentially conflicting control information calculated in other parts.