So is brainf*ck, and like NNs, bf programs are simple in the sense of being trivial to enumerate and hence search through. Defining a search space for a complex domain is equivalent to defining a subspace of BF programs or NNs which could and probably does have a highly convoluted, warped separating surface. In the context of deep learning your ability to approximate that surface is limited by your ability to encode it as a loss function.
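To make "trivial to enumerate" concrete, here is a minimal sketch (toy code, no particular library assumed) that lists every string over the bf alphabet up to a given length. The point is that defining the space costs almost nothing; finding the rare useful program inside it is the whole problem:

```python
from itertools import product

# The eight brainf*ck symbols; any string over this alphabet is a
# candidate program, which is what makes the space trivial to enumerate.
BF_ALPHABET = "><+-.,[]"

def enumerate_bf_programs(max_len):
    """Yield every brainf*ck source string up to max_len, shortest first."""
    for length in range(1, max_len + 1):
        for symbols in product(BF_ALPHABET, repeat=length):
            yield "".join(symbols)

programs = list(enumerate_bf_programs(2))
# 8 programs of length 1 plus 8*8 of length 2 = 72 candidates; the count
# grows as 8^n, which is why pure enumeration stops being viable almost
# immediately.
```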
Defining a search space for a complex domain is equivalent to defining a subspace of BF programs or NNs which could and probably does have a highly convoluted, warped separating surface.
The task of locating points in such subspaces is what optimization algorithms (including search algorithms) are meant to address. The goal isn’t to “define” your search space in such a way that only useful solutions to the problem are included (if you could do that, you wouldn’t have a problem in the first place!); the point is to have a search space general enough to encompass all possible solutions, and then converge on useful solutions using some kind of optimization.
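As a toy illustration of that claim (hypothetical objective, invented code), the space below is every 16-bit string, i.e. maximally general, and it is the objective function alone that steers a dumb local search toward a solution:

```python
import random

def coordinate_ascent(objective, n_bits=16, passes=3, seed=0):
    """Greedy local search over all bit strings: flip each bit in turn,
    keep the flip only when it improves the objective. The space (every
    bit string) is maximally general; the objective, not the space
    definition, does the work of singling out good solutions."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n_bits)]
    best = objective(x)
    for _ in range(passes):
        for i in range(n_bits):
            x[i] ^= 1
            score = objective(x)
            if score > best:
                best = score
            else:
                x[i] ^= 1  # revert a non-improving flip
    return x, best

# Toy objective ("OneMax"): count the ones. One full pass already
# converges, since flipping any 0 to 1 strictly improves the count.
x, score = coordinate_ascent(sum)
```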
EDIT: There is an analogue in machine learning to the kind of problem you seemed to be gesturing at when you mentioned “more complex domains”—namely, the problem of how to choose a good objective function to optimize. It’s true that for more abstract domains, it’s harder to define a criterion (or set of criteria) that we want our optimizer to satisfy, and this is (to a first approximation) a large part of the AI alignment problem. But there’s a significant difference between choosing an objective function and “defining your search space” (whatever that means), and the latter concept doesn’t have much use as far as I can see.
for more abstract domains, it’s harder to define a criterion (or set of criteria) that we want our optimizer to satisfy
Yes.
But there’s a significant difference between choosing an objective function and “defining your search space” (whatever that means), and the latter concept doesn’t have much use as far as I can see.
If you don’t know what it means, how do you know that it’s significantly different from choosing an “objective function” and why do you feel comfortable in making a judgment about whether or not the concept is useful?
In any case, to define a search space is to provide a spanning set of production rules which allow you to derive all elements in the target set. For example, Peano arithmetic provides a spanning set of rules for arithmetic computations, and hence defines (in one particular way) the set of computations a search algorithm can search through in order to find arithmetic derivations satisfying whatever property. Similarly, the rules of chess define the search-space for valid board-state sequences in games of chess. For neural networks, it could be defining a set of topologies, or a set of composition rules for layering networks together; and in a looser sense a loss function induces a “search space” on network weights, insofar as it practically excludes certain regions of the error surface from the region of space any training run is ever likely to explore.
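A minimal sketch of that sense of “define”, using a made-up Peano-flavored term grammar (Z, S, Add) rather than full Peano arithmetic: the production rules generate the space, and a search procedure can then walk it looking for terms with some property:

```python
# A toy spanning set of production rules, Peano-style: a term is either
# Z (zero), S(t) (successor), or Add(t1, t2). The rules *define* the
# search space; a separate search algorithm then filters it.

def expand(depth):
    """Yield every term derivable within the given nesting depth."""
    yield "Z"
    if depth == 0:
        return
    for t in expand(depth - 1):
        yield f"S({t})"
    for a in expand(depth - 1):
        for b in expand(depth - 1):
            yield f"Add({a},{b})"

terms = list(expand(1))
# depth 1 derives exactly: Z, S(Z), Add(Z,Z)
```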
If you don’t know what it means, how do you know that it’s significantly different from choosing an “objective function” and why do you feel comfortable in making a judgment about whether or not the concept is useful?
Because words tend to mean things, and when you use the phrase “define a search space”, the typical meaning of those words does not bring to mind the same concept as the phrase “choose an objective function”. (And the concept it does bring to mind is not very useful, as I described in the grandparent comment.)
Now, perhaps your contention is that these two phrases ought to bring to mind the same concept. I’d argue that this is unrealistic, but fine; it serves no purpose to argue whether I think you used the right phrase when you did, in fact, clarify what you meant later on:
in a looser sense a loss function induces a “search space” on network weights, insofar as it practically excludes certain regions of the error surface from the region of space any training run is ever likely to explore.
All right, I’m happy to accept this as an example of defining (or “inducing”) a search space, though I would maintain that it’s not a very obvious example (and I think you would agree, considering that you prefixed it with “in a looser sense”). But then it’s not at all obvious what your original objection to the article is! To quote your initial comment:
It only makes sense to talk about “search” in the context of a *search space*; and all extant search algorithms / learning methods involve searching through a comparatively simple space of structures, such as the space of weights on a deep neural network or the space of board-states in Go and Chess. As we move on to attack more complex domains, such as abstract mathematics, or philosophy or procedurally generated music or literature which stands comparison to the best products of human genius, the problem of even /defining/ the search space in which you intend to leverage search-based techniques becomes massively involved.
Taken at face value, this seems to be an argument that the original article overstates the importance of search-based techniques (and potentially other optimization techniques as well), because there are some problems to which search is inapplicable, owing to the lack of a well-defined search space. This is a meaningful objection to make, even though I happen to think it’s untrue (for reasons described in the grandparent comment).
But if by “lack of a well-defined search space” you actually mean “the lack of a good objective function”, then it’s not clear to me where you think the article errs. Not having a good objective function for some domains certainly presents an obstacle, but this is not an issue with search-based optimization techniques; it’s simply a consequence of the fact that you’re dealing with an ill-posed problem. Since the article makes no claims about ill-posed problems, this does not seem like a salient objection.
I honestly regret that I didn’t make it as clear as I possibly could the first time around, but expressing original, partially developed ideas is not the same thing as reciting facts about well-understood concepts that have been explained and re-explained many times. Flippancy is needlessly hostile.
there are some problems to which search is inapplicable, owing to the lack of a well-defined search space
If not wholly inapplicable, then not performant, yes. Though the problem isn’t that the search-space is not defined at all, but that the definitions which are easiest to give are also the least helpful (to return to the previous example, in the Platonic realm there exists a brainf*ck program that implements an optimal map from symptoms to diagnoses—good luck finding it). As the original author points out, there’s a tradeoff between knowledge and the need for brute-force. It may be that you can have an agent synthesize knowledge by consolidating the results of a brute-force search into a formal representation which the agent can then use to tune or reformulate the search-space previously given to fit some particular purpose; but this is quite a level of sophistication above pure brute force.
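A deliberately over-simplified sketch of that idea (the symptom/diagnosis data here is invented): an exhaustive pass over a tiny rule space produces a best rule, and that consolidated result is the kind of thing that could then seed or restrict a later, narrower search:

```python
from itertools import product

# Toy illustration of the knowledge / brute-force tradeoff: brute-force
# every map from symptom vectors to diagnoses, then keep the best one as
# "consolidated knowledge" for a subsequent, smaller search.

cases = [((1, 0), "flu"), ((0, 1), "cold"), ((1, 1), "flu")]  # made-up data

def accuracy(rule):
    """rule is a dict mapping each symptom vector to a diagnosis."""
    return sum(rule[s] == d for s, d in cases)

vectors = list(product([0, 1], repeat=2))  # the 4 possible symptom vectors
best = max(
    (dict(zip(vectors, labels))
     for labels in product(["flu", "cold"], repeat=len(vectors))),
    key=accuracy,
)
# 2^4 = 16 candidate rules; feasible here, hopeless at any real scale,
# which is exactly why consolidating results matters.
```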
Edit:
this is not an issue with search-based optimization techniques; it’s simply a consequence of the fact that you’re dealing with an ill-posed problem
If the problems of literature or philosophy were not in some sense “ill posed” they would also be dead subjects. The ‘general’ part in AGI would seem to imply some capacity for dealing with vague, partially defined ideas in useful ways.