I’m pleasantly surprised that you think the post is “pretty decent.”
I’m curious which parts of the Goal Realism section you find “philosophically confused,” because we are trying to correct what we consider to be a deep philosophical confusion that is fairly pervasive on LessWrong.
I recall hearing your compression argument for general-purpose search a long time ago, and it honestly seems pretty confused / clearly wrong to me. I would like to see a much more rigorous definition of “search,” and an explanation of why search would actually be “compressive” in the relevant sense for NN inductive biases. My current take is something like “a lot of the references to internal search on LW are just incoherent,” and to the extent you can make them coherent, NNs are either actively biased away from search, or they are only biased toward “search” in ways that are totally benign.
More generally, I’m quite skeptical of the jump from any mechanistic notion of search to the kind of grabby consequentialism that people tend to be worried about. I suspect there’s a double dissociation between these things, where “mechanistic search” is almost always benign, and grabby consequentialism need not be backed by mechanistic search.
> I would like to see a much more rigorous definition of “search,” and an explanation of why search would actually be “compressive” in the relevant sense for NN inductive biases. My current take is something like “a lot of the references to internal search on LW are just incoherent,” and to the extent you can make them coherent, NNs are either actively biased away from search, or they are only biased toward “search” in ways that are totally benign.
> More generally, I’m quite skeptical of the jump from any mechanistic notion of search to the kind of grabby consequentialism that people tend to be worried about. I suspect there’s a double dissociation between these things, where “mechanistic search” is almost always benign, and grabby consequentialism need not be backed by mechanistic search.
Some notes on this:
I don’t think general-purpose search is sufficiently well-understood yet to give a rigorous mechanistic definition. (Well, unless one just gives a very wrong definition.)
Likewise, I don’t think we understand either search or NN biases well enough yet to make a formal compression argument. Indeed, that sounds like a roughly-agent-foundations-complete problem.
I’m pretty skeptical that internal general-purpose search is compressive in current architectures. (And this is one reason why I expect most AI x-risk to come from importantly-different future architectures.) Low confidence, though.
Also, current architectures do have at least some “externalized” general-purpose search capabilities, insofar as they can mimic the “unrolled” search process of a human or group of humans thinking out loud. That general-purpose search process is basically AgentGPT. Notably, it doesn’t work very well to date.
Insofar as I need a working not-very-formal definition of general-purpose search, I usually use a behavioral definition: a system which can take in a representation of a problem in some fairly-broad class of problems (typically in a ~fixed environment), and solve it.
The argument that a system which satisfies that behavioral definition will tend to also have an “explicit search architecture,” in some sense, comes from the recursive nature of problems. E.g. humans solve large novel problems by breaking them into subproblems, and then doing their general-purpose search/problem-solving on the subproblems; that’s an explicit search architecture. (See the toy sketch after these notes.)
I definitely agree that grabby consequentialism need not be backed by mechanistic search. I’m more skeptical of the claim that mechanistic search is usually benign, at least if by “mechanistic search” we mean general-purpose search (though I’d agree with a version of this which talks about a weaker notion of “search”).
Also, one maybe relevant deeper point, since you seem familiar with some of the philosophical literature: IIUC the most popular way philosophers ground semantics is in the role played by some symbol/signal in the evolutionary environment. I view this approach as a sort of placeholder: it’s definitely not the “right” way to ground semantics, but philosophy as a field is using it as a stand-in until people work out better models of grounding (regardless of whether the philosophers themselves know that they’re doing so). This is potentially relevant to the “representation of a problem” part of general-purpose search.
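To make the recursive-decomposition note above more concrete, here is a minimal toy sketch in Python. All of the names in it (Problem, solve_directly, general_purpose_search) are made up for illustration, not taken from the post or any existing library; it just shows a solver that meets the behavioral definition (take in a problem representation, return a solution) and does so by recursing on subproblems, i.e. the “explicit search architecture” shape.

```python
from dataclasses import dataclass, field


@dataclass
class Problem:
    """Toy 'representation of a problem' handed to the solver."""
    description: str
    # Subproblems, if this problem is best handled by decomposition.
    subproblems: list["Problem"] = field(default_factory=list)


def solve_directly(problem: Problem) -> str | None:
    """Stand-in for whatever non-search machinery handles leaf problems."""
    if not problem.subproblems:
        return f"solved: {problem.description}"
    return None


def general_purpose_search(problem: Problem) -> str:
    """Behavioral definition: take in a problem representation, return a solution.

    The 'explicit search architecture' shows up in the control flow: if the
    problem can't be handled directly, break it into subproblems and run the
    same general-purpose solver on each of them.
    """
    direct = solve_directly(problem)
    if direct is not None:
        return direct
    sub_solutions = [general_purpose_search(sub) for sub in problem.subproblems]
    return f"combined({', '.join(sub_solutions)})"


if __name__ == "__main__":
    trip = Problem(
        "plan a trip",
        subproblems=[Problem("book flights"), Problem("book a hotel")],
    )
    print(general_purpose_search(trip))
    # -> combined(solved: book flights, solved: book a hotel)
```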
> I’m curious which parts of the Goal Realism section you find “philosophically confused,” because we are trying to correct what we consider to be a deep philosophical confusion that is fairly pervasive on LessWrong.
(I’ll briefly comment on each section, feel free to double-click.)
Against Goal Realism: Huemer… indeed seems confused about all sorts of things, and I wouldn’t consider either the “goal realism” or “goal reductionism” picture solid grounds for use of an indifference principle (not sure if we agree on that?). Separately, “reductionism as a general philosophical thesis” does not imply the thing you call “goal reductionism”—for instance one could reduce “goals” to some internal mechanistic thing, rather than thinking about “goals” behaviorally, and that would be just as valid for the general philosophical/scientific project of reductionism. (Not that I necessarily think that’s the right way to do it.)
Goal Slots Are Expensive: Just because it’s “generally better to train a whole network end-to-end for a particular task than to compose it out of separately trained, reusable modules” doesn’t mean the end-to-end trained system will turn out non-modular. Biological organisms were trained end-to-end by evolution, yet they ended up very modular.
Inner Goals Would Be Irrelevant: I think the point this section was trying to make is something I’d classify as a pointer problem? I.e. the internal symbolic “goal” does not necessarily neatly correspond to anything in the environment at all. If that was the point, then I’m basically on-board, though I would mention that I’d expect evolution/SGD/cultural evolution/within-lifetime learning/etc to drive the internal symbolic “goal” to roughly match natural structures in the world. (Where “natural structures” cashes out in terms of natural latents, but that’s a whole other conversation.)
Goal Realism Is Anti-Darwinian: Fodor obviously is deeply confused, but I think you’ve misdiagnosed what he’s confused about. “The physical world has no room for goals with precise contents” is somewhere between wrong and a non sequitur, depending on how we interpret the claim. “The problem faced by evolution and by SGD is much easier than this: producing systems that behave the right way in all scenarios they are likely to encounter” is correct, but very incomplete as a response to Fodor.
Goal Reductionism Is Powerful: While most of this section sounds basically-correct as written, the last few sentences seem to be basically arguing for behaviorism for LLMs. There are good reasons behaviorism was abandoned in psychology, and I expect those reasons carry over to LLMs.
Some incomplete brief replies:
> Huemer… indeed seems confused about all sorts of things
Sure, I was just searching for professional philosopher takes on the indifference principle, and that chapter in Paradox Lost was among the first things I found.
> Separately, “reductionism as a general philosophical thesis” does not imply the thing you call “goal reductionism”
Did you see the footnote I wrote on this? I give a further argument for it.
> …doesn’t mean the end-to-end trained system will turn out non-modular.
I looked into modularity for a bit, about 1.5 years ago, and concluded that the concept was way too vague and seemed useless for alignment or interpretability purposes. If you have a good definition, I’m open to hearing it.
> There are good reasons behaviorism was abandoned in psychology, and I expect those reasons carry over to LLMs.
To me it looks like people abandoned behaviorism for pretty bad reasons. The ongoing replication crisis in psychology does not inspire confidence in that field’s ability to correctly diagnose bullshit.
That said, I don’t think my views depend on behaviorism being the best framework for human psychology. The case for behaviorism in the AI case is much, much stronger: the equations for an algorithm like REINFORCE or DPO directly push up the probability of some actions and push down the probability of others.
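To illustrate that last claim, here is a toy sketch of a single REINFORCE-style update in PyTorch. This is illustrative code, not the exact objective from any particular paper or RLHF codebase. It shows that with a positive reward, one gradient step directly raises the probability of the sampled action, and with a negative reward it would lower it. DPO’s gradient is analogous in this respect: it pushes up the chosen completion’s log-probability relative to the rejected one.

```python
# Toy illustration of a REINFORCE-style update over a 4-action softmax policy.
# Not the exact objective from any particular paper or codebase; it just shows
# that the update directly moves action probabilities up or down with the reward.
import torch

logits = torch.zeros(4, requires_grad=True)  # toy policy parameters


def reinforce_loss(logits: torch.Tensor, action: int, reward: float) -> torch.Tensor:
    """Single-sample REINFORCE objective: -reward * log pi(action).

    Minimizing this raises pi(action) when reward > 0 and lowers it when reward < 0.
    """
    log_probs = torch.log_softmax(logits, dim=-1)
    return -reward * log_probs[action]


optimizer = torch.optim.SGD([logits], lr=0.5)

p_before = torch.softmax(logits, dim=-1)[2].item()
optimizer.zero_grad()
reinforce_loss(logits, action=2, reward=1.0).backward()  # positively rewarded action
optimizer.step()
p_after = torch.softmax(logits, dim=-1)[2].item()

print(f"p(action 2): {p_before:.3f} -> {p_after:.3f}")  # the probability goes up
```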
> Did you see the footnote I wrote on this? I give a further argument for it.
Ah yeah, I indeed missed that the first time through. I’d still say I don’t buy it, but that’s a more complicated discussion, and it is at least a decent argument.
> I looked into modularity for a bit, about 1.5 years ago, and concluded that the concept was way too vague and seemed useless for alignment or interpretability purposes. If you have a good definition, I’m open to hearing it.
This is another place where I’d say we don’t understand it well enough to give a good formal definition or operationalization yet.
Though I’d note here, and also above w.r.t. search, that “we don’t know how to give a good formal definition yet” is very different from “there is no good formal definition” or “the underlying intuitive concept is confused” or “we can’t effectively study the concept at all” or “arguments which rely on this concept are necessarily wrong/uninformative”. Every scientific field was pre-formal/pre-paradigmatic once.
> To me it looks like people abandoned behaviorism for pretty bad reasons. The ongoing replication crisis in psychology does not inspire confidence in that field’s ability to correctly diagnose bullshit.
> That said, I don’t think my views depend on behaviorism being the best framework for human psychology. The case for behaviorism in the AI case is much, much stronger: the equations for an algorithm like REINFORCE or DPO directly push up the probability of some actions and push down the probability of others.
Man, that is one hell of a bullet to bite. Much kudos for intellectual bravery and chutzpah!
That might be a fun topic for a longer discussion at some point, though not right now.