I think timelines (as in, <10 years vs 10-30 years) are strongly correlated with the answer to “will the first dangerous models look like current models”, which I think matters more for research directions than you allow for in the second paragraph.
For example, interpretability in transformers might completely fail on some other architectures, for reasons that have nothing to do with deception. The only insight from the 2022 Anthropic interpretability papers I see having a chance of generalizing to non-transformers is the superposition hypothesis / SoLU discussion.
Yup, I definitely agree that something like “will roughly the current architectures take off first” is a highly relevant question. Indeed, I think that gathering arguments and evidence relevant to that question (and the more general question of “what kind of architecture will take off first?” or “what properties will the first architecture to take off have?”) is the main way that work on timelines actually provides value.
But it is a separate question from timelines, and I think most people trying to do timeline estimates would do more useful work if they instead explicitly focused on which architecture will take off first, or on what properties the first architecture to take off will have.
I think that gathering arguments and evidence relevant to that question (and the more general question of “what kind of architecture will take off first?” or “what properties will the first architecture to take off have?”) is the main way that work on timelines actually provides value.
Uh, I feel the need to off-topically note that this is also the primary way to accidentally feed the AI industry capability insights. Those won’t even have the format of illegible, arcane theoretical results; they’d just be straightforward, easy-to-check suggestions for improving extant architectures. If they’re also backed by empirical evidence, that’s your flashy-demos stand-in right there.
Not saying it shouldn’t be done, but here be dragons.
I think timelines are a useful input to which architecture takes off first. If timelines are short, I expect AGI to look something like DL/Transformers/etc. If timelines are longer, there might be time for not-yet-invented architectures to take off first. There can be multiple routes to AGI, and “how fast do we go down each route” informs which one happens first.
Correlationally this seems true, but causally it’s “which architecture takes off first?” which influences timelines, not vice versa.
Though I could imagine a different argument which says that timeline until the current architecture takes off (assuming it’s not superseded by some other architecture) is a key causal input to “which architecture takes off first?”. That argument I’d probably buy.
I definitely endorse the argument you’d buy, but I also endorse a broader one. My claim is that there is information which goes into timelines which is not just downstream of which architecture I think gets there first.
For example, if you told me that humanity loses the ability to make chips “tomorrow until forever”, my timeline gets a lot longer in a way that isn’t just downstream of which architecture I think is going to happen first. That then changes which architectures I think are going to get there first (strongly away from DL), primarily by making my estimated timeline long enough for capabilities folks to discover some theoretically-more-efficient but far-from-implementable-today architectures.
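To make the causal picture in this exchange concrete, here is a minimal toy sketch (purely hypothetical numbers and route names, not anyone’s actual estimates): each candidate route to AGI has its own time-to-takeoff, the overall timeline is the minimum over routes, and “the architecture that takes off first” is the argmin. An intervention like losing chip manufacturing shifts the per-route times unevenly, which can change both the timeline and the winner.

```python
# Toy model: hypothetical times-to-takeoff for a few candidate routes (years).
routes = {"deep learning": 8, "neuromorphic": 40, "new theoretical paradigm": 60}

def timeline_and_winner(route_times):
    # The overall timeline is set by whichever route finishes first,
    # and that same route is "the architecture that takes off first".
    winner = min(route_times, key=route_times.get)
    return route_times[winner], winner

print(timeline_and_winner(routes))
# -> (8, 'deep learning')

# Hypothetical intervention: chip fabrication becomes unavailable, which slows
# compute-hungry routes far more than theory-driven ones (numbers made up).
no_chips = {"deep learning": 100, "neuromorphic": 90, "new theoretical paradigm": 70}
print(timeline_and_winner(no_chips))
# -> (70, 'new theoretical paradigm')
```

On this picture, timelines and “which architecture takes off first” are two read-outs of the same per-route estimates, which is why they correlate without either one straightforwardly causing the other.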
I think that gathering arguments and evidence relevant to that question . . . is the main way that work on timelines actually provides value.
I think policy people find timelines work quite decision-relevant for them; I believe work on timelines largely provides value by informing their prioritization.
Relatedly, I sense some readers of this post will unintentionally do a motte-and-bailey with a motte of “timelines are mostly not strategically relevant to alignment research” and a bailey of “timelines are mostly not strategically relevant.”
What are the main strategic decisions policy people face right now, and how are timelines relevant to those decisions?
See Carl Shulman’s comments on https://forum.effectivealtruism.org/posts/SEqJoRL5Y8cypFasr/why-agi-timeline-research-discourse-might-be-overrated
Things like “buy all the chips/chip companies” still seem like they only depend on timelines on a very short timescale, like <5 years. Buy all the chips, and the chip companies will (1) raise prices (which I’d guess happens on a timescale of months) and (2) increase production (which I’d guess happens on a timescale of ~2 years). Buy the chip companies, and new companies will enter the market on a somewhat slower timescale, but I’d still guess it’s on the order of ~5 years. (Yes, I’ve heard people argue that replacing the full stack of Taiwan semi could take decades, but I don’t expect that the full stack would actually be bought in a “buy the chip companies” scenario, and “decades” seems unrealistically long anyway.)
None of this sounds like it depends on the difference between e.g. 30 years vs 100 years, though the most ambitious versions of such strategies could maybe be slightly more appealing on 10-year vs 30-year timelines. But really, we’d have to get down to ~5 years before something like “buy the chip companies” starts to sound like a sufficiently clearly good idea that I’d expect anyone to seriously consider it.
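As a back-of-the-envelope illustration of this point (a sketch using the rough response-time guesses from the comment above, not a market model), the effective window that a “buy the chips / chip companies” move buys is bounded by how quickly the market routes around it, so it only looms large relative to timelines on the order of ~5 years:

```python
# Rough guesses from the comment above for how long each market response takes
# to route around a "buy all the chips / chip companies" intervention (in years).
market_responses = {
    "price increases": 0.25,
    "expanded production": 2.0,
    "new entrants": 5.0,
}
effective_window = max(market_responses.values())  # roughly when the effect is gone

for timeline in [5, 10, 30, 100]:  # candidate timelines in years
    share = min(effective_window / timeline, 1.0)
    print(f"{timeline:>3}-year timeline: intervention covers ~{share:.0%} of it")
```

Under 30- or 100-year timelines the window is a small fraction of the remaining time, which is the sense in which the decision barely depends on the difference between them.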