I think timelines (as in, <10 years vs 10-30 years) are strongly correlated with the answer to “will the first dangerous models look like current models”, which I think matters more for research directions than you allow for in the second paragraph.
For example, interpretability in transformers might completely fail on some other architectures, for reasons that have nothing to do with deception. The only insight from the 2022 Anthropic interpretability papers I see having a chance of generalizing to non-transformers is the superposition hypothesis / SoLU discussion.
Yup, I definitely agree that something like “will roughly the current architectures take off first” is a highly relevant question. Indeed, I think that gathering arguments and evidence relevant to that question (and the more general question of “what kind of architecture will take off first?” or “what properties will the first architecture to take off have?”) is the main way that work on timelines actually provides value.
But it is a separate question from timelines, and I think most people trying to do timeline estimates would do more useful work if they instead explicitly focused on which architecture will take off first, or on what properties the first architecture to take off will have.
I think that gathering arguments and evidence relevant to that question (and the more general question of “what kind of architecture will take off first?” or “what properties will the first architecture to take off have?”) is the main way that work on timelines actually provides value.
Uh, I feel the need to off-topically note that this is also the primary way to accidentally feed the AI industry capability insights. Those won’t even have the format of illegible, arcane theoretical results; they’d just be straightforward, easy-to-check suggestions for improving extant architectures. If they’re also backed by empirical evidence, that’s your flashy-demos stand-in right there.
Not saying it shouldn’t be done, but here be dragons.
I think timelines are a useful input to which architecture takes off first. If timelines are short, I expect AGI to look something like DL/Transformers/etc. If timelines are longer, there might be time for not-yet-invented architectures to take off first. There can be multiple routes to AGI, and “how fast do we go down each route” informs which one happens first.
Correlationally this seems true, but causally it’s “which architecture takes off first?” which influences timelines, not vice versa.
Though I could imagine a different argument which says that timeline until the current architecture takes off (assuming it’s not superseded by some other architecture) is a key causal input to “which architecture takes off first?”. That argument I’d probably buy.
I definitely endorse the argument you’d buy, but I also endorse a broader one. My claim is that there is information which goes into timelines which is not just downstream of which architecture I think gets there first.
For example, if you told me that humanity loses the ability to make chips “tomorrow until forever”, my timeline gets a lot longer in a way that isn’t just downstream of which architecture I think is going to happen first. That then changes which architectures I think are going to get there first (strongly away from DL), primarily by making my estimated timeline long enough for capabilities folks to discover some theoretically-more-efficient but far-from-implementable-today architectures.
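To make the causal picture in this exchange concrete, here is a minimal toy sketch (purely hypothetical numbers and route names, not anyone’s actual estimates): each candidate route to AGI has its own time-to-takeoff, the overall timeline is the minimum over routes, and “the architecture that takes off first” is the argmin. An intervention like losing chip manufacturing shifts the per-route times unevenly, which can change both the timeline and the winner.

```python
# Toy model: hypothetical times-to-takeoff for a few candidate routes (years).
routes = {"deep learning": 8, "neuromorphic": 40, "new theoretical paradigm": 60}

def timeline_and_winner(route_times):
    # The overall timeline is set by whichever route finishes first,
    # and that same route is "the architecture that takes off first".
    winner = min(route_times, key=route_times.get)
    return route_times[winner], winner

print(timeline_and_winner(routes))
# -> (8, 'deep learning')

# Hypothetical intervention: chip fabrication becomes unavailable, which slows
# compute-hungry routes far more than theory-driven ones (numbers made up).
no_chips = {"deep learning": 100, "neuromorphic": 90, "new theoretical paradigm": 70}
print(timeline_and_winner(no_chips))
# -> (70, 'new theoretical paradigm')
```

On this picture, timelines and “which architecture takes off first” are two read-outs of the same per-route estimates, which is why they correlate without either one straightforwardly causing the other.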
I think that gathering arguments and evidence relevant to that question . . . is the main way that work on timelines actually provides value.
I think policy people find timelines work quite decision-relevant for them; I believe work on timelines largely provides value by informing their prioritization.
Relatedly, I sense some readers of this post will unintentionally do a motte-and-bailey with a motte of “timelines are mostly not strategically relevant to alignment research” and a bailey of “timelines are mostly not strategically relevant.”
What are the main strategic decisions policy people face right now, and how are timelines relevant to those decisions?
See Carl Shulman’s comments on https://forum.effectivealtruism.org/posts/SEqJoRL5Y8cypFasr/why-agi-timeline-research-discourse-might-be-overrated
Things like “buy all the chips/chip companies” still seem like they only depend on timelines on a very short timescale, like <5 years. Buy all the chips, and the chip companies will (1) raise prices (which I’d guess happens on a timescale of months) and (2) increase production (which I’d guess happens on a timescale of ~2 years). Buy the chip companies, and new companies will enter the market on a somewhat slower timescale, but I’d still guess it’s on the order of ~5 years. (Yes, I’ve heard people argue that replacing the full stack of Taiwan semi could take decades, but I don’t expect that the full stack would actually be bought in a “buy the chip companies” scenario, and “decades” seems unrealistically long anyway.)
None of this sounds like it depends on the difference between e.g. 30 years vs 100 years, though the most ambitious versions of such strategies could maybe be slightly more appealing on 10-year vs 30-year timelines. But really, we’d have to get down to ~5 years before something like “buy the chip companies” starts to sound like a sufficiently clearly good idea that I’d expect anyone to seriously consider it.
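As a back-of-the-envelope illustration of this point (a sketch using the rough response-time guesses from the comment above, not a market model), the effective window that a “buy the chips / chip companies” move buys is bounded by how quickly the market routes around it, so it only looms large relative to timelines on the order of ~5 years:

```python
# Rough guesses from the comment above for how long each market response takes
# to route around a "buy all the chips / chip companies" intervention (in years).
market_responses = {
    "price increases": 0.25,
    "expanded production": 2.0,
    "new entrants": 5.0,
}
effective_window = max(market_responses.values())  # roughly when the effect is gone

for timeline in [5, 10, 30, 100]:  # candidate timelines in years
    share = min(effective_window / timeline, 1.0)
    print(f"{timeline:>3}-year timeline: intervention covers ~{share:.0%} of it")
```

Under 30- or 100-year timelines the window is a small fraction of the remaining time, which is the sense in which the decision barely depends on the difference between them.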