Three different claims: (1) doom is likely, (2) doom is likely if AI development proceeds with modern methods, (3) if AGI works as a result of being developed with modern methods, then doom is likely. You seem to be gesturing at antiprediction of (1). I think the claim where the arguments are most legible is (3), and the most decision-relevant claim is (2).
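Spelled out as conditional claims, in informal notation (this is just my shorthand to make the differing conditions explicit):

$$P(\text{doom}) \text{ is high} \quad (1)$$
$$P(\text{doom} \mid \text{AI development proceeds with modern methods}) \text{ is high} \quad (2)$$
$$P(\text{doom} \mid \text{AGI is reached as a result of modern methods}) \text{ is high} \quad (3)$$

The condition gets stronger from (1) to (3): (3) additionally assumes that modern methods actually get all the way to AGI.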
Thanks, this is helpful, though not 100% clear.
I think the standard logic goes as follows: GPT-style models are not necessarily what will reach the extinction-level threat, but they show that such a threat is not far away, partially because interpretability is far behind capabilities. If by “modern methods” you mean generative pre-trained transformers or similar, rather than some totally new future architecture, then I do not think anyone claims (2), let alone (3)? I think what people claim is that just scaling gets us way further than was expected, that the amount and power of GPUs can be a bottleneck for capabilities all the way to superintelligence regardless of the underlying architecture, and therefore (1).
It’s less about GPT-style in particular and more “gradient descent producing black boxes”-style in general.
The claim goes that if we develop AGI this way, we are doomed. And we are on track to do it.
By “modern methods” I meant roughly what Ape in the coat noted, more specifically ~end-to-end DNNs (possibly some model-based RL setup, possibly with some of the nets bootstrapped from pre-trained language models, or trained on data or in situations LLMs generate).
As opposed to cognitive architectures that are more like programs in the classical sense, even if they use DNNs for some of what they do, like CoEm. Or DNNs iteratively and automatically “decompiled” into explicit, modular program code that is giant overall but locally human-understandable, using AI-assisted interpretability tools (in a way that forces a change of behavior and requires retraining the remaining black box DNN parts to maintain capability), taking the human-understandable features that form in DNNs as inspiration for writing code that computes them more carefully. I’m also guessing that alignment-themed decision theory has a use in shaping something like this (whether the top-level program architecture or the synthetic data that trains the DNNs); this is what motivates my own efforts. Or something else entirely; this paragraph lists non-examples of “modern methods” in the sense I intended, the kind of stuff that could use a pause on giant training runs to have a chance to grow up.
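To make the contrast concrete, here is a toy sketch of the two shapes of system I have in mind. Everything in it is hypothetical and purely illustrative; it is not CoEm or any other real design.

```python
# Toy sketch only: end-to-end "modern methods" vs. a cognitive architecture
# that keeps the control flow in ordinary code and confines DNNs to narrow roles.
# All names here are hypothetical; none of this refers to a real system.
import numpy as np


def end_to_end_agent(observation: np.ndarray, policy_net):
    """'Modern methods': one opaque learned mapping from observations to actions.
    All of the decision-making lives inside trained weights."""
    return policy_net(observation)


def cognitive_architecture_agent(observation, perceive, propose_options, score_option):
    """Non-example: explicit, human-readable control flow, with DNNs
    (perceive, score_option) used only for narrow, legible subtasks."""
    scene = perceive(observation)                  # DNN used for perception only
    options = propose_options(scene)               # ordinary code enumerates candidate plans
    scored = [(score_option(scene, o), o) for o in options]  # DNN scores each candidate locally
    _, best_option = max(scored, key=lambda pair: pair[0])
    return best_option                             # the choice itself is inspectable code


if __name__ == "__main__":
    obs = np.zeros(4)
    # Stand-ins for trained nets, just to show the control flow runs end to end.
    dummy_perceive = lambda x: {"summary": float(x.sum())}
    dummy_options = lambda scene: ["wait", "act"]
    dummy_score = lambda scene, option: len(option)
    print(cognitive_architecture_agent(obs, dummy_perceive, dummy_options, dummy_score))
```

The point of the second shape is only that the overall control flow and the choice among options sit in code a human can read and audit, while the black boxes are boxed into limited roles.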