However, the above is a nitpick. The real issue I have with your comment is that you seem to be criticizing me for not addressing the “capabilities come from not-SGD” threat scenario, when addressing that threat scenario is what this entire post is about.
That’s not what I’m criticizing you for. I elaborated a bit more here; my criticism is that this post sets up a straw version of the SLT argument to knock down, of which assuming it applies narrowly to “spiky” capability gains via SGD is one example.
The actual SLT argument is about a capabilities regime (human-level+), not about the specific method for reaching it or how many OOMs of optimization power are applied before or after.
The reason to expect a phase shift in such a regime is that (by definition) a human-level AI is capable of reflection, deception, having insights that no other human has had before (as current humans sometimes do), etc.
Note, I’m not saying that setting up a strawman automatically invalidates all the rest of your claims, nor that you’re obligated to address every possible kind of criticism. But I am claiming that you aren’t passing the ITT of someone who accepts the original SLT argument (and probably its author).
But it does mean you can’t point to support for your claim that evolution provides no evidence for the Pope!SLT as support for the claim that evolution provides no evidence for the Soares!SLT, and expect that to be convincing to anyone who doesn’t already accept that Pope!SLT == Soares!SLT.
Then my disagreement is with the claim that the human regime is very special, or that there’s any reason to attach much specialness to human-level intelligence.
In essence, I agree with a weaker version of Quintin Pope’s comment here (https://forum.effectivealtruism.org/posts/zd5inbT4kYKivincm/?commentId=Zyz9j9vW8Ai5eZiFb):
“AGI” is not the point at which the nascent “core of general intelligence” within the model “wakes up”, becomes an “I”, and starts planning to advance its own agenda. AGI is just shorthand for when we apply a sufficiently flexible and regularized function approximator to a dataset that covers a sufficiently wide range of useful behavioral patterns.
There are no “values”, “wants”, “hostility”, etc. outside of those encoded in the structure of the training data (and to a FAR lesser extent, the model/optimizer inductive biases). You can’t deduce an AGI’s behaviors from first principles without reference to that training data. If you don’t want an AGI capable and inclined to escape, don’t train it on data[1] that gives it the capabilities and inclination to escape.
Putting it another way, I suspect you’re suffering from the fallacy of generalizing from fiction, since fictional portrayals, à la the Terminator, depict it as far more discontinuous and misaligned than what happens in reality.
Link below:
https://www.lesswrong.com/posts/rHBdcHGLJ7KvLJQPk/the-logical-fallacy-of-generalization-from-fictional