In that quote, he only rules out a large class of modern approaches to alignment, which again is nothing new; he’s been very vocal about how doomed he thinks alignment is in this paradigm.
I know he’s talking about alignment, and I’m criticizing that extremely strong claim. This is the main thing I wanted to criticize in my comment! I think the reasoning he presents is not much supported by his publicly available arguments.
That claim seems to be advanced due to… there not being enough similarities between ANNs and human brains—that without enough similarity in the mechanisms which were selected for by evolution, you simply can’t get the AI to generalize in the mentioned human-like way. Not as a matter of the AI’s substrate, but as a matter of the AI’s policy not generalizing like that.
I think this is a dubious claim, and it’s based on analogies to evolution / some unknown importance of having evolution-selected mechanisms which guide value formation (and not SGD-based mechanisms).
From the Alexander/Yudkowsky debate:
[Alexander][14:41]
Okay, then let me try to directly resolve my confusion. My current understanding is something like—in both humans and AIs, you have a blob of compute with certain structural parameters, and then you feed it training data. On this model, we’ve screened off evolution, the size of the genome, etc—all of that is going into the “with certain structural parameters” part of the blob of compute. So could an AI engineer create an AI blob of compute the same size as the brain, with its same structural parameters, feed it the same training data, and get the same result (“don’t steal” rather than “don’t get caught”)?
[Yudkowsky][14:42]
The answer to that seems sufficiently obviously “no” that I want to check whether you also think the answer is obviously no, but want to hear my answer, or if the answer is not obviously “no” to you.
[Alexander][14:43]
Then I’m missing something, I expected the answer to be yes, maybe even tautologically (if it’s the same structural parameters and the same training data, what’s the difference?)
[Yudkowsky][14:46]
Maybe I’m failing to have understood the question. Evolution got human brains by evaluating increasingly large blobs of compute against a complicated environment containing other blobs of compute, got in each case a differential replication score, and millions of generations later you have humans with 7.5MB of evolution-learned data doing runtime learning on some terabytes of runtime data, using their whole-brain impressive learning algorithms which learn faster than evolution or gradient descent.
Your question sounded like “Well, can we take one blob of compute the size of a human brain, and expose it to what a human sees in their lifetime, and do gradient descent on that, and get a human?” and the answer is “That dataset ain’t even formatted right for gradient descent.”
There’s some assertion like “no, there’s not a way to get an ANN, even if incorporating structural parameters and information encoded in the human genome, to actually unfold into a mind which has human-like values (like ‘don’t steal’).” (And maybe Eliezer comes and says “no, that’s not what I mean”, but, man, I sure don’t know what he does mean, then.)
Here’s some more evidence along those lines:
[Yudkowsky][14:08]
I mean, the evolutionary builtin part is not “humans have morals” but “humans have an internal language in which your Nice Morality, among other things, can potentially be written”...
Humans, arguably, do have an imperfect unless-I-get-caught term, which is manifested in children testing what they can get away with? Maybe if nothing unpleasant ever happens to them when they’re bad, the innate programming language concludes that this organism is in a spoiled aristocrat environment and should behave accordingly as an adult? But I am not an expert on this form of child developmental psychology since it unfortunately bears no relevance to my work of AI alignment.
[Alexander][14:11]
Do you feel like you understand very much about what evolutionary builtins are in a neural network sense? EG if you wanted to make an AI with “evolutionary builtins”, would you have any idea how to do it?
[Yudkowsky][14:13]
Well, for one thing, they happen when you’re doing sexual-recombinant hill-climbing search through a space of relatively very compact neural wiring algorithms, not when you’re doing gradient descent relative to a loss function on much larger neural networks.
Again, why is this true? This is an argument that should be engaging with technical questions about inductive biases, but instead seems to wave at (my words) “the original way we got property P was by sexual-recombinant hill-climbing search through a space of relatively very compact neural wiring algorithms, and good luck trying to get it otherwise.”
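To make the contrast concrete, here’s a minimal toy sketch (my own illustration, not anything from the dialogue; the task, the two-parameter “genome”, and all the hyperparameters are made up) of the two search procedures being contrasted: gradient descent nudging parameters along a loss gradient, versus an evolution-style loop that selects, recombines, and mutates whole candidate genomes and only ever sees their overall fitness.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: fit y = 2x + 1. The task itself is irrelevant; the point is the
# contrast between two search procedures applied to the same objective.
xs = np.linspace(-1.0, 1.0, 16)
ys = 2.0 * xs + 1.0

def loss(genome):
    w, b = genome
    return float(np.mean((w * xs + b - ys) ** 2))

# (1) Gradient descent: directly nudge every parameter along the loss gradient.
params = rng.normal(size=2)
for _ in range(500):
    w, b = params
    err = w * xs + b - ys
    grad = np.array([np.mean(2 * err * xs), np.mean(2 * err)])
    params = params - 0.1 * grad

# (2) Evolution-style search: score whole candidate "genomes", keep the
#     fittest, recombine and mutate them. Selection never sees a gradient.
population = rng.normal(size=(16, 2))
for _ in range(500):
    fitness = np.array([loss(g) for g in population])
    parents = population[np.argsort(fitness)[:8]]
    moms = parents[rng.integers(0, 8, size=8)]
    dads = parents[rng.integers(0, 8, size=8)]
    crossover = rng.integers(0, 2, size=(8, 2)).astype(bool)
    children = np.where(crossover, moms, dads) + 0.05 * rng.normal(size=(8, 2))
    population = np.concatenate([parents, children])

best = population[np.argmin([loss(g) for g in population])]
print("gradient descent found:", params)    # approaches [2, 1]
print("evolutionary search found:", best)   # also approaches [2, 1]
```

Both procedures fit the same toy objective; the substantive question, which I don’t see the argument engaging, is what different inductive biases the two procedures impose once the hypothesis space is huge, and why that difference would put human-like value generalization within reach of one and not the other.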
Hopefully this helps clarify what I’m trying to critique?
Responding to part of your comment:
I know he’s talking about alignment, and I’m criticizing that extremely strong claim. This is the main thing I wanted to criticize in my comment! I think the reasoning he presents is not much supported by his publicly available arguments.
Ok, I don’t disagree with this. I certainly didn’t develop a gears-level understanding of why [building a brain-like thing with gradient descent on giant matrices] is doomed after reading the 2021 conversations. But that doesn’t seem very informative either way; I didn’t spend that much time trying to grok his arguments.