Rafael Harth comments on My Objections to “We’re All Gonna Die with Eliezer Yudkowsky”

Rafael Harth 21 Mar 2023 10:57 UTC

I also don’t really get your position. You say that,

[Eliezer] confidently dismisses ANNs

but you haven’t shown this!

In Surface Analogies and Deep Causes, I read him as saying that neural networks don't automatically yield intelligence just because they share surface similarities with the brain. This is clearly true. At the very least, token prediction is a second requirement: it is a task for which (a) lots of training data exist and (b) competence in many different domains is helpful. If you took the network of GPT-4 and trained it to play chess instead, you wouldn't get something with cross-domain competence.
In Failure by Analogy he makes a very similar abstract point. With respect to neural networks in particular, he says that their surface similarity to the brain is a bad reason to be confident in them. This also seems true. Do you really think that neural networks work because they are similar to brains on the surface?

You also said,

The important part is the last part. It’s invalid. Finding a design X which exhibits property P, doesn’t mean that for design Y to exhibit property P, Y must be very similar to X.

But Eliezer says this too in the post you linked! (Failure by Analogy). His example of airplanes not flapping is an example where the design that worked was less close to the biological thing. So clearly the point isn’t that X has to be similar to Y; the point is that reasoning from analogy doesn’t tell you this either way. (I kinda feel like you already got this, but then I don’t understand what point you are trying to make.)

This is actually consistent with thinking that large ANNs will get you to general intelligence. You can hold both "X is true" and "almost everyone who thinks X is true does so for poor reasons." I'm not saying Eliezer did predict this, but nothing I've read proves that he didn't.

Also, as a separate point: the fact that he didn't publicly make the prediction “ANNs will lead to AGI” is only weak evidence that he didn't privately believe it, because this is exactly the kind of prediction you would keep quiet about. One thing he's been very vocal about is that the current paradigm is bad for safety, so if he were bullish about the potential of that paradigm, he'd want to keep that to himself.

Didn’t he? He at least confidently rules out a very large class of modern approaches.

Relevant quote:

because nothing you do with a loss function and gradient descent over 100 quadrillion neurons, will result in an AI coming out the other end which looks like an evolved human with 7.5MB of brain-wiring information and a childhood.

In that quote, he only rules out a large class of modern approaches to alignment, which again is nothing new; he’s been very vocal about how doomed he thinks alignment is in this paradigm.

Something Eliezer does say which is relevant (in the post on Ajeya’s biology anchors model) is

Or, more likely, it’s not MoE [mixture of experts] that forms the next little trend. But there is going to be something, especially if we’re sitting around waiting until 2050. Three decades is enough time for some big paradigm shifts in an intensively researched field. Maybe we’d end up using neural net tech very similar to today’s tech if the world ends in 2025, but in that case, of course, your prediction must have failed somewhere else.

So here he’s saying that there is a more effective paradigm than large neural nets, and we’d get there if we don’t have AGI in 30 years. So this is genuinely a kind of bearishness on ANNs, but not one that precludes them giving us AGI.

TurnTrout 21 Mar 2023 21:40 UTC

Responding to part of your comment:

In that quote, he only rules out a large class of modern approaches to alignment, which again is nothing new; he’s been very vocal about how doomed he thinks alignment is in this paradigm.

I know he’s talking about alignment, and I’m criticizing that extremely strong claim. This is the main thing I wanted to criticize in my comment! I think the reasoning he presents is not much supported by his publicly available arguments.

That claim seems to be advanced because… there aren't enough similarities between ANNs and human brains. The idea is that without enough similarity in the mechanisms that were selected for by evolution, you simply can't get the AI to generalize in the mentioned human-like way. This is not a matter of the AI's substrate; it is a matter of the AI's policy not generalizing like that.

I think this is a dubious claim, and it’s made based off of analogies to evolution / some unknown importance of having evolution-selected mechanisms which guide value formation (and not SGD-based mechanisms).

From the Alexander/Yudkowsky debate:

[Alexander][14:41]
Okay, then let me try to directly resolve my confusion. My current understanding is something like—in both humans and AIs, you have a blob of compute with certain structural parameters, and then you feed it training data. On this model, we’ve screened off evolution, the size of the genome, etc—all of that is going into the “with certain structural parameters” part of the blob of compute. So could an AI engineer create an AI blob of compute the same size as the brain, with its same structural parameters, feed it the same training data, and get the same result (“don’t steal” rather than “don’t get caught”)?
[Yudkowsky][14:42]
The answer to that seems sufficiently obviously “no” that I want to check whether you also think the answer is obviously no, but want to hear my answer, or if the answer is not obviously “no” to you.
[Alexander][14:43]
Then I’m missing something, I expected the answer to be yes, maybe even tautologically (if it’s the same structural parameters and the same training data, what’s the difference?)
[Yudkowsky][14:46]
Maybe I’m failing to have understood the question. Evolution got human brains by evaluating increasingly large blobs of compute against a complicated environment containing other blobs of compute, got in each case a differential replication score, and millions of generations later you have humans with 7.5MB of evolution-learned data doing runtime learning on some terabytes of runtime data, using their whole-brain impressive learning algorithms which learn faster than evolution or gradient descent.
Your question sounded like “Well, can we take one blob of compute the size of a human brain, and expose it to what a human sees in their lifetime, and do gradient descent on that, and get a human?” and the answer is “That dataset ain’t even formatted right for gradient descent.”

There’s some assertion like “no, there’s not a way to get an ANN, even if incorporating structural parameters and information encoded in human genome, to actually unfold into a mind which has human-like values (like ‘don’t steal’).” (And maybe Eliezer comes and says “no that’s not what I mean”, but, man, I sure don’t know what he does mean, then.)

Here’s some more evidence along those lines:

[Yudkowsky][14:08]
I mean, the evolutionary builtin part is not “humans have morals” but “humans have an internal language in which your Nice Morality, among other things, can potentially be written”...
Humans, arguably, do have an imperfect unless-I-get-caught term, which is manifested in children testing what they can get away with? Maybe if nothing unpleasant ever happens to them when they’re bad, the innate programming language concludes that this organism is in a spoiled aristocrat environment and should behave accordingly as an adult? But I am not an expert on this form of child developmental psychology since it unfortunately bears no relevance to my work of AI alignment.
[Alexander][14:11]
Do you feel like you understand very much about what evolutionary builtins are in a neural network sense? EG if you wanted to make an AI with “evolutionary builtins”, would you have any idea how to do it?
[Yudkowsky][14:13]
Well, for one thing, they happen when you’re doing sexual-recombinant hill-climbing search through a space of relatively very compact neural wiring algorithms, not when you’re doing gradient descent relative to a loss function on much larger neural networks.

Again, why is this true? This argument should engage with technical questions about inductive biases. Instead, it seems to wave at (my words) "the original way we got property P was by sexual-recombinant hill-climbing search through a space of relatively very compact neural wiring algorithms, and good luck trying to get it otherwise."

Hopefully this helps clarify what I’m trying to critique?

Rafael Harth 23 Mar 2023 10:29 UTC

I know he’s talking about alignment, and I’m criticizing that extremely strong claim. This is the main thing I wanted to criticize in my comment! I think the reasoning he presents is not much supported by his publicly available arguments.

Ok, I don’t disagree with this. I certainly didn’t develop a gears-level understanding of why [building a brain-like thing with gradient descent on giant matrices] is doomed after reading the 2021 conversations. But that doesn’t seem very informative either way; I didn’t spend that much time trying to grok his arguments.