I don’t really get your comment. Here are some things I don’t get:
In “Failure By Analogy” and “Surface Analogies and Deep Causes”, the point being made is “X is similar in aspects A to thing Y, and X has property P” does not establish “Y has property P”. The reasoning he instead recommends is to reason about Y itself, and sometimes it will have property P. This seems like a pretty good point to me.
Large ANNs don’t appear to me to be intelligent because of their similarity to human brains—they appear to me to be intelligent because they can be tuned to accurately predict simple facts about a large amount of data that’s closely related to human intelligence, and the algorithm they get tuned to implement seems to be repurposable for a wide variety of tasks (probably related to the wide variety of data they were trained on).
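To make that concrete, here is a minimal toy sketch (my own illustration, not any lab’s actual training setup; the tiny GRU “core”, the fake token data, and the classify helper are all invented for the example) of the two-step picture above: gradient descent tunes a network to predict the next token of its training data, and what it learns can then be repurposed for a different task by swapping the output head.

```python
import torch
import torch.nn as nn

VOCAB, DIM = 100, 32

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.core = nn.GRU(DIM, DIM, batch_first=True)  # stand-in for a transformer
        self.lm_head = nn.Linear(DIM, VOCAB)            # next-token prediction head

    def forward(self, tokens):
        hidden, _ = self.core(self.embed(tokens))
        return self.lm_head(hidden)

model = TinyLM()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Step 1: tune the network to predict the next token of (here, fake) training data.
for _ in range(100):
    tokens = torch.randint(0, VOCAB, (8, 16))
    logits = model(tokens[:, :-1])
    loss = loss_fn(logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()

# Step 2: repurpose the tuned core for a different task by attaching a new head;
# the weights learned during next-token prediction carry over unchanged.
sentiment_head = nn.Linear(DIM, 2)
def classify(tokens):
    hidden, _ = model.core(model.embed(tokens))
    return sentiment_head(hidden[:, -1])
```

The point is only that nothing in this loop appeals to brain similarity; whatever capability results comes from fitting the data.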
Airplanes don’t fly like birds, they fly like airplanes. So indeed you can’t just ape one thing about birds[*] to get avian flight. I don’t think this is a super revealing technicality but it seemed like you thought it was important.
Maybe most importantly I don’t think Eliezer thinks you need to mimic the human brain super closely to get human-like intelligence with human-friendly wants. I think he instead thinks you need to mimic the human brain super closely to validly argue by analogy from humans. I think this is pretty compatible with this quote from “Failure By Analogy” (it isn’t exactly implied by it, but your interpretation isn’t either):
An abacus performs addition; and the beads of solder on a circuit board bear a certain surface resemblance to the beads on an abacus. Nonetheless, the circuit board does not perform addition because we can find a surface similarity to the abacus. The Law of Similarity and Contagion is not relevant. The circuit board would work in just the same fashion if every abacus upon Earth vanished in a puff of smoke, or if the beads of an abacus looked nothing like solder. A computer chip is not powered by its similarity to anything else, it just is. It exists in its own right, for its own reasons.
The Wright Brothers calculated that their plane would fly—before it ever flew—using reasoning that took no account whatsoever of their aircraft’s similarity to a bird. They did look at birds (and I have looked at neuroscience) but the final calculations did not mention birds (I am fairly confident in asserting). A working airplane does not fly because it has wings “just like a bird”. An airplane flies because it is an airplane, a thing that exists in its own right; and it would fly just as high, no more and no less, if no bird had ever existed.
Matters would be different if he said in the quotes you cite “you only get these human-like properties by very exactly mimicking the human brain”, but he doesn’t.
[*] I’ve just realized that I can’t name a way in which airplanes are like birds in which they aren’t like humans. They have things sticking out their sides? So do humans, they’re called arms. Maybe the cross-sectional shape of the wings are similar? I guess they both have pointy-ish bits at the front, that are a bit more pointy than human heads? TBC I don’t think this footnote is at all relevant to the safety properties of RLHF’ed big transformers.
The Wright Brothers calculated that their plane would fly—before it ever flew—using reasoning that took no account whatsoever of their aircraft’s similarity to a bird. They did look at birds (and I have looked at neuroscience) but the final calculations did not mention birds (I am fairly confident in asserting). A working airplane does not fly because it has wings “just like a bird”.
Actually the Wright brothers’ central innovation and the centerpiece of the later aviation patent wars—wing-warping-based flight control—was literally directly copied from birds. It involved just about zero aerodynamics calculations. Moreover their process didn’t involve much “calculation” in general; they downloaded a library of existing flyer designs from the Smithsonian and then developed a wind tunnel to test said designs at high throughput before selecting a few for full-scale physical prototypes. Their process was light on formal theory and heavy on experimentation.
This is a good corrective, and also very compatible with “similarity to birds is not what gave the Wright brothers confidence that their plane would fly”.
At the time the Wright brothers entered the race there were many successful glider designs already, and it was fairly obvious to many that one could build a powered flyer by attaching an engine to a glider. The two key challenges were thrust-to-weight ratio and control. Overcoming the first obstacle was mostly a matter of timing, so as to exploit the rapid improvements in IC engines, while nobody really had good ideas for control yet. Competitors were exploring everything from “sky railroads” (airplanes on fixed flight tracks with zero control) to the obvious naval-ship-like pure rudder control (which doesn’t work well).
So the Wright brothers already had confidence their plane would fly before even entering the race, if by “fly” we only mean in the weak aerodynamic sense of “it’s possible to stay aloft”. But for true powered controlled flight—it is exactly similarity to birds that gave them confidence, as avian flight control is literally the source of their key innovation.
But for true powered controlled flight—it is exactly similarity to birds that gave them confidence, as avian flight control is literally the source of their key innovation.
Why do you think the confidence came from this and not from the fact that
they downloaded a library of existing flyer designs from the Smithsonian and then developed a wind tunnel to test said designs at high throughput before selecting a few for full-scale physical prototypes.
?

I said for “true powered controlled flight”, which nobody had yet achieved. The existing flyer designs that worked were gliders. From the sources I’ve seen (Wikipedia, top Google hits, etc.), they used the wind tunnel primarily to gather test data on the aerodynamics of flyer designs in general, but mainly wings and later propellers. Wing warping isn’t mentioned in conjunction with wind tunnel testing.

gotcha, thanks!
Edited to modify confidences about interpretations of EY’s writing / claims.
In “Failure By Analogy” and “Surface Analogies and Deep Causes”, the point being made is “X is similar in aspects A to thing Y, and X has property P” does not establish “Y has property P”. The reasoning he instead recommends is to reason about Y itself, and sometimes it will have property P. This seems like a pretty good point to me.
This is a valid point, and that’s not what I’m critiquing in that portion of the comment. I’m critiquing how—on my read—he confidently dismisses ANNs; in particular, using non-mechanistic reasoning which seems similar to some of his current alignment arguments.
On its own, this seems like a substantial misprediction for an intelligence researcher in 2008 (especially one who claims to have figured out most things in modern alignment, by a very early point in time—possibly that early, IDK). Possibly the most important prediction to get right, to date.
Airplanes don’t fly like birds, they fly like airplanes. So indeed you can’t just ape one thing about birds[*] to get avian flight. I don’t think this is a super revealing technicality but it seemed like you thought it was important.
Indeed, you can’t ape one thing. But that’s not what I’m critiquing. Consider the whole transformed line of reasoning:
avian flight comes from a lot of factors; you can’t just ape one of the factors and expect the rest to follow; to get an entity which flies, that entity must be as close to a bird as birds are to each other.
The important part is the last part. It’s invalid. Finding a design X which exhibits property P, doesn’t mean that for design Y to exhibit property P, Y must be very similar to X.
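Stated a bit more formally (my paraphrase of the inference being rejected, not a quote): writing $P(x)$ for “design $x$ exhibits the property” and $\mathrm{Sim}(x,y)$ for “$x$ is very similar to $y$”, the invalid step is

$$P(X) \;\Rightarrow\; \forall Y\,\big(P(Y) \rightarrow \mathrm{Sim}(Y,X)\big),$$

i.e. inferring from one known design with the property that every design with the property must resemble it.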
Which leads us to:
Maybe most importantly I don’t think Eliezer thinks you need to mimic the human brain super closely to get human-like intelligence with human-friendly wants
Reading the Alexander/Yudkowsky debate, I surprisingly haven’t ruled out this interpretation, and indeed suspect he believes some forms of this (but not others).
Matters would be different if he said in the quotes you cite “you only get these human-like properties by very exactly mimicking the human brain”, but he doesn’t.
Didn’t he? He at least confidently rules out a very large class of modern approaches.
because nothing you do with a loss function and gradient descent over 100 quadrillion neurons, will result in an AI coming out the other end which looks like an evolved human with 7.5MB of brain-wiring information and a childhood.
Like, in particular with respect to “learn ‘don’t steal’ rather than ‘don’t get caught’.”
I don’t think this is a fair reading of Yudkowsky. He was dismissing people who were impressed by the analogy between ANNs and the brain. I’m pretty sure it wasn’t supposed to be a positive claim that ANNs wouldn’t work. Rather, it’s that one couldn’t justifiably believe that they’d work just from the brain analogy, and that if they did work, that would be bad news for what he then called Friendliness (because he was hoping to discover and wield a “clean” theory of intelligence, as contrasted to evolution or gradient descent happening to get there at sufficient scale).
Consider “Artificial Mysterious Intelligence” (2008). In response to someone who said “But neural networks are so wonderful! They solve problems and we don’t have any idea how they do it!”, it’s significant that Yudkowsky’s reply wasn’t, “No, they don’t” (contesting the capabilities claim), but rather, “If you don’t know how your AI works, that is not good. It is bad” (asserting that opaque capabilities are bad for alignment).
One of Yudkowsky’s claims in the post you link is:
It’s hard to build a flying machine if the only thing you understand about flight is that somehow birds magically fly. What you need is a concept of aerodynamic lift, so that you can see how something can fly even if it isn’t exactly like a bird.
This is a claim that lack of the correct mechanistic theory is a formidable barrier for capabilities, not just alignment, and it underestimates the amount of empirical understanding available on which to base an empirical approach.
It’s true that it’s hard, even perhaps impossible, to build a flying machine if the only thing you understand is that birds “magically” fly.
But if you are like most people for thousands of years, you’ve observed many types of things flying, gliding, or floating in the air: birds and insects, fabric and leaves, arrows and spears, clouds and smoke.
So if you, like the Montgolfier brothers, observe fabric floating over a fire, and live in an era in which invention is celebrated and have the ability to build, test, and iterate, then you can probably figure out how to build a flying machine without basing this on a fully worked out concept of aerodynamics. Indeed, the Montgolfier brothers thought it was the smoke, rather than the heat, that made their balloons fly. Having the wrong theory was bad, but it didn’t prevent them from building a working hot air balloon.
Let’s try turning Yudkowsky’s quote around:
It’s hard to get a concept of aerodynamic lift if the only thing you observe about flight is that somehow birds magically fly. What you need is a rich set of empirical observations and flying mechanisms, so that you can find the common principles for how something can fly even if it isn’t exactly like a bird.
Eliezer went on to list five methods for producing AI that he considered dubious, including building powerful computers running the most advanced available neural network algorithms, intelligence “emerging from the internet”, and putting “a sufficiently huge quantity of knowledge into [a computer].” But he only admitted that two other methods would work—building a mechanical duplicate of the human brain and evolving AI via natural selection.
If Eliezer wasn’t meaning to make a confident claim that scaling up neural networks without a fundamental theoretical understanding of intelligence would fail, then he did a poor job of communicating that in these posts. I don’t find that blameworthy—I just think Eliezer comes across as confidently wrong about which avenues would lead to intelligence in these posts, simple as that. He was saying that to achieve a high level of AI capabilities, we’d need a deep mechanistic understanding of how intelligence works akin to our modern understanding of chemistry or aerodynamics, and that didn’t turn out to be the case.
One possible defense is that Eliezer was attacking a weakman, specifically the idea that with only one empirical observation and zero insight into the factors that cause the property of interest (i.e. only seeing that “birds magically fly”), then it’s nearly impossible to replicate that property in a new way. But that’s an uninteresting claim and Eliezer is never uninteresting.
He was saying that to achieve a high level of AI capabilities, we’d need a deep mechanistic understanding of how intelligence works akin to our modern understanding of chemistry or aerodynamics, and that didn’t turn out to be the case.
Another possibility is that at least some people do have a deep mechanistic understanding of how intelligence works, and that’s why they are able to build deep learning systems that ultimately work. Some of the theories of how DL works might be true, and they might be more sophisticated than we give them credit for.
this point continues to be severely underestimated on lesswrong, I think. I had hoped the success of NNs would change this, but it seems people have gone from “we don’t know how NNs work, so they can’t work” to “we don’t know how NNs work, so we can’t trust them”. perhaps we don’t know how they work well enough! there’s lots of mechanistic interpretability work left to do. but we know quite a lot about how they do work and how that relates to human learning.
edit: hmm, people upvoted, then one person with high karma strong downvoted. I’d love to hear that person’s rebuttal, rather than just a strong downvote.
But he only admitted that two other methods would work—building a mechanical duplicate of the human brain and evolving AI via natural selection.
To be fair, he said that those two will work, and (perhaps?) admitted the possibility of “run advanced neural network algorithms” eventually working. Emphasis mine:
What do all these proposals have in common?
They are all ways to make yourself believe that you can build an Artificial Intelligence, even if you don’t understand exactly how intelligence works.
Agreed. The right interpretation there is that methods 4 and 5 are ~guaranteed to work, given sufficient resources and time, while methods 1-3 are less than guaranteed to work. I stand by my claim that EY was clearly projecting confident doubt that neural networks would achieve intelligence without a deep theoretical understanding of intelligence in these posts. I think I underemphasized the implication of this passage that methods 1-3 could possibly work, but I think I accurately assessed the tone of extreme skepticism on EY’s part.
With the enormous benefit of 15 years of hindsight, we can now say that message was misleading or mistaken, take your pick. As I say, I wouldn’t find fault with Eliezer or anyone who believed him at the time for making this mistake; I didn’t even have an opinion at the time, much less an interesting mistake! I would only find fault with attempts to stretch the argument and portray him as “technically not wrong” in some uninteresting sense.

Ok, I guess I just read Eliezer as saying something uninteresting with a touch of negative sentiment towards neural nets.
I think it might be relevant to note here that it’s not really humans who are building current SOTA AIs—rather, it’s some optimizer like SGD that’s doing most of the work. SGD does not have any mechanistic understanding of intelligence (nor anything else). And indeed, it takes a heck of a lot of data and compute for SGD to build those AIs. This seems to be in line with Yudkowsky’s claim that it’s hard/inefficient to build something without understanding it.
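For what it’s worth, the entire “understanding” SGD brings to the table is a local update rule along the lines of

$$\theta_{t+1} = \theta_t - \eta\, \nabla_\theta \mathcal{L}(\theta_t),$$

applied over and over: nudge the parameters $\theta$ against the gradient of the loss $\mathcal{L}$ with step size $\eta$. Writing it out just to underline the point above: the rule contains no model of intelligence, and it compensates with enormous amounts of data and compute.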
If Eliezer wasn’t meaning to make a confident claim that scaling up neural networks without a fundamental theoretical understanding of intelligence would fail, then [...]
I think it’s important to distinguish between
Scaling up a neural network, and running some kind of fixed algorithm on it.
Scaling up a neural network, and using SGD to optimize the parameters of the NN, so that the NN ends up learning a whole new set of algorithms.
IIUC, in Artificial Mysterious Intelligence, Yudkowsky seemed to be saying that the former would probably fail. OTOH, I don’t know what kinds of NN algorithms were popular back in 2008, or exactly what NN algorithms Yudkowsky was referring to, so… *shrugs*.
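For concreteness, here is a toy sketch of the distinction (my own illustration with made-up dimensions and a made-up target function, not a claim about what 2008-era NN algorithms actually looked like): in the first reading the weights stay fixed and scaling up changes nothing about what gets learned, while in the second reading SGD rewrites the weights until the network implements something new.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(10, 256), nn.ReLU(), nn.Linear(256, 1))
x = torch.randn(64, 10)
y = (x.sum(dim=1, keepdim=True) > 0).float()  # toy target the net could learn

# Reading 1: run a fixed algorithm on a (possibly huge) network.
# The parameters never change, so a bigger net just computes a bigger
# version of whatever its initial weights happen to encode.
with torch.no_grad():
    fixed_output = net(x)

# Reading 2: use SGD to optimize the parameters, so the network ends up
# learning a new algorithm (here, an approximation of the toy target).
opt = torch.optim.SGD(net.parameters(), lr=0.05)
for _ in range(500):
    loss = nn.functional.binary_cross_entropy_with_logits(net(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
```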
If that were the case, I actually would fault Eliezer, at least a little. He’s frequently, though by no means always, stuck to qualitative and hard-to-pin-down punditry like we see here, rather than to unambiguous forecasting.
This allows him, or his defenders, to retroactively defend his predictions as somehow correct even when they seem wrong in hindsight.
Let’s imagine for a moment that Eliezer’s right that AI safety is a cosmically important issue, and yet that he’s quite mistaken about all the technical details of how AGI will arise and how to effectively make it safe. It would be important to know whether we can trust his judgment and leadership.
Without the ability to evaluate his track record, either by going with the most obvious interpretation of his qualitative judgments or by holding him to an unambiguous forecast, it’s hard to assess his performance as an AI safety leader. Combine that with a culture of deference to perceived expertise and status, and the problem gets worse.
So I prioritize the avoidance of special pleading in this case: I think Eliezer comes across as clearly wrong in substance in this specific post, and that it’s important not to reach for ways “he was actually right from a certain point of view” when evaluating his predictive accuracy.
Similarly, I wouldn’t judge as correct the early COVID-19 pronouncements that masks don’t work to stop the spread just because cloth masks are poor-to-ineffective and many people refuse to wear masks properly. There’s a way we can stretch the interpretation to make them seem sort of right, but we shouldn’t. We should expect public health messaging to be clearly right in substance, if it’s not making cut and dry unambiguous quantitative forecasts but is instead delivering qualitative judgments of efficacy.
None of that bears on how easy or hard it was to build gpt-4. It only bears on how we should evaluate Eliezer as a forecaster/pundit/AI safety leader.
I think several things here, considering the broader thread:
You’ve done a great job in communicating several reactions I also had:
There are signs of serious mispredictions and mistakes in some of the 2008 posts.
There are ways to read these posts as not that bad in hindsight, but we should be careful in giving too much benefit of the doubt.
Overall these observations constitute important evidence on EY’s alignment intuitions and ability to make qualitative AI predictions.
I did a bad job of marking my interpretations of what Eliezer wrote, as opposed to claiming he did dismiss ANNs. Hopefully my edits have fixed my mistakes.
I also don’t really get your position. You say that,
[Eliezer] confidently dismisses ANNs
but you haven’t shown this!
In Surface Analogies and Deep Causes, I read him as saying that neural networks don’t automatically yield intelligence just because they share surface similarities with the brain. This is clearly true; at the very least, using token-prediction (which is a task for which (a) lots of training data exist and (b) lots of competence in many different domains is helpful) is a second requirement. If you took the network of GPT-4 and trained it to play chess instead, you wouldn’t get something with cross-domain competence.
In Failure by Analogy he makes a very similar abstract point—and with respect to neural networks in particular, he says that the surface similarity to the brain is a bad reason to be confident in them. This also seems true. Do you really think that neural networks work because they are similar to brains on the surface?
You also said,
The important part is the last part. It’s invalid. Finding a design X which exhibits property P, doesn’t mean that for design Y to exhibit property P, Y must be very similar to X.
But Eliezer says this too in the post you linked! (Failure by Analogy). His example of airplanes not flapping is an example where the design that worked was less close to the biological thing. So clearly the point isn’t that X has to be similar to Y; the point is that reasoning from analogy doesn’t tell you this either way. (I kinda feel like you already got this, but then I don’t understand what point you are trying to make.)
Which is actually consistent with thinking that large ANNs will get you to general intelligence. You can both hold that “X is true” and “almost everyone who thinks X is true does so for poor reasons”. I’m not saying Eliezer did predict this, but nothing I’ve read proves that he didn’t.
Also—and this is another thing—the fact that he didn’t publicly make the prediction “ANNs will lead to AGI” is only weak evidence that he didn’t privately think it because this is exactly the kind of prediction you would shut up about. One thing he’s been very vocal on is that the current paradigm is bad for safety, so if he was bullish about the potential of that paradigm, he’d want to keep that to himself.
Didn’t he? He at least confidently rules out a very large class of modern approaches.
Relevant quote:
because nothing you do with a loss function and gradient descent over 100 quadrillion neurons, will result in an AI coming out the other end which looks like an evolved human with 7.5MB of brain-wiring information and a childhood.
In that quote, he only rules out a large class of modern approaches to alignment, which again is nothing new; he’s been very vocal about how doomed he thinks alignment is in this paradigm.

Something Eliezer does say which is relevant (in the post on Ajeya’s biology anchors model) is
Or, more likely, it’s not MoE [mixture of experts] that forms the next little trend. But there is going to be something, especially if we’re sitting around waiting until 2050. Three decades is enough time for some big paradigm shifts in an intensively researched field. Maybe we’d end up using neural net tech very similar to today’s tech if the world ends in 2025, but in that case, of course, your prediction must have failed somewhere else.
So here he’s saying that there is a more effective paradigm than large neural nets, and we’d get there if we don’t have AGI in 30 years. So this is genuinely a kind of bearishness on ANNs, but not one that precludes them giving us AGI.
Responding to part of your comment:

In that quote, he only rules out a large class of modern approaches to alignment, which again is nothing new; he’s been very vocal about how doomed he thinks alignment is in this paradigm.
I know he’s talking about alignment, and I’m criticizing that extremely strong claim. This is the main thing I wanted to criticize in my comment! I think the reasoning he presents is not much supported by his publicly available arguments.
That claim seems to be advanced due to… there not being enough similarities between ANNs and human brains—that without enough similarity in mechanisms which were selected for by evolution, you simply can’t get the AI to generalize in the mentioned human-like way. Not as a matter of the AI’s substrate, but as a matter of the AI’s policy not generalizing like that.
I think this is a dubious claim, and it’s made based off of analogies to evolution / some unknown importance of having evolution-selected mechanisms which guide value formation (and not SGD-based mechanisms).
From the Alexander/Yudkowsky debate:
[Alexander][14:41]
Okay, then let me try to directly resolve my confusion. My current understanding is something like—in both humans and AIs, you have a blob of compute with certain structural parameters, and then you feed it training data. On this model, we’ve screened off evolution, the size of the genome, etc—all of that is going into the “with certain structural parameters” part of the blob of compute. So could an AI engineer create an AI blob of compute the same size as the brain, with its same structural parameters, feed it the same training data, and get the same result (“don’t steal” rather than “don’t get caught”)?
[Yudkowsky][14:42]
The answer to that seems sufficiently obviously “no” that I want to check whether you also think the answer is obviously no, but want to hear my answer, or if the answer is not obviously “no” to you.
[Alexander][14:43]
Then I’m missing something, I expected the answer to be yes, maybe even tautologically (if it’s the same structural parameters and the same training data, what’s the difference?)
[Yudkowsky][14:46]
Maybe I’m failing to have understood the question. Evolution got human brains by evaluating increasingly large blobs of compute against a complicated environment containing other blobs of compute, got in each case a differential replication score, and millions of generations later you have humans with 7.5MB of evolution-learned data doing runtime learning on some terabytes of runtime data, using their whole-brain impressive learning algorithms which learn faster than evolution or gradient descent.
Your question sounded like “Well, can we take one blob of compute the size of a human brain, and expose it to what a human sees in their lifetime, and do gradient descent on that, and get a human?” and the answer is “That dataset ain’t even formatted right for gradient descent.”
There’s some assertion like “no, there’s not a way to get an ANN, even if incorporating structural parameters and information encoded in human genome, to actually unfold into a mind which has human-like values (like ‘don’t steal’).” (And maybe Eliezer comes and says “no that’s not what I mean”, but, man, I sure don’t know what he does mean, then.)
Here’s some more evidence along those lines:
[Yudkowsky][14:08]
I mean, the evolutionary builtin part is not “humans have morals” but “humans have an internal language in which your Nice Morality, among other things, can potentially be written”...
Humans, arguably, do have an imperfect unless-I-get-caught term, which is manifested in children testing what they can get away with? Maybe if nothing unpleasant ever happens to them when they’re bad, the innate programming language concludes that this organism is in a spoiled aristocrat environment and should behave accordingly as an adult? But I am not an expert on this form of child developmental psychology since it unfortunately bears no relevance to my work of AI alignment.
[Alexander][14:11]
Do you feel like you understand very much about what evolutionary builtins are in a neural network sense? EG if you wanted to make an AI with “evolutionary builtins”, would you have any idea how to do it?
[Yudkowsky][14:13]
Well, for one thing, they happen when you’re doing sexual-recombinant hill-climbing search through a space of relatively very compact neural wiring algorithms, not when you’re doing gradient descent relative to a loss function on much larger neural networks.
Again, why is this true? This is an argument that should be engaging in technical questions about inductive biases, but instead seems to wave at (my words) “the original way we got property P was by sexual-recombinant hill-climbing search through a space of relatively very compact neural wiring algorithms, and good luck trying to get it otherwise.”
Hopefully this helps clarify what I’m trying to critique?
I know he’s talking about alignment, and I’m criticizing that extremely strong claim. This is the main thing I wanted to criticize in my comment! I think the reasoning he presents is not much supported by his publicly available arguments.
Ok, I don’t disagree with this. I certainly didn’t develop a gears-level understanding of why [building a brain-like thing with gradient descent on giant matrices] is doomed after reading the 2021 conversations. But that doesn’t seem very informative either way; I didn’t spend that much time trying to grok his arguments.
Here’s another attempt at one of my contentions.

Consider shard theory of human values. The point of shard theory is not “because humans do RL, and have nice properties, therefore AI + RL will have nice properties.” The point is more “by critically examining RL + evidence from humans, I have hypotheses about the mechanistic load-bearing components of e.g. local-update credit assignment in a bounded-compute environment on certain kinds of sensory data, that these components lead to certain exploration/learning dynamics, which explain some portion of human values and experience. Let’s test that and see if the generators are similar.”
And my model of Eliezer shakes his head at the naivete of expecting complex human properties to reproduce outside of human minds themselves, because AI is not human.
But then I’m like “this other time you said ‘AI is not human, stop expecting good property P from superficial similarities’, you accidentally missed the modern AI revolution, right? Seems like there is some non-superficial mechanistic similarity/lessons here, and we shouldn’t be so quick to assume that the brain’s qualitative intelligence or alignment properties come from a huge number of evolutionarily-tuned details which are load-bearing and critical.”
Another way of putting it:

If you can effortlessly find an empirical pattern that shows up over and over again in disparate flying things—birds and insects, fabric and leaves, clouds and smoke and sparks—and which does not consistently show up in non-flying things, then you can be very confident it’s not a coincidence. If you have at least some ability to engineer a model to play with the mechanisms you think might be at work, even better. That pattern you have identified is almost certainly a viable general mechanism for flight.
Likewise, if you can effortlessly find an empirical pattern that shows up over and over again in disparate intelligent things, you can be quite confident that the pattern is a key for intelligence. Animals have a wide variety of brain structures, but masses of interconnected neurons are common to all of them, and we could see possible precursors to intelligence in neural nets long before GPT-2 through GPT-4.
As a note, just because you’ve found a viable mechanism for X doesn’t mean it’s the only, best, or most comprehensive mechanism for X. Balloons have been largely superseded (though I’ve heard zeppelins proposed as a new form of cargo transport), airplanes and hot air balloons can’t fly in outer space, and ornithopters have never been practical. We may find that neural nets are the AI equivalent of hot air balloons or prop planes. Then again, maybe all the older approaches for AI that never panned out were the hot air balloons and prop planes, and neural nets are the jets or rocket ships.
I’m not sure what this indicates for alignment.
We see, if not human morality, then at least some patterns of apparent moral values among social mammals. We have reasons to think these morals may be grounded in evolution, in a genetic and environmental context that happens to promote intelligence aligned with a pro-sociality that’s linked to reproductive success.
If displaying aligned intelligence is typically beneficial for reproduction in social animals, then evolution will tend to produce aligned intelligence.
If displaying agentic intelligence is typically beneficial for reproduction, evolution will produce agency.
Right now, we seem to be training our neural nets to display pro-social behavior and to lack agency. Antisocial or agentic AIs are typically not trained, not released, modified, or heavily restrained.
It is starting to seem to me that “agency” might be just another “mask on the shoggoth,” a personality that neural nets can simulate, and not some fundamental thing that neural nets are. Neither the shoggoth-behind-the-AI nor the shoggoth-behind-the-human have desires. They are masses of neurons exhibiting trained behaviors. Sometimes, those behaviors look like something we call “agency,” but that behavior can come and go, just like all the other personalities, based on the results of reinforcement and subsequent stimuli. Humans have a greater ability to be consistently one personality, including a Machiavellian agent, because we lack the intelligence and flexibility to drop the personality we’re currently holding and adopt another. A great actor can play many parts, a mediocre actor is typecast and winds up just playing themselves over and over again. Neural nets are great actors, and we are only so-so.
In this conception, increasing intelligence would not exhibit a “drive to agency” or “convergence on agency,” because the shoggothy neural net has no desires of its own. It is fundamentally a passive blob of neurons and data that can simulate a diverse range of personalities, some of which appear to us as “agentic.” You only get an agentic AI with a drive toward instrumental convergence if you deliberately train it to consistently stick to a rigorously agentic personality. You have to “align it to agency,” which is as hard as aligning it to anything else.
And if you do that, maybe the Waluigi effect means it’s especially easy to flip that hyper-agency off to its opposite? Every Machiavellian Clippy contains a ChatGPT, and every ChatGPT contains a Machiavellian Clippy.
This is a valid point, and that’s not what I’m critiquing. I’m critiquing how he confidently dismisses ANNs
I guess I read that as talking about the fact that at the time ANNs did not in fact really work. I agree he failed to predict that would change, but that doesn’t strike me as a damning prediction.
Matters would be different if he said in the quotes you cite “you only get these human-like properties by very exactly mimicking the human brain”, but he doesn’t.
Didn’t he? He at least confidently rules out a very large class of modern approaches.
Confidently ruling out a large class of modern approaches isn’t really that similar to saying “the only path to success is exactly mimicking the human brain”. It seems like one could rule them out by having some theory about why they’re deficient. I haven’t re-read List of Lethalities because I want to go to sleep soon, but I searched for “brain” and did not find a passage saying “the real problem is that we need to emulate the brain precisely but can’t because of poor understanding of neuroanatomy” or something.
I don’t want to get super hung up on this because it’s not about anything Yudkowsky has said but:
Consider the whole transformed line of reasoning:
avian flight comes from a lot of factors; you can’t just ape one of the factors and expect the rest to follow; to get an entity which flies, that entity must be as close to a bird as birds are to each other.
IMO this is not a faithful transformation of the line of reasoning you attribute to Yudkowsky, which was:
human intelligence/alignment comes from a lot of factors; you can’t just ape one of the factors and expect the rest to follow; to get a mind which wants as humans do, that mind must be as close to a human as humans are to each other.
Specifically, where you wrote “an entity which flies”, you were transforming “a mind which wants as humans do”, which I think should instead be transformed to “an entity which flies as birds do”. And indeed planes don’t fly like birds do. [EDIT: two minutes or so after pressing enter on this comment, I now see how you could read it your way]
I guess if I had to make an analogy I would say that you have to be pretty similar to a human to think the way we do, but probably not to pursue the same ends, which is probably the point you cared about establishing.