Cascades are when one thing leads to another. Human brains are effectively discontinuous with chimpanzee brains due to a whole bag of design improvements, even though they and we share 95% of our genetic material and only a few million years have elapsed since the branch. Why this whole series of improvements in us, relative to chimpanzees? Why haven’t some of the same improvements occurred in other primates?
Well, this is not a question on which one may speak with authority (so far as I know). But I would venture an unoriginal guess that, in the hominid line, one thing led to another.
The chimp-level task of modeling others, in the hominid line, led to improved self-modeling which supported recursion which enabled language which birthed politics that increased the selection pressure for outwitting which led to sexual selection on wittiness...
...or something. It’s hard to tell by looking at the fossil record what happened in what order and why. The point being that it wasn’t one optimization that pushed humans ahead of chimps, but rather a cascade of optimizations that, in Pan, never got started.
We fell up the stairs, you might say. It’s not that the first stair ends the world, but if you fall up one stair, you’re more likely to fall up the second, the third, the fourth...
I will concede that farming was a watershed invention in the history of the human species, though it intrigues me for a different reason than it does Robin. Robin, presumably, is interested because the economy grew by two orders of magnitude, or something like that. But did having a hundred times as many humans lead to a hundred times as much thought-optimization accumulating per unit time? It doesn’t seem likely, especially in the age before writing and telephones. But farming, because of its sedentary and repeatable nature, led to repeatable trade, which led to debt records. Aha! Now we have writing. There’s a significant invention, from the perspective of cumulative optimization by brains. Farming isn’t writing but it cascaded to writing.
Farming also cascaded (by way of surpluses and cities) to support professional specialization. I suspect that having someone spend their whole life thinking about topic X, instead of a hundred farmers occasionally pondering it, is a more significant jump in cumulative optimization than the gap between a hundred farmers and one hunter-gatherer pondering something.
Farming is not the same trick as professional specialization or writing, but it cascaded to professional specialization and writing, and so the pace of human history picked up enormously after agriculture. Thus I would interpret the story.
From a zoomed-out perspective, cascades can lead to what look like discontinuities in the historical record, even given a steady optimization pressure in the background. It’s not that natural selection sped up during hominid evolution. But the search neighborhood contained a low-hanging fruit of high slope… that led to another fruit… which led to another fruit… and so, walking at a constant rate, we fell up the stairs. If you see what I’m saying.
Predicting what sorts of things are likely to cascade seems like a very difficult problem.
But I will venture the observation that—with a sample size of one, and an optimization process very different from human thought—there was a cascade in the region of the transition from primate to human intelligence.
Cycles happen when you connect the output pipe to the input pipe in a repeatable transformation. You might think of them as a special case of cascades with very high regularity. (From which you’ll note that in the cases above, I talked about cascades through differing events: farming → writing.)
The notion of cycles as a source of discontinuity might seem counterintuitive, since a cycle is so regular. But consider this important lesson of history:
Once upon a time, in a squash court beneath Stagg Field at the University of Chicago, physicists were building a shape like a giant doorknob out of alternate layers of graphite and uranium...
The key number for the “pile” is the effective neutron multiplication factor. When a uranium atom splits, it releases neutrons: some right away, some after a delay while byproducts decay further. Some neutrons escape the pile; some strike another uranium atom and cause an additional fission. The effective neutron multiplication factor, denoted k, is the average number of neutrons from a single fissioning uranium atom that cause another fission. At k less than 1, the pile is “subcritical”. At k >= 1, the pile is “critical” (strictly speaking, “supercritical” once k exceeds 1). Fermi calculates that the pile will reach k=1 between layers 56 and 57.
On December 2nd, 1942, with layer 57 completed, Fermi orders the final experiment to begin. All but one of the control rods (strips of wood covered with neutron-absorbing cadmium foil) are withdrawn. At 10:37am, Fermi orders the final control rod withdrawn about halfway out. The Geiger counters click faster, and a graph pen moves upward. “This is not it,” says Fermi, “the trace will go to this point and level off,” indicating a spot on the graph. In a few minutes the graph pen comes to the indicated point, and does not go above it. Seven minutes later, Fermi orders the rod pulled out another foot. Again the radiation rises, then levels off. The rod is pulled out another six inches, then another, then another.
At 11:30, the slow rise of the graph pen is punctuated by an enormous CRASH—an emergency control rod, triggered by an ionization chamber, activates and shuts down the pile, which is still short of criticality.
Fermi orders the team to break for lunch.
At 2pm the team reconvenes, withdraws and locks the emergency control rod, and moves the control rod to its last setting. Fermi makes some measurements and calculations, then again begins the process of withdrawing the rod in slow increments. At 3:25pm, Fermi orders the rod withdrawn another twelve inches. “This is going to do it,” Fermi says. “Now it will become self-sustaining. The trace will climb and continue to climb. It will not level off.”
Herbert Anderson recounted (as told in Rhodes’s The Making of the Atomic Bomb):
“At first you could hear the sound of the neutron counter, clickety-clack, clickety-clack. Then the clicks came more and more rapidly, and after a while they began to merge into a roar; the counter couldn’t follow anymore. That was the moment to switch to the chart recorder. But when the switch was made, everyone watched in the sudden silence the mounting deflection of the recorder’s pen. It was an awesome silence. Everyone realized the significance of that switch; we were in the high intensity regime and the counters were unable to cope with the situation anymore. Again and again, the scale of the recorder had to be changed to accommodate the neutron intensity which was increasing more and more rapidly. Suddenly Fermi raised his hand. ‘The pile has gone critical,’ he announced. No one present had any doubt about it.”
Fermi kept the pile running for twenty-eight minutes, with the neutron intensity doubling every two minutes.
That first critical reaction had k of 1.0006.
It might seem that a cycle, with the same thing happening over and over again, ought to exhibit continuous behavior. In one sense it does. But if you pile on one more uranium brick, or pull out the control rod another twelve inches, there’s one hell of a big difference between k of 0.9994 and k of 1.0006.
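To put rough numbers on that difference, here is a minimal sketch. It assumes the simplest possible model, in which each neutron generation just multiplies the previous one by k; the function names are mine, and real pile kinetics (delayed neutrons, control-rod geometry) are far messier than this.

```python
import math


def population_after(k: float, generations: int, start: float = 1.0) -> float:
    """Relative neutron population after some number of generations.

    Toy assumption: pure geometric growth, each generation k times the last.
    """
    return start * k ** generations


def generations_to_double(k: float) -> float:
    """Generations needed for the population to double; only defined for k > 1."""
    if k <= 1.0:
        raise ValueError("the population never doubles unless k > 1")
    return math.log(2.0) / math.log(k)


if __name__ == "__main__":
    for k in (0.9994, 1.0006):
        factor = population_after(k, 10_000)
        print(f"k = {k}: after 10,000 generations the population is x{factor:.4g}")
    print(f"k = 1.0006 doubles every {generations_to_double(1.0006):.0f} generations")
```

Under this toy model, ten thousand generations at k of 0.9994 shrink the population to a fraction of a percent of where it started, while the same number at k of 1.0006 multiply it several hundredfold. If the doubling really took about two minutes, then the roughly 1,150 generations per doubling would imply an effective generation time on the order of a tenth of a second; that last figure is my inference from the numbers above, not something stated in the account.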
If, rather than being able to calculate, rather than foreseeing and taking precautions, Fermi had just reasoned that 57 layers ought not to behave all that differently from 56 layers... well, it wouldn’t have been a good year to be a student at the University of Chicago.
The inexact analogy to the domain of self-improving AI is left as an exercise for the reader, at least for now.
Economists like to measure cycles because they happen repeatedly. You take a potato and an hour of labor and make a potato clock which you sell for two potatoes; and you do this over and over and over again, so an economist can come by and watch how you do it.
As I noted here at some length, economists are much less likely to go around measuring how many scientific discoveries it takes to produce a new scientific discovery. All the discoveries are individually dissimilar and it’s hard to come up with a common currency for them. The same problem will keep a self-improving AI from being directly analogous to a uranium heap, with almost perfectly smooth exponential increase at a calculable rate. You can’t apply the same software improvement to the same line of code over and over again; you’ve got to invent a new improvement each time. But if self-improvements are triggering more self-improvements with great regularity, you might stand a long way back from the AI, blur your eyes a bit, and ask: What is the AI’s average neutron multiplication factor?
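One way to make that blurred-eyes question concrete is a toy branching model: suppose each self-improvement triggers, on average, k further improvements. This is my own hypothetical framing, not a measured property of any system, and the function name below is invented for illustration. The expected number of improvements in generation n is then k to the power n, and the running total either converges or it doesn't.

```python
def expected_total_improvements(k: float, generations: int) -> float:
    """Expected cumulative improvements descended from one seed improvement.

    Toy assumption: every improvement triggers k more on average, so
    generation n contributes k**n; we sum generations 0 through `generations`.
    """
    total = 0.0
    current = 1.0  # the seed improvement counts as generation 0
    for _ in range(generations + 1):
        total += current
        current *= k
    return total


if __name__ == "__main__":
    for k in (0.5, 0.9, 0.99, 1.01):
        print(k, round(expected_total_improvements(k, 1_000), 2))
```

For k below 1 the total levels off near 1/(1-k): about 2 at k of 0.5, about 100 at k of 0.99. Past k of 1 there is no ceiling. The shape is the same as the pile's, even though no real sequence of improvements will be anywhere near this regular.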
Economics seems to me to be largely the study of production cycles—highly regular repeatable value-adding actions. This doesn’t seem to me like a very deep abstraction so far as the study of optimization goes, because it leaves out the creation of novel knowledge and novel designs—further informational optimizations. Or rather, treats productivity improvements as a mostly exogenous factor produced by black-box engineers and scientists. (If I underestimate your power and merely parody your field, by all means inform me what kind of economic study has been done of such things.) (Answered: This literature goes by the name “endogenous growth”. See comments starting here.) So far as I can tell, economists do not venture into asking where discoveries come from, leaving the mysteries of the brain to cognitive scientists.
(Nor do I object to this division of labor—it just means that you may have to drag in some extra concepts from outside economics if you want an account of self-improving Artificial Intelligence. Would most economists even object to that statement? But if you think you can do the whole analysis using standard econ concepts, then I’m willing to see it...)
Insight is that mysterious thing humans do by grokking the search space, wherein one piece of highly abstract knowledge (e.g. Newton’s calculus) provides the master key to a huge set of problems. Since humans deal in the compressibility of compressible search spaces (at least the part we can compress) we can bite off huge chunks in one go. This is not mere cascading, where one solution leads to another.
Rather, an “insight” is a chunk of knowledge which, if you possess it, decreases the cost of solving a whole range of governed problems.
There’s a parable I once wrote (I forget what for; ev-bio, I think) which dealt with creatures who’d evolved addition in response to some kind of environmental problem, and not with overly sophisticated brains. They started with the ability to add 5 to things (which was a significant fitness advantage because it let them solve some of their problems), then accreted another adaptation to add 6 to odd numbers, and so on, until, some time later, there wasn’t a reproductive advantage to “general addition”, because the set of special cases covered almost everything found in the environment.
There may even be a real-world example of this. If you glance at a set, you should be able to instantly distinguish the numbers one, two, three, four, and five, but seven objects in an arbitrary (non-canonical) pattern will take at least one noticeable instant to count. IIRC, it’s been suggested that we have hardwired numerosity-detectors but only up to five.
I say all this, to note the difference between evolution nibbling bits off the immediate search neighborhood, versus the human ability to do things in one fell swoop.
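As a minimal sketch of that difference, here is the addition parable in code. The two special cases come from the parable; everything else (the function names, the list of adaptations, returning None for cases not yet covered) is my own illustration.

```python
from typing import Optional


def add_five(x: int, y: int) -> Optional[int]:
    """First adaptation: can only add 5 to things."""
    return x + 5 if y == 5 else None


def add_six_to_odd(x: int, y: int) -> Optional[int]:
    """Second adaptation: can only add 6, and only to odd numbers."""
    return x + 6 if y == 6 and x % 2 == 1 else None


# Hypothetical pile of accreted special cases, tried in order.
EVOLVED_ADAPTATIONS = [add_five, add_six_to_odd]


def evolved_add(x: int, y: int) -> Optional[int]:
    """Try each special case in turn; give up (None) on anything not covered."""
    for adaptation in EVOLVED_ADAPTATIONS:
        result = adaptation(x, y)
        if result is not None:
            return result
    return None


def general_add(x: int, y: int) -> int:
    """The "insight": one rule that covers every case at once."""
    return x + y


if __name__ == "__main__":
    for x, y in [(2, 5), (3, 6), (4, 6), (2, 2)]:
        print((x, y), evolved_add(x, y), general_add(x, y))
```

Each new adaptation buys evolved_add one more slice of the problem; general_add buys all of it at once.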
Our compression of the search space is also responsible for ideas cascading much more easily than adaptations. We actively examine good ideas, looking for neighbors.
But an insight is higher-level than this; it consists of understanding what’s “good” about an idea in a way that divorces it from any single point in the search space. In this way you can crack whole volumes of the solution space in one swell foop. The insight of calculus apart from gravity is again a good example, or the insight of mathematical physics apart from calculus, or the insight of math apart from mathematical physics.
Evolution is not completely barred from making “discoveries” that decrease the cost of a very wide range of further discoveries. Consider e.g. the ribosome, which was capable of manufacturing a far wider range of proteins than whatever it was actually making at the time of its adaptation: this is a general cost-decreaser for a wide range of adaptations. It likewise seems likely that various types of neuron have reasonably-general learning paradigms built into them (gradient descent, Hebbian learning, more sophisticated optimizers) that have been reused for many more problems than they were originally invented for.
A ribosome is something like insight: an item of “knowledge” that tremendously decreases the cost of inventing a wide range of solutions. But even evolution’s best “insights” are not quite like the human kind. A sufficiently powerful human insight often approaches a closed form—it doesn’t feel like you’re exploring even a compressed search space. You just apply the insight-knowledge to whatever your problem, and out pops the now-obvious solution.
Insights have often cascaded in human history, even major insights. But they don’t quite cycle: you can’t repeat the identical pattern Newton used originally to get a new kind of calculus that’s twice and then three times as powerful.
Human AI programmers who have insights into intelligence may acquire discontinuous advantages over others who lack those insights. AIs themselves will experience discontinuities in their growth trajectory associated with becoming able to do AI theory itself—a watershed moment in the FOOM.
Cascades, Cycles, Insight...
Followup to: Surprised by Brains
Five sources of discontinuity: 1, 2, and 3...