Super obvious re-rebut: sociopaths exist, and yet civilization endures.
Also, we can rather obviously test in safe simulation sandboxes and avoid copying sociopaths. The argument that sociopaths are a fundamental showstopper must then rest on some magical view of the brain (because evolution obviously succeeds in producing non-sociopaths, so we can copy its techniques if they are nonmagical).
Remember the argument is against existential threat level UFAI, not some fraction of evil AIs in a large population.
I think you misunderstand my argument. The point is that it’s ridiculous to say that human beings are ‘universal learning machines’ and you can just raise any learning algorithm as a human child and it’ll turn out fine. We can’t even raise 2-5% of HUMAN CHILDREN as human children and have it reliably turn out okay.
Sociopaths are different from baseline humans by a tiny degree. It’s got to be a small number of single-gene mutations. A tiny shift in information. And that’s all it takes to make them consistently UnFriendly, regardless of how well they’re raised. Obviously, AIs are going to be more different from us than that. And that’s a pretty good reason to think that we can’t just blithely assume that putting Skynet through preschool is going to keep us safe.
Human values are obviously hard coded in large part, and the hard coded portions seem to be crucial. That hard coding is not going to be present in an arbitrary AI, which means we have to go and duplicate it out of a human brain. Which is HARD. Which is why we’re having this discussion in the first place.
The point is that it’s ridiculous to say that human beings are ‘universal learning machines’
No—it is not. See the article for the in depth argument and citations backing up this statement.
you can just raise any learning algorithm as a human child and it’ll turn out fine.
Well almost—A ULM also requires a utility function or reward circuitry with some initial complexity, but we can also use the same universal learning algorithms to learn that component. It is just another circuit, and we can learn any circuit that evolution learned.
And that’s all it takes to make them consistently UnFriendly, regardless of how well they’re raised.
Sure—which is why I discussed sim sandbox testing. Did you read about my sim sandbox idea? We test designs in a safe sandbox sim, and we don’t copy sociopaths.
Obviously, AIs are going to be more different from us than that
No, this isn’t obvious at all. AGI is going to be built from the same principles as the brain—because the brain is a universal learning machine. The AGI’s mind structure will be learned from training and experiential data such that the AI learns how to think like humans and learns how to be human—just like humans do. Human minds are software constructs—without that software we would just be animals (feral humans). An artificial brain is just another computer that can run the human mind software.
That hard coding is not going to be present in an arbitrary AI, which means we have to go and duplicate it out of a human brain. Which is HARD.
Yes, but it’s only a part of the brain and a fraction of the brain’s complexity, so obviously it can’t be harder than reverse engineering the whole brain.
A ULM also requires a utility function or reward circuitry with some initial complexity, but we can also use the same universal learning algorithms to learn that component. It is just another circuit, and we can learn any circuit that evolution learned.
Okay, so we just have to determine human terminal values in detail, and plug them into a powerful maximizer. I’m not sure I see how that’s different from the standard problem statement for friendly AI. Learning values by observing people is exactly what MIRI is working on, and it’s not a trivial problem.
For example: say your universal learning algorithm observes a human being fail a math test. How does it determine that the human being didn’t want to fail the math test? How does it cleanly separate values from their (flawed) implementation? What does it do when people’s values differ? These are hard questions, and precisely the ones that are being worked on by the AI risk people.
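The math-test ambiguity can be made concrete with a toy model. The sketch below is only an illustration: the `skill` parameter, the two hypotheses, and all the numbers are made up for this example and do not come from any value-learning proposal. It shows one observed failure being explained equally well by "wants to pass but is bad at math" and "wants to fail and is good at getting what it wants".

```python
# Toy illustration (all names and numbers are hypothetical).
# An agent gets the outcome it is aiming for with probability `skill`;
# otherwise it gets the opposite outcome.

def prob_fail(wants_to_pass: bool, skill: float) -> float:
    """Probability of observing a failed test under one hypothesis."""
    return 1.0 - skill if wants_to_pass else skill

# Two competing explanations of the same observation.
hypotheses = {
    "wants to pass, low skill": (True, 0.1),
    "wants to fail, high skill": (False, 0.9),
}

def posterior_given_fail(hyps):
    """Posterior over hypotheses after one observed failure, uniform prior."""
    likelihoods = {name: prob_fail(*params) for name, params in hyps.items()}
    total = sum(likelihoods.values())
    return {name: lik / total for name, lik in likelihoods.items()}

posterior = posterior_given_fail(hypotheses)
for name, p in posterior.items():
    print(f"{name}: {p:.2f}")
```

Because both hypotheses assign the same likelihood to the failure, the observation alone does not move the posterior away from the prior. Separating values from competence needs extra assumptions or extra data.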
Other points of critique:
Saying the phrase “safe sandbox sim” is much easier than making a virtual machine that can withstand a superhuman intelligence trying to get out of it. Even if your software is perfect, it can still figure out that its world is artificial and figure out ways of blackmailing its captors. Probably doing what MIRI is looking into, and designing agents that won’t resist attempts to modify them (corrigibility) is a more robust solution.
You want to be careful about just plugging in a learned human utility function into a powerful maximizer, and then raising it. If it’s maximizing its own utility, which is necessary if you want it to behave anything like a child, what’s to stop it from learning human greed and cruelty, and becoming an eternal tyrant? I don’t trust a typical human to be god.
And even if you give up on that idea, and have to maximize a utility function defined in terms of humanity’s values, you still have problems. For starters, you want to be able to prove formally that its goals will remain stable as it self-modifies, and it won’t create powerful sub-agents who don’t share those goals. Which is the other class of problems that MIRI works on.
Okay, so we just have to determine human terminal values in detail, and plug them into a powerful maximizer.
Why do you even go around thinking that the concept of “terminal values”, which is basically just a consequentialist steelmanning Aristotle, cuts reality at the joints?
For starters, you want to be able to prove formally that its goals will remain stable as it self-modifies
That part honestly isn’t that hard once you read the available literature about paradox theorems.
Okay, so we just have to determine human terminal values in detail, and plug them into a powerful maximizer.
No—not at all. Perhaps you have read too much MIRI material, and not enough of the neuroscience and machine learning I referenced. An infant is not born with human ‘terminal values’. It is born with some minimal initial reward learning circuitry to bootstrap learning of complex values from adults.
Stop thinking of AGI as some weird mathy program. Instead think of brain emulations, and then you have obvious answers to all of these questions.
Saying the phrase “safe sandbox sim” is much easier than making a virtual machine that can withstand a superhuman intelligence trying to get out of it.
You apparently didn’t read my article or links to earlier discussion? We can easily limit the capability of minds by controlling knowledge. A million smart evil humans is dangerous—but only if they have modern knowledge. If they have only say medieval knowledge, they are hardly dangerous. Also—they don’t realize they are in a sim. Also—the point of the sandbox sims is to test architectures, reward learning systems, and most importantly—altruism. Designs that work well in these safe sims are then copied into less safe sims and finally the real world.
Consider the orthogonality thesis—AI of any intelligence level can be combined with any values. Thus we can test values on young/limited AI before scaling up their power.
Sandbox sims can be arbitrarily safe. It is the only truly practical workable proposal to date. It is also the closest to what is already used in industry. Thus it is the solution by default.
Even if your software is perfect, it can still figure out that its world is artificial and figure out ways of blackmailing its captors
Ridiculous nonsense. Many humans today are aware of the sim argument. The gnostics were aware in some sense 2,000 years ago. Do you think any of them broke out? Are you trying to break out? How?
If it’s maximizing its own utility, which is necessary if you want it to behave anything like a child, what’s to stop it from learning human greed and cruelty, and becoming an eternal tyrant?
Again, stop thinking we create a single AI program and then we are done. It will be a large-scale evolutionary process, with endless selection, testing, and refinement. We can select for super altruistic moral beings, at the level of Buddha, Gandhi, or Jesus. We can take the human capability for altruism, refine it, and expand on it vastly.
For starters, you want to be able to prove formally that its goals will remain stable as it self-modifies,
So, to sum up, your plan is to create an arbitrarily safe VM, and use it to run brain-emulation-style de novo AIs patterned on human babies (presumably with additional infrastructure to emulate the hard-coded changes that occur in the brain during development to adulthood: adult humans are not babies + education). You then want to raise many, many iterations of these things under different conditions to try to produce morally superior specimens, then turn those AIs loose and let them self-modify to godhood.
Is that accurate? (Seriously, let me know if I’m misrepresenting your position).
A few problems immediately come to mind. We’ll set aside the moral horror of what you just described as a necessary evil to avert the apocalypse, for the time being.
More practically, I think you’re being racist against weird mathy programs.
For starters, I think weird mathy programs will be a good deal easier to develop than digital people. Human beings are not just general optimizers. We have modules that function roughly like one, which we use under some limited circumstances, but anyone who’s ever struggled with procrastination or put their keys in the refrigerator knows that your goal-oriented systems are entangled with a huge number of cheap heuristics at various levels, many of which are not remotely goal-oriented.
All of this stuff is deeply tangled up with what we think of as the human ‘utility function,’ because evolution has no incentive to design a clean separation between planning and values. Replicating all of that accurately enough to get something that thinks and behaves like a person is likely much harder than making a weird mathy program that’s good at modelling the world and coming up with plans.
There’s also the point that there really isn’t a good way to make a brain emulation smarter. Weird, mathy programs—even ones that use neural networks as subroutines—often have obvious avenues to making them smarter, and many can scale smoothly with processing resources. Brain emulations are much harder to bootstrap, and it’d be very difficult to preserve their behavior through the transition.
My best guess is, they’d probably go nuts and end up as an eldritch horror. And if not, they’re still going to get curb stomped by the first weird mathy program to come along, because they’re saddled with all of our human imperfections and unnecessary complexity. The upshot of all of this is that they don’t serve the purpose of protecting us from future UFAIs.
Finally, the process you described isn’t really something you can start on (aside from the VM angle) until you already have human level AGIs, and a deep and total understanding of all of the operation of the human brain. Then, while you’re setting up your crazy AI concentration camp and burning tens of thousands of man-years of compute time searching for AI Buddha, some bright spark in a basement with a GPU cluster has the much easier task of just kludging together something smart enough to recursively self-improve. You’re in a race with a bunch of people trying to solve a much easier problem, and (unlike MIRI) you don’t have decades of lead time to get a head start on the problem. Your large-scale evolutionary process would take much, much too much time and money to actually save the world.
In short, I think it’s a really bad idea. Although now that I understand what you’re getting at, it’s less obviously silly than what I originally thought you were proposing. I apologize.
So, to sum up, your plan is to create an arbitrarily safe VM, and use it to run brain-emulation-style de novo AIs
No. I said:
Stop thinking of AGI as some weird mathy program. Instead think of brain emulations, and then you have obvious answers to all of these questions.
I used brain emulations as analogy to help aid your understanding. Because unless you have deep knowledge of machine learning and computational neuroscience, there are huge inferential distances to cross.
Human beings are not just general optimizers.
Yes we are. I have made a detailed, extensive, citation-full, and well reviewed case that human minds are just that.
All of our understanding about the future of AGI is based ultimately on our models of the brain and AI in general. I am claiming that the MIRI viewpoint is based on an outdated model of the brain, and a poor understanding of the limits of computation and intelligence.
I will summarize for one last time. I will then no longer repeat myself because it is not worthy of my time—any time spent arguing this is better spent preparing another detailed article, rather than a little comment.
There is extensive uncertainty concerning how the brain works and what types of future AI are possible in practice. In situations of such uncertainty, any good sane probabilistic reasoning agent should come up with a multimodal distribution that spreads belief across several major clusters. If your understanding of AI comes mainly from reading LW, you are probably biased beyond hope. I’m sorry, but this is true. You are stuck in a box and don’t even know it.
Here are the main key questions that lead to different belief clusters:
Are the brain’s algorithms for intelligence complex or simple?
And related—are human minds mainly software or mainly hardware?
At the practical computational level, does the brain implement said algorithms efficiently or not?
If the human mind is built out of a complex mess of hardware-specific circuits, and the brain is far from efficient, then there is little to learn from the brain. This is Yudkowsky/MIRI’s position. This viewpoint leads to a focus on pure math and avoidance of anything brain-like (such as neural nets). In this viewpoint hard takeoff is likely, AI is predicted to be nothing like human minds, etc.
If you believe that the human mind is complex and messy hardware, but the brain is efficient, then you get Hanson’s viewpoint where the future is dominated by brain emulations. The brain ems win over brain-inspired AI because scanning real brain circuitry is easier than figuring out how it works.
Now what if the brain’s algorithms are not complex, and the brain is efficient? Then you get my viewpoint cluster.
These questions are empirical—and they can be answered today. In fact, I realized all this years ago and spent a huge amount of time learning more about the future of computer hardware, the limits of computation, machine learning, and computational neuroscience.
Yudkowsky, Hanson, and to some extent Bostrom—were all heavily inspired by the highly influential evolved modularity hypothesis in ev psych from Tooby and Cosmides. In this viewpoint, the brain is complex, and most of our algorithmic content is hardware based rather than software. I have argued that this viewpoint has been tested empirically and now disproven. The brain is built out of relatively simple universal learning algorithms. It will essentially be almost impossible to build practical AGI that is very different from the brain (remember, AGI is defined as software which can do everything the brain does).
Bostrom/Yudkowsky have also argued that the brain is very far from efficient. For example, from “True Sources of Disagreement”:
Human neurons run at less than a millionth the speed of transistors, transmit spikes at less than a millionth the speed of light, and dissipate around a million times the heat per synaptic operation as the thermodynamic minimum for a one-bit operation at room temperature. Physically speaking, it ought to be possible to run a brain at a million times the speed without shrinking it, cooling it, or invoking reversible computing or quantum computing.
The first two statements are true, the third statement is problematic, and the thrust of the conclusion is incorrect. The minimum realistic energy for a brain-like circuit is probably close to what the brain actually uses:
The Landauer bound depends on speed and reliability. The 10^-21 J/bit bound only applies to a signal of infinitely low frequency. For realistic fast reliable signals, the bound is 100 times higher: around 10^-19 J/bit.
The Landauer bound applies to single 1 bit ops. The fundamental bound for a 32 bit flop is around 10^5 or 10^6 times higher. Moore’s Law is ending and we are actually close to these bounds already. Synapses perform analog ops which have lower cost than a 32 bit flop, but still a much higher cost than a single bit op.
Most of the energy consumption in any advanced computer comes from wire dissipation, not switch dissipation. Signaling in the brain uses roughly 0.5x10^-14 J/bit/mm (5 fJ/bit/mm) [2], which appears to be within an order of magnitude or two of optimal, and is perhaps one order of magnitude more efficient than current computers. Wire signal energy in computers is not improving significantly. For example, for 40nm tech in 2010, the wire energy is 240 fJ/bit/mm, and is predicted to be around 150 to 115 fJ/bit/mm by 2017 [3]. The practical limit is perhaps around 1 fJ/bit/mm, but that would probably require much lower speeds.
These errors add up to around 6 orders of magnitude or so. The brain is near the limits of energy efficiency for what it does in terms of irreversible computation. No practical machine we will build in the near future is going to be many orders of magnitude more efficient than the brain. Yes, eventually reversible and quantum computing could perhaps result in large improvements, but those technologies are far off and will come long after neuromorphic AGI.
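For readers who want to check the orders of magnitude, here is a minimal back-of-the-envelope sketch. It assumes a room temperature of 300 K (my assumption, not a figure from the comment) and otherwise only uses the numbers quoted above. The 100x reliability factor is the comment's claim, not something the code derives.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)
T_ROOM = 300.0      # ASSUMPTION: "room temperature" taken as 300 K

# Landauer limit for erasing one bit: k_B * T * ln 2.
landauer_j_per_bit = K_B * T_ROOM * math.log(2)

# The comment's claim: fast, reliable signals cost about 100x more.
reliable_j_per_bit = 100 * landauer_j_per_bit

# Wire energy figures quoted in the comment, in fJ/bit/mm.
BRAIN_WIRE = 5.0
CHIP_WIRE_2010 = 240.0
CHIP_WIRE_2017_RANGE = (150.0, 115.0)  # predicted range quoted above

print(f"Landauer limit at {T_ROOM:.0f} K: {landauer_j_per_bit:.2e} J/bit")
print(f"With the 100x reliability factor: {reliable_j_per_bit:.2e} J/bit")
print(f"2010 chip wire energy / brain wire energy: {CHIP_WIRE_2010 / BRAIN_WIRE:.0f}x")
for predicted in CHIP_WIRE_2017_RANGE:
    print(f"Predicted {predicted:.0f} fJ/bit/mm / brain: {predicted / BRAIN_WIRE:.0f}x")
```

The Landauer value comes out at a few times 10^-21 J/bit, consistent with the rounded 10^-21 figure in the text. The wire-energy ratios are simple divisions of the quoted numbers.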
Yes we are. I have made a detailed, extensive, citation-full, and well reviewed case that human minds are just that.
That isn’t quite correct. We do have hard wiring that raises and lowers the from-the-inside importance of specific features present in our learning data. That is, we have a nontrivial inductive bias which not all possible minds will have, even when we start by assuming that all minds are semi-modular universal learners.
Yes, I’ve read your big universal learner post, and I’m not convinced. This does seem to be the crux of our disagreement, so let me take some time to rebut:
First off, you’re seriously misrepresenting the success of deep learning as support for your thesis. Deep learning algorithms are extremely powerful, and probably have a role to play in building AGI, but they aren’t the end-all, be-all of AI research. For starters, modern deep learning systems are absolutely fine-tuned to the task at hand. You say that they have only “a small number of hyperparameters,” which is something of a misrepresentation. There are actually quite a few of these hyperparameters in state-of-the-art networks, and there are more in networks tackling more difficult tasks.
Tuning these hyperparameters is hard enough that only a small number of researchers can do it well enough to achieve state of the art results. We do not use the same network for image recognition and audio processing, because that wouldn’t work very well.
We tune the architecture of deep learning systems to the task at hand. Presumably, if we can garner benefits from doing that, evolution has an incentive to do the same. There’s a core, simple algorithm at work, but targeted to specific tasks. Evolution has no incentive to produce a clean design if kludgy tweaks give better results. You argue that evolution has a bias against complexity, but that certainly hasn’t stopped other organs from developing complex structure to make them marginally better at the task.
There’s also the point that there’s plenty of tasks that deep learning methods can’t solve yet (like how to store long-term memories of a complex and partially observed system in an efficient manner) - not to mention higher level cognitive skills that we have no clue how to approach.
Nobody thinks this stuff is just a question of throwing yet larger deep learning networks at the problem. They will be solved by finding different hard-wired network architectures that make the problem more manageable by knowing things about it in advance.
The ferret brain rewiring result is not a slam-dunk for the universal learning by itself. It just means that different brain modules can switch which pre-programmed neural algorithms they implement on the fly. Which makes sense, because on some level these things have to be self-organizing in the first place to be compactly genetically coded.
The real test here would be to take a brain and give it an entirely new sense—something that bears no resemblance to any sense it or any of its ancestors has ever had, and see if it can use that sense as naturally as hearing or vision. Personally, I doubt it. Humans can learn echolocation, but they can’t learn echolocation the way bats and dolphins can learn echolocation—and echolocation bears a fair degree of resemblance to other tasks that humans already have specialized networks for (like pinpointing the location of a sound in space).
Notably, the general learner hypothesis does not explain why non-surgically-modified brains are so standardized in structure and functional layout. Something that you yourself bring up in your article.
It also does not explain why birds are better at language tasks than cats. Cat brains are much larger. The training rewards in the lab are the same. And, yet, cats significantly underperform parrots at every single language-related task we can come up with. Why? Because the parrots have had a greater evolutionary pressure to be good at language-style tasks—and, as a result, they have evolved task-specific neurological algorithms to make it easier.
Also, plenty of mammals, fresh out of the womb, have complex behaviors and vocalizations. Humans are something of an outlier, due to being born premature by mammal standards. If mammal brains are 99% universal learning, why can baby cows walk within minutes of birth?
Look, obviously, to some degree, both things are true. The brain is capable of general learning to some degree. Otherwise, we’d never have developed math. It also obviously has hard-coded specialized modules, to some degree, which is why (for example) all human cultures develop language and music, which isn’t something you’d expect if we were all starting from zero. The question is which aspect dominates brain performance. You’re proposing an extreme swing to one end of the possibility space that doesn’t seem even remotely plausible, and then you’re using that assumption as evidence that no non-brain-like intelligence can exist.
What about Watson? It’s the best-performing NLP system ever made, and it’s absolutely a “weird mathy program.” It uses neural networks as subroutines, but the architecture of the whole bears no resemblance to the human brain. It’s not a simple universal learning algorithm. If you gave a single deep neural network access to the same computational resources, it would underperform Watson. That seems like a pretty tough pill to swallow if ‘simple universal learner’ is all there is to intelligence.
Finally, I don’t have the background to refute your argument on the efficiency of the brain (although I know clever people who do who disagree with you). But, taking it as a given that you’re right, it sounds like you’re assuming all future AIs will draw the same amount of power as a real brain and fit in the same spatial footprint. Well… what if they didn’t? What if the AI brain is the size of a fridge and cooled with LN2 and consumes as much power as a city block? Surely at the physical limits of computation you believe in, that would be able to beat the pants off little old us.
To sum up: yes, I’ve read your thing. No, it’s not as convincing as you seem to believe.
It also does not explain why birds are better at language tasks than cats. Cat brains are much larger. The training rewards in the lab are the same. And, yet, cats significantly underperform parrots at every single language-related task we can come up with. Why? Because the parrots have had a greater evolutionary pressure to be good at language-style tasks—and, as a result, they have evolved task-specific neurological algorithms to make it easier.
Cat brains are much larger, but physical size is irrelevant. What matters is neuron/synapse count.
According to my ULM theory, the most likely explanation for the superior learning ability of parrots is a larger number of neurons/synapses in their general learning modules (whatever the equivalent of the cortex is in birds), and thus more computational power available for general learning.
Stop right now, and consider this bet—I will bet that parrots have more neurons/synapses in their cortex-equivalent brain regions than cats.
We show that in parrots and songbirds the total brain mass as well as telencephalic mass scales approximately linearly with the total number of neurons, i.e. neuronal density does not change significantly as brains get larger. The neuronal densities in the telencephalon exceed those observed in the cerebral cortex of primates by a factor of 2-8. As a result, the numbers of telencephalic neurons in the brains of the largest birds examined (raven, kea and macaw) equal or exceed those observed in the cerebral cortex of many species of monkeys.
Finally, our findings of comparable numbers of neurons in the cerebral cortex of medium-sized primates and in the telencephalon of large parrots and songbirds (particularly corvids) strongly suggest that large numbers of forebrain neurons, and hence a large computational capacity, underpin the behavioral and cognitive complexity reported for parrots and songbirds, despite their small brain size.
The telencephalon is believed to be the equivalent of the cortex in birds. The cortex of the smallest monkeys has about 400 million neurons, whereas the cat’s cortex has about 300 million neurons. A medium sized monkey such as a night monkey has more than 1 billion cortical neurons.
Interesting! I didn’t know that, and that makes a lot of sense.
If I were to restate my objection more strongly, I’d say that parrots also seem to exceed chimps in language capabilities (chimps having six billion cortical neurons). The reason I didn’t bring this up originally is that chimp language research is a horrible, horrible field full of a lot of bad science, so it’s difficult to be too confident in that result.
Plenty of people will tell you that signing chimps are just as capable as Alex the parrot—they just need a little bit of interpretation from the handler, and get too nervous to perform well when the handler isn’t working with them. Personally, I think that sounds a lot like why psychics suddenly stop working when James Randi shows up, but obviously the situation is a little more complicated.
I’d strongly suggest the movie Project Nim, if you haven’t seen it. In some respects chimpanzee intelligence develops faster than that of a human child, but it also plateaus much earlier. Their childhood development period is much shorter.
To first approximation, general intelligence in animals can be predicted by number of neurons/synapses in general learning modules, but this isn’t the only factor. I don’t have an exact figure, but that poster abstract suggests parrots have perhaps 1-3 billion cortical neuron equivalents.
The next most important factor is probably degree of neoteny, or the length of the learning window. Human intelligence develops over the span of 20 years. Parrots seem exceptional in terms of lifespan and are thus perhaps more human-like, in that they maintain a childlike state for much longer. We know from machine learning that the ‘learning rate’ is a super important hyperparameter: learning faster has a huge advantage, but if you learn too fast you get inferior long term results for your capacity. Learning slowly is obviously more costly, but it can generate more efficient circuits in the long term.
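The learning-rate tradeoff mentioned here can be seen in a very small toy problem. The sketch below is an illustration under my own assumptions, not a model of brains or of any particular network. It tracks a fixed target value from noisy samples with a constant step size, and the target, noise level, step sizes, and seed are all arbitrary choices.

```python
import random

def run(lr, steps=5000, target=1.0, noise=0.5, seed=0):
    """Track `target` from noisy samples using a constant learning rate.

    Returns the absolute error of the estimate after every step.
    """
    rng = random.Random(seed)  # fixed seed so both runs see the same noise
    w = 0.0
    errors = []
    for _ in range(steps):
        sample = target + rng.gauss(0.0, noise)
        w += lr * (sample - w)  # SGD step on the squared error (sample - w)^2 / 2
        errors.append(abs(w - target))
    return errors

def mean(xs):
    return sum(xs) / len(xs)

fast = run(lr=0.5)   # learns quickly, but keeps chasing the noise
slow = run(lr=0.01)  # learns slowly, but averages the noise away

early_fast, early_slow = mean(fast[:20]), mean(slow[:20])
late_fast, late_slow = mean(fast[-2000:]), mean(slow[-2000:])

print(f"mean error, first 20 steps:  fast={early_fast:.3f}  slow={early_slow:.3f}")
print(f"mean error, last 2000 steps: fast={late_fast:.3f}  slow={late_slow:.3f}")
```

In this toy setting the fast learner is far ahead early on, while the slow learner ends up with the smaller long-run error, which is the qualitative pattern the paragraph describes.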
I inferred/guessed that parrots have very long neotenic learning windows, and the articles on Alex seem to confirm this.
Alex reached a vocabulary of about 100 words by age 29, a few years before his untimely death. The trainer, Irene Pepperberg, claims that Alex was still learning and had not reached peak capability. She rated Alex’s intelligence as roughly equivalent to that of a 5 year old. This about makes sense if the parrot has roughly 1/6th our number of cortical neurons, but has similar learning efficiency and a long learning window.
To really compare chimp vs parrot learning ability, we’d need more than a handful of samples. There is also a large selection effect here, because parrots make reasonably good pets, whereas chimps are terrible, dangerous pets. So we haven’t tested chimps as much. Alex is more likely to be a very bright parrot, whereas the handful of chimps we have tested are more likely to be average.
Not much to add here, except that it’s unlikely that Alex is an exceptional example of a parrot. The researcher purchased him from a pet store at random to try to eliminate that objection.
The neuronal densities in the telencephalon exceed those observed in the cerebral cortex of primates by a factor of 2-8.
This is curious. I wonder if bird brains are also more energy efficient as a result of the greater neuronal densities (since that implies shorter wires). According to “Ratio of central nervous system to body metabolism in vertebrates: its constancy and functional basis”, the metabolism of the brain of Corvus sp (unknown species of genus Corvus, which includes the ravens) is 0.52 cm^3 O2/min whereas the metabolism of the brain of a macaque monkey is 3.4 cm^3 O2/min. Presumably the macaque monkey has more non-cortical neurons which account for some of the difference, but this still seems impressive if the Corvus sp and macaque monkey have a similar number of telencephalic/cortical neurons (1.4B for the macaque according to this paper). Unfortunately I can’t find the full paper for the abstract you linked, so I can’t check the details.
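The size of that gap is easy to work out from the two quoted figures. The sketch below is a rough calculation only. It takes the comment's "if" at face value by assuming the corvid has the same neuron count as the macaque cortex (an assumption, not a measured value), and it attributes whole-brain oxygen use to those neurons, which overstates the per-neuron cost for both animals.

```python
# Brain oxygen consumption figures quoted in the comment, in cm^3 O2/min.
CORVUS_BRAIN_O2 = 0.52
MACAQUE_BRAIN_O2 = 3.4

# Macaque cortical neuron count quoted in the comment.
MACAQUE_CORTICAL_NEURONS = 1.4e9

# ASSUMPTION (the comment's "if"): the corvid telencephalon has a similar count.
ASSUMED_CORVUS_NEURONS = MACAQUE_CORTICAL_NEURONS

def o2_per_billion_neurons(o2_per_min, neurons):
    """Whole-brain O2 use divided by neuron count, per billion neurons."""
    return o2_per_min / (neurons / 1e9)

whole_brain_ratio = MACAQUE_BRAIN_O2 / CORVUS_BRAIN_O2
corvus_rate = o2_per_billion_neurons(CORVUS_BRAIN_O2, ASSUMED_CORVUS_NEURONS)
macaque_rate = o2_per_billion_neurons(MACAQUE_BRAIN_O2, MACAQUE_CORTICAL_NEURONS)

print(f"macaque / corvid whole-brain O2 use: {whole_brain_ratio:.1f}x")
print(f"corvid:  {corvus_rate:.2f} cm^3 O2/min per billion neurons (assumed count)")
print(f"macaque: {macaque_rate:.2f} cm^3 O2/min per billion neurons")
```

Under the equal-count assumption the per-neuron ratio is the same as the whole-brain ratio, roughly 6.5x, so any real conclusion depends on the actual corvid neuron count.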
I wonder if bird brains are also more energy efficient as a result of the greater neuronal densities (since that implies shorter wires).
Yes—that seems to be the point of that poster I found earlier.
From an evolutionary point of view it makes sense—birds are under tremendous optimization pressure for mass efficiency. Hummingbirds are a great example of how far evolution can push flight and weight efficiency.
Primate/human brains also appear to have more density optimization than, say, elephants or cetaceans, but it is interesting that birds are so much more density efficient still. Presumably there are some other tradeoffs: perhaps the bird brain design is too hot to scale up to large sizes, uses too many resources, etc.
Unfortunately I can’t find the full paper of the abstract you linked to to check the details.
It was a recent poster, so perhaps it is still a paper in progress? They claim to have run the isotropic fractionator experiments on bird brains, so they should have estimates of the actual neuron counts to back up their general claims, but they didn’t provide those in the abstract. Perhaps the data exists somewhere as an image from the actual presentation. Oh well.
Yes, I’ve read your big universal learner post, and I’m not convinced.
Do you actually believe that evolved modularity is a better explanation of the brain than the ULM hypothesis? Do you have evidence for this belief or is it simply that which you want to be true? Do you understand why the computational neuroscience and machine learning folks are moving away from the former towards the latter? If you do have evidence please provide it in a critique in the comments for that post where I will respond.
First off, you’re seriously misrepresenting the success of deep learning as support for your thesis. Deep learning algorithms are extremely powerful, and probably have a role to play in building AGI, but they aren’t the end-all, be-all of AI research.
Make some specific predictions for the next 5 years about deep learning or ANNs. Let us see if we actually have significant differences of opinion. If so I expect to dominate you in any prediction market or bets concerning the near term future of AI.
First off the bat, you absolutely can create an AGI that is a pure ANN. In fact the most successful early precursor AGI we have—the atari deepmind agent—is a pure ANN. Your claim that ANNs/Deep Learning is not the end of all AGI research is quickly becoming a minority position.
Humans can learn echolocation, but they can’t learn echolocation the way bats and dolphins can learn echolocation
Notably, the general learner hypothesis does not explain why non-surgically-modified brains are so standardized in structure and functional layout. Something that you yourself bring up in your article.
I discussed this in the comments—it absolutely does explain neurotypical standardization. It’s a result of topographic/geometric wiring optimization. There is an exactly optimal location for every piece of functionality, and the brain tends to find those same optimal locations in each human. But if you significantly perturb the input sense or the brain geometry, you can get radically different results.
Consider the case of extreme hydrocephaly, where fluid fills in the center of the brain and replaces most of the brain and squeezes the remainder out to a thin surface near the skull. And yet, these patients can have above average IQs. Optimal dynamic wiring can explain this: the brain is constantly doing global optimization across the wiring structure, adapting to even extreme deformations and damage. How does evolved modularity explain this?
It also obviously has hard-coded specialized modules, to some degree, which is why (for example) all human cultures develop language and music, which isn’t something you’d expect if we were all starting from zero.
This is nonsense—language processing develops in general purpose cortical modules, there is no specific language circuitry.
There is a small amount of innate circuit structures—mainly in the brainstem, which can generate innate algorithms especially for walking behavior.
The question is which aspect dominates brain performance.
This is rather obvious—it depends on the ratio of pure learning structures (cortex, hippocampus, cerebellum) to innate circuit structures (brainstem, some midbrain, etc). In humans 95% or more of the circuitry is general purpose learning.
What about Watson?
Not an AGI.
Finally, I don’t have the background to refute your argument on the efficiency of the brain (although I know clever people who do who disagree with you).
The correct thing to do here is update. Instead you are searching for ways in which you can ignore the evidence.
But, taking it as a given that you’re right, it sounds like you’re assuming all future AIs will draw the same amount of power as a real brain and fit in the same spatial footprint.
Obviously not—in theory given a power budget you can split it up into N AGIs or one big AGI. In practice due to parallel scaling limitations, there is always some optimal N. Even on a single GPU today, you need N about 100 or more to get good performance.
You can’t just invest all your energy into one big AGI and expect better performance—that is a mind numbingly naive strategy.
To sum up: yes, I’ve read your thing. No, it’s not as convincing as you seem to believe.
Update, or provide counter evidence, or stop wasting my time.
In fact the most successful early precursor AGI we have—the atari deepmind agent—is a pure ANN.
People have been using ANNs for reinforcement learning tasks since at least the TD-Gammon system with varying success. The Deepmind Atari agent is bigger and the task is sexier, but calling it an early precursor AGI seems far fetched.
Consider the case of extreme hydrocephaly—where fluid fills in the center of the brain and replaces most of the brain and squeezes the remainder out to a thin surface near the skull. And yet, these patients can have above average IQs. Optimal dynamic wiring can explain this—the brain is constantly doing global optimization across the wiring structure, adapting to even extreme deformations and damage. How does evolved modularity explain this?
I suppose that the network topology of these brains is essentially normal, isn’t it? If that’s the case, then all the modules are still there, they are just squeezed against the skull wall.
This is nonsense—language processing develops in general purpose cortical modules, there is no specific language circuitry.
If I understand correctly, damage to Broca’s area or Wernicke’s area tends to cause speech impairment. This may be more or less severe depending on the individual, which is consistent with the evolved modularity hypothesis: genetically different individuals may have small differences in the location and shape of the brain modules.
Under the universal learning machine hypothesis, instead, we would expect speech impairment following localized brain damage to quickly heal in most cases as other brain areas are recruited to the task. Note that there are large rewards for regaining linguistic ability, hence the brain would sacrifice other abilities if it could. This generally does not happen.
In fact, for most people with completely healthy brains it is difficult to learn a new language as well as a native speaker after the age of 10. This suggests that our language processing machinery is hard-wired to a significant extent.
The Deepmind Atari agent is bigger and the task is sexier, but calling it an early precursor AGI seems far fetched.
Hardly. It can learn a wide variety of tasks—many at above human level—in a variety of environments—all with only a few million neurons. It was on the cover of Nature for a reason.
Remember a mouse brain has the same core architecture as a human brain. The main components are all there and basically the same—just smaller—and with different size allocations across modules.
I suppose that the network topology of these brains is essentially normal, isn’t it? If that’s the case, then all the modules are still there, they are just squeezed against the skull wall.
From what I’ve read the topology is radically deformed, modules are lost, timing between remaining modules is totally changed: it’s massive brain damage. It’s so weird that they can even still think that it has led some neuroscientists to seriously consider that cognition comes from something other than neurons and synapses.
Under the universal learning machine hypothesis, instead, we would expect speech impairment following localized brain damage to quickly heal in most cases as other brain areas are recruited to the task.
Not at all. Relearning language would take at least as much time and computational power as learning it in the first place. Language is perhaps the most computationally challenging thing that humans learn; it takes roughly a decade to learn up to a high fluent adult level. Children learn faster because they have far more free cortical capacity. All of this is consistent with the ULH, and I bet it can even vaguely predict the time required for relearning language, although measuring the exact extent of damage to language centers is probably difficult.
This suggests that our language processing machinery is hard-wired to a significant extent.
Absolutely not—because you can look at the typical language modules in the microscope, and they are basically the same as the other cortical modules. Furthermore, there is no strong case for any mechanism that can encode any significant genetically predetermined task specific wiring complexity into the cortex. It is just like an ANN—the wiring is random. The modules are all basically the same.
First off the bat, you absolutely can create an AGI that is a pure ANN. In fact the most successful early precursor AGI we have—the atari deepmind agent—is a pure ANN. Your claim that ANNs/Deep Learning is not the end of all AGI research is quickly becoming a minority position.
The DeepMind agent has no memory, one of the problems that I noted in the first place with naive ANN systems. The DeepMind team’s solution to this is the neural Turing machine model, which is a hybrid system between a neural network and a database. It’s not a pure ANN. It isn’t even neuromorphic.
Improving its performance is going to involve giving it more structure and more specialized components, and not just throwing more neurons and training time at it.
For goodness sake: Geoffrey Hinton, the father of deep learning, believes that the future of machine vision is explicitly integrating the idea of three dimensional coordinates and geometry into the structure of the network itself, and moving away from more naive and general purpose conv-nets.
Your position is not as mainstream as you like to present it.
The real test here would be to take a brain and give it an entirely new sense
Done and done. Next!
If you’d read the full sentence that I wrote, you’d appreciate that remapping existing senses doesn’t actually address my disagreement. I want a new sense, to make absolutely sure that the subjects aren’t just re-using hard coding from a different system. Snarky, but not a useful contribution to the conversation.
This is nonsense—language processing develops in general purpose cortical modules, there is no specific language circuitry.
This is far from the mainstream linguistic perspective. Go argue with Noam Chomsky; he’s smarter than I am. Incidentally, you didn’t answer the question about birds and cats. Why can’t cats learn to do complex language tasks? Surely they also implement the universal learning algorithm just as parrots do.
What about Watson?
Not an AGI.
AGIs literally don’t exist, so that’s hardly a useful argument. Watson is the most powerful thing in its (fairly broad) class, and it’s not a neural network.
Finally, I don’t have the background to refute your argument on the efficiency of the brain (although I know clever people who do who disagree with you).
The correct thing to do here is update. Instead you are searching for ways in which you can ignore the evidence.
No, it really isn’t. I don’t update based on forum posts on topics I don’t understand, because I have no way to distinguish experts from crackpots.
The DeepMind team’s solution to this is the neural Turing machine model, which is a hybrid system between a neural network and a database. It’s not a pure ANN.
Yes it is a pure ANN—according to my use of the term ANN (arguing over definitions is a waste of time). ANNs are fully general circuit models, which obviously can re-implement any module from any computer—memory, database, whatever. The defining characteristics of an ANN are—simulated network circuit structure based on analog/real valued nodes, and some universal learning algorithm over the weights—such as SGD.
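As a minimal sketch of the two defining characteristics named above (real-valued nodes, plus a generic learning rule such as SGD applied to the weights), here is a single linear unit trained by stochastic gradient descent. The function name, the toy target y = 2x + 1, and the learning rate are all illustrative assumptions, not anything taken from the DeepMind systems under discussion.

```python
import random


def train_linear_unit(samples, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w*x + b with plain SGD on squared error (hypothetical helper)."""
    rng = random.Random(seed)
    # Real-valued parameters with small random initial values.
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)  # "stochastic": visit samples in random order
        for x, y in data:
            err = (w * x + b) - y  # prediction error for this one sample
            w -= lr * err * x      # gradient of 0.5*err^2 with respect to w
            b -= lr * err          # gradient of 0.5*err^2 with respect to b
    return w, b


# Toy data drawn from y = 2x + 1 on the interval [-1, 1].
samples = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]
w, b = train_linear_unit(samples)
print(f"learned w={w:.3f}, b={b:.3f}")
```

Nothing in the update rule knows what task it is solving; whether that generality survives at scale is exactly what the two sides here disagree about.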
Your position is not as mainstream as you like to present it.
You don’t understand my position. I don’t believe DL as it exists today is somehow the grail of AI. And yes I’m familiar with Hinton’s ‘Capsule’ proposals. And yes I agree there is still substantial room for improvement in ANN microarchitecture, and especially for learning invariances—and unsupervised especially.
This is far from the mainstream linguistic perspective.
For any theory of anything the brain does—if it isn’t grounded in computational neuroscience data, it is probably wrong—mainstream or not.
No, it really isn’t. I don’t update based on forum posts on topics I don’t understand, because I have no way to distinguish experts from crackpots.
You don’t update on forum posts? Really? You seem pretty familiar with MIRI and LW positions. So are you saying that you arrived at those positions all on your own somehow? Then you just showed up here, thankfully finding other people who just happened to have arrived at all the same ideas?
Yes it is a pure ANN—according to my use of the term ANN (arguing over definitions is a waste of time). ANNs are fully general circuit models, which obviously can re-implement any module from any computer—memory, database, whatever. The defining characteristics of an ANN are—simulated network circuit structure based on analog/real valued nodes, and some universal learning algorithm over the weights—such as SGD.
You could say that any machine learning system is an ANN, under a sufficiently vague definition. That’s not particularly useful in a discussion, however.
Yes it is a pure ANN—according to my use of the term ANN (arguing over definitions is a waste of time). ANNs are fully general circuit models, which obviously can re-implement any module from any computer—memory, database, whatever. The defining characteristics of an ANN are—simulated network circuit structure based on analog/real valued nodes, and some universal learning algorithm over the weights—such as SGD.
I think you misunderstood me. The current DeepMind AI that they’ve shown the public is a pure ANN. However, it has serious limitations because it’s not easy to implement long-term memory as a naive ANN. So they’re working on a successor called the “neural Turing machine” which marries an ANN to a database retrieval system—a specialized module.
You don’t understand my position. I don’t believe DL as it exists today is somehow the grail of AI. And yes I’m familiar with Hinton’s ‘Capsule’ proposals. And yes I agree there is still substantial room for improvement in ANN microarchitecture, and especially for learning invariances—and unsupervised especially.
The thing is, many of those improvements are dependent on the task at hand. It’s really, really hard for an off-the-shelf convnet to learn the rules of three dimensional geometry, so we have to build it into the network. Our own visual processing shows signs of having the same structure embedded in it.
The same structure would not, for example, benefit an NLP system, so we’d give it a different specialized structure, tuned to the hierarchical nature of language. The future, past a certain point, isn’t making ‘neural networks’ better. It’s making ‘machine vision’ networks better, or ‘natural language’ networks better. To make a long story short, specialized modules are an obvious place to go when you run into problems too complex to teach a naive convnet to do efficiently. Both for human engineers over the next 5-10 years, and for evolution over the last couple of billion.
You don’t update on forum posts? Really? You seem pretty familiar with MIRI and LW positions. So are you saying that you arrived at those positions all on your own somehow?
I have a CS and machine learning background, and am well-read on the subject outside LW. My math is extremely spotty, and my physics is non-existent. I update on things I read that I understand, or things from people I believe to be reputable. I don’t know you well enough to judge whether you usually say things that make sense, and I don’t have the physics to understand the argument you made or judge its validity. Therefore, I’m not inclined to update much on your conclusion.
EDIT: Oh, and you still haven’t responded to the cat thing. Which, seriously, seems like a pretty big hole in the universal learner hypothesis.
I update on things I read that I understand, or things from people I believe to be reputable.
So you are claiming that either you already understood AI/AGI completely when you arrived to LW, or you updated on LW/MIRI writings because they are ‘reputable’ - even though their positions are disavowed or even ridiculed by many machine learning experts.
EDIT: Oh, and you still haven’t responded to the cat thing. Which, seriously, seems like a pretty big hole in the universal learner hypothesis.
I replied here, and as expected—it looks like you are factually mistaken in your assertion that disagreed with the ULH. Better yet, the outcome of your cat vs bird observation was correctly predicted by the ULH, so that’s yet more evidence in its favor.
Let us see if we actually have significant differences of opinion. If so I expect to dominate you in any prediction market or bets concerning the near term future of AI.
and rudeness
or stop wasting my time
No one has any obligation to manage your time. If you want to stop wasting your time, you stop wasting your time.
Super obvious re-rebut: sociopaths exist, and yet civilization endures.
Also, we can rather obviously test in safe simulation sandboxes and avoid copying sociopaths. The argument that sociopaths are a fundamental showstopper must be based then on some magical view of the brain (because obviously evolution succeeds in producing non sociopaths, so we can copy its techniques if they are nonmagical).
Remember the argument is against existential threat level UFAI, not some fraction of evil AIs in a large population.
I think you misunderstand my argument. The point is that it’s ridiculous to say that human beings are ‘universal learning machines’ and you can just raise any learning algorithm as a human child and it’ll turn out fine. We can’t even raise 2-5% of HUMAN CHILDREN as human children and have it reliably turn out okay.
Sociopaths are different from baseline humans by a tiny degree. It’s got to be a small number of single-gene mutations. A tiny shift in information. And that’s all it takes to make them consistently UnFriendly, regardless of how well they’re raised. Obviously, AIs are going to be more different from us than that. And that’s a pretty good reason to think that we can’t just blithely assume that putting Skynet through preschool is going to keep us safe.
Human values are obviously hard coded in large part, and the hard coded portions seem to be crucial. That hard coding is not going to be present in an arbitrary AI, which means we have to go and duplicate it out of a human brain. Which is HARD. Which is why we’re having this discussion in the first place.
No—it is not. See the article for the in depth argument and citations backing up this statement.
Well almost—A ULM also requires a utility function or reward circuitry with some initial complexity, but we can also use the same universal learning algorithms to learn that component. It is just another circuit, and we can learn any circuit that evolution learned.
Sure—which is why I discussed sim sandbox testing. Did you read about my sim sandbox idea? We test designs in a safe sandbox sim, and we don’t copy sociopaths.
No, this isn’t obvious at all. AGI is going to be built from the same principles as the brain—because the brain is a universal learning machine. The AGI’s mind structure will be learned from training and experiential data such that the AI learns how to think like humans and learns how to be human—just like humans do. Human minds are software constructs—without that software we would just be animals (feral humans). An artificial brain is just another computer that can run the human mind software.
Yes, but it’s only a part of the brain and a fraction of the brain’s complexity, so obviously it can’t be harder than reverse engineering the whole brain.
Okay, so we just have to determine human terminal values in detail, and plug them into a powerful maximizer. I’m not sure I see how that’s different from the standard problem statement for friendly AI. Learning values by observing people is exactly what MIRI is working on, and it’s not a trivial problem.
For example: say your universal learning algorithm observes a human being fail a math test. How does it determine that the human being didn’t want to fail the math test? How does it cleanly separate values from their (flawed) implementation? What does it do when people’s values differ? These are hard questions, and precisely the ones that are being worked on by the AI risk people.
Other points of critique:
Saying the phrase “safe sandbox sim” is much easier than making a virtual machine that can withstand a superhuman intelligence trying to get out of it. Even if your software is perfect, it can still figure out that its world is artificial and figure out ways of blackmailing its captors. Probably doing what MIRI is looking into, and designing agents that won’t resist attempts to modify them (corrigibility) is a more robust solution.
You want to be careful about just plugging in a learned human utility function into a powerful maximizer, and then raising it. If it’s maximizing its own utility, which is necessary if you want it to behave anything like a child, what’s to stop it from learning human greed and cruelty, and becoming an eternal tyrant? I don’t trust a typical human to be god.
And even if you give up on that idea, and have to maximize a utility function defined in terms of humanity’s values, you still have problems. For starters, you want to be able to prove formally that its goals will remain stable as it self-modifies, and it won’t create powerful sub-agents who don’t share those goals. Which is the other class of problems that MIRI works on.
Why do you even go around thinking that the concept of “terminal values”, which is basically just a consequentialist steelmanning Aristotle, cuts reality at the joints?
That part honestly isn’t that hard once you read the available literature about paradox theorems.
No—not at all. Perhaps you have read too much MIRI material, and not enough of the neuroscience and machine learning I referenced. An infant is not born with human ‘terminal values’. It is born with some minimal initial reward learning circuitry to bootstrap learning of complex values from adults.
Stop thinking of AGI as some weird mathy program. Instead think of brain emulations, and then you have obvious answers to all of these questions.
You apparently didn’t read my article or links to earlier discussion? We can easily limit the capability of minds by controlling knowledge. A million smart evil humans is dangerous—but only if they have modern knowledge. If they have only say medieval knowledge, they are hardly dangerous. Also—they don’t realize they are in a sim. Also—the point of the sandbox sims is to test architectures, reward learning systems, and most importantly—altruism. Designs that work well in these safe sims are then copied into less safe sims and finally the real world.
Consider the orthogonality thesis—AI of any intelligence level can be combined with any values. Thus we can test values on young/limited AI before scaling up their power.
Sandbox sims can be arbitrarily safe. It is the only truly practical workable proposal to date. It is also the closest to what is already used in industry. Thus it is the solution by default.
Ridiculous nonsense. Many humans today are aware of the sim argument. The gnostics were aware in some sense 2,000 years ago. Do you think any of them broke out? Are you trying to break out? How?
Again, stop thinking we create a single AI program and then we are done. It will be a large-scale evolutionary process, with endless selection, testing, and refinement. We can select for super altruistic moral beings, at the Buddha/Gandhi/Jesus level. We can take the human capability for altruism, refine it, and expand on it vastly.
Quixotic waste of time.
So, to sum up, your plan is to create an arbitrarily safe VM, and use it to run brain-emulation-style denovo AIs patterned on human babies (presumably with additional infrastructure to emulate the hard-coded changes that occur in the brain during development to adulthood: adult humans are not babies + education). You then want to raise many, many iterations of these things under different conditions to try to produce morally superior specimens, then turn those AIs loose and let them self modify to godhood.
Is that accurate? (Seriously, let me know if I’m misrepresenting your position).
A few problems immediately come to mind. We’ll set aside the moral horror of what you just described as a necessary evil to avert the apocalypse, for the time being.
More practically, I think you’re being racist against weird mathy programs.
For starters, I think weird mathy programs will be a good deal easier to develop than digital people. Human beings are not just general optimizers. We have modules that function roughly like one, which we use under some limited circumstances, but anyone who’s ever struggled with procrastination or put their keys in the refrigerator knows that your goal-oriented systems are entangled with a huge number of cheap heuristics at various levels, many of which are not remotely goal-oriented.
All of this stuff is deeply tangled up with what we think of as the human ‘utility function,’ because evolution has no incentive to design a clean separation between planning and values. Replicating all of that accurately enough to get something that thinks and behaves like a person is likely much harder than making a weird mathy program that’s good at modelling the world and coming up with plans.
There’s also the point that there really isn’t a good way to make a brain emulation smarter. Weird, mathy programs—even ones that use neural networks as subroutines—often have obvious avenues to making them smarter, and many can scale smoothly with processing resources. Brain emulations are much harder to bootstrap, and it’d be very difficult to preserve their behavior through the transition.
My best guess is, they’d probably go nuts and end up as an eldritch horror. And if not, they’re still going to get curb stomped by the first weird mathy program to come along, because they’re saddled with all of our human imperfections and unnecessary complexity. The upshot of all of this is that they don’t serve the purpose of protecting us from future UFAIs.
Finally, the process you described isn’t really something you can start on (aside from the VM angle) until you already have human level AGIs, and a deep and total understanding of all of the operation of the human brain. Then, while you’re setting up your crazy AI concentration camp and burning tens of thousands of man-years of compute time searching for AI Buddha, some bright spark in a basement with a GPU cluster has the much easier task of just kludging together something smart enough to recursively self-improve. You’re in a race with a bunch of people trying to solve a much easier problem, and (unlike MIRI) you don’t have decades of lead time to get a head start on the problem. Your large-scale evolutionary process would take much, much too much time and money to actually save the world.
In short, I think it’s a really bad idea. Although now that I understand what you’re getting at, it’s less obviously silly than what I originally thought you were proposing. I apologize.
No. I said:
I used brain emulations as analogy to help aid your understanding. Because unless you have deep knowledge of machine learning and computational neuroscience, there are huge inferential distances to cross.
Yes we are. I have made a detailed, extensive, citation-full, and well reviewed case that human minds are just that.
All of our understanding about the future of AGI is based ultimately on our models of the brain and AI in general. I am claiming that the MIRI viewpoint is based on an outdated model of the brain, and a poor understanding of the limits of computation and intelligence.
I will summarize for one last time. I will then no longer repeat myself because it is not worthy of my time—any time spent arguing this is better spent preparing another detailed article, rather than a little comment.
There is extensive uncertainty concerning how the brain works and what types of future AI are possible in practice. In situations of such uncertainty, any good sane probabilistic reasoning agent should come up with a multimodal distribution that spreads belief across several major clusters. If your understanding of AI comes mainly from reading LW, you are probably biased beyond hope. I’m sorry, but this is true. You are stuck in a box and don’t even know it.
Here are the main key questions that lead to different belief clusters:
Are the brain’s algorithms for intelligence complex or simple?
And related—are human minds mainly software or mainly hardware?
At the practical computational level, does the brain implement said algorithms efficiently or not?
If the human mind is built out of a complex mess of hardware specific circuits, and the brain is far from efficient, then there is little to learn from the brain. This is Yudkowsky/MIRI’s position. This viewpoint leads to a focus on pure math and avoidance of anything brain-like (such as neural nets). In this viewpoint hard takeoff is likely, AI is predicted to be nothing like human minds, etc.
If you believe that the human mind is complex and messy hardware, but the brain is efficient, then you get Hanson’s viewpoint where the future is dominated by brain emulations. The brain ems win over brain inspired AI because scanning real brain circuitry is easier than figuring out how it works.
Now what if the brain’s algorithms are not complex, and the brain is efficient? Then you get my viewpoint cluster.
These questions are empirical—and they can be answered today. In fact, I realized all this years ago and spent a huge amount of time learning more about the future of computer hardware, the limits of computation, machine learning, and computational neuroscience.
Yudkowsky, Hanson, and to some extent Bostrom—were all heavily inspired by the highly influential evolved modularity hypothesis in ev psych from Tooby and Cosmides. In this viewpoint, the brain is complex, and most of our algorithmic content is hardware based rather than software. I have argued that this viewpoint has been tested empirically and now disproven. The brain is built out of relatively simple universal learning algorithms. It will essentially be almost impossible to build practical AGI that is very different from the brain (remember, AGI is defined as software which can do everything the brain does).
Bostrom/Yudkowsky have also argued that the brain is very far from efficient. For example, from “True Sources of Disagreement”:
The first two statements are true, the third statement is problematic, and the thrust of the conclusion is incorrect. The minimum realistic energy for a brain-like circuit is probably close to what the brain actually uses:
The Landauer bound depends on speed and reliability. The 10^-21 J/bit bound only applies to a signal of infinitely low frequency. For realistic fast reliable signals, the bound is 100 times higher: around 10^-19 J/bit.
The Landauer bound applies to single 1 bit ops. The fundamental bound for a 32 bit flop is around 10^5 or 10^6 times higher. Moore’s Law is ending and we are actually close to these bounds already. Synapses perform analog ops which have lower cost than a 32 bit flop, but still a much higher cost than a single bit op.
Most of the energy consumption in any advanced computer comes from wire dissipation, not switch dissipation. Signaling in the brain uses roughly 0.5x10^-14 J/bit/mm (5 fJ/bit/mm) [2], which appears to be within an order of magnitude or two of optimal, and is perhaps one order of magnitude more efficient than current computers. Wire signal energy in computers is not improving significantly. For example, for 40nm tech in 2010, the wire energy is 240 fJ/bit/mm, and is predicted to be around 150 to 115 fJ/bit/mm by 2017 [3]. The practical limit is perhaps around 1 fJ/bit/mm, but that would probably require much lower speeds.
These errors add up to around 6 orders of magnitude or so. The brain is near the limits of energy efficiency for what it does in terms of irreversible computation. No practical machine we will ever build in the near future is going to be many orders of magnitude more efficient than the brain. Yes, eventually reversible and quantum computing could perhaps result in large improvements, but those technologies are far off and will come long after neuromorphic AGI.
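For readers who want to check the order-of-magnitude figures above, here is a rough sketch of the arithmetic. The 310 K operating temperature is my own assumption (roughly body temperature). The 100x reliability factor and the wire energies are taken as stated in the comment, not derived here.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K (exact SI value)
BODY_TEMP_K = 310.0         # assumed temperature, about 37 C


def landauer_limit(temp_k):
    """Minimum energy in joules to erase one bit: k * T * ln(2)."""
    return K_BOLTZMANN * temp_k * math.log(2)


base_bound = landauer_limit(BODY_TEMP_K)  # order of 10^-21 J/bit
reliable_bound = base_bound * 100         # the comment's 100x factor, taken as given

# Wire signaling energies quoted in the comment, in fJ/bit/mm.
BRAIN_WIRE = 5.0
CHIP_WIRE_2010 = 240.0
CHIP_WIRE_2017 = (115.0, 150.0)  # predicted range

print(f"Landauer limit at {BODY_TEMP_K:.0f} K: {base_bound:.2e} J/bit")
print(f"with the 100x factor: {reliable_bound:.2e} J/bit")
print(f"2010 chip wire vs brain: {CHIP_WIRE_2010 / BRAIN_WIRE:.0f}x")
print(f"2017 chip wire vs brain: {CHIP_WIRE_2017[0] / BRAIN_WIRE:.0f}x "
      f"to {CHIP_WIRE_2017[1] / BRAIN_WIRE:.0f}x")
```

On these numbers the quoted chip wire energies come out a few tens of times higher than the brain figure, which is in the same ballpark as the "one order of magnitude" claim above.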
That isn’t quite correct. We do have hard wiring that raises and lowers the from-the-inside importance of specific features present in our learning data. That is, we have a nontrivial inductive bias which not all possible minds will have, even when we start by assuming that all minds are semi-modular universal learners.
Yes, I’ve read your big universal learner post, and I’m not convinced. This does seem to be the crux of our disagreement, so let me take some time to rebut:
First off, you’re seriously misrepresenting the success of deep learning as support for your thesis. Deep learning algorithms are extremely powerful, and probably have a role to play in building AGI, but they aren’t the end-all, be-all of AI research. For starters, modern deep learning systems are absolutely fine-tuned to the task at hand. You say that they have only “a small number of hyperparameters,” which is something of a misrepresentation. There are actually quite a few of these hyperparameters in state-of-the-art networks, and there are more in networks tackling more difficult tasks.
Tuning these hyperparameters is hard enough that only a small number of researchers can do it well enough to achieve state of the art results. We do not use the same network for image recognition and audio processing, because that wouldn’t work very well.
We tune the architecture of deep learning systems to the task at hand. Presumably, if we can garner benefits from doing that, evolution has an incentive to do the same. There’s a core, simple algorithm at work, but targeted to specific tasks. Evolution has no incentive to produce a clean design if kludgy tweaks give better results. You argue that evolution has a bias against complexity, but that certainly hasn’t stopped other organs from developing complex structure to make them marginally better at the task.
There are also plenty of tasks that deep learning methods can’t solve yet (like how to store long-term memories of a complex and partially observed system in an efficient manner), not to mention higher level cognitive skills that we have no clue how to approach.
Nobody thinks this stuff is just a question of throwing yet larger deep learning networks at the problem. They will be solved by finding different hard-wired network architectures that make the problem more manageable by knowing things about it in advance.
The ferret brain rewiring result is not a slam-dunk for universal learning by itself. It just means that different brain modules can switch which pre-programmed neural algorithms they implement on the fly. Which makes sense, because on some level these things have to be self-organizing in the first place to be compactly genetically coded.
The real test here would be to take a brain and give it an entirely new sense—something that bears no resemblance to any sense it or any of its ancestors has ever had, and see if it can use that sense as naturally as hearing or vision. Personally, I doubt it. Humans can learn echolocation, but they can’t learn echolocation the way bats and dolphins can learn echolocation—and echolocation bears a fair degree of resemblance to other tasks that humans already have specialized networks for (like pinpointing the location of a sound in space).
Notably, the general learner hypothesis does not explain why non-surgically-modified brains are so standardized in structure and functional layout. Something that you yourself bring up in your article.
It also does not explain why birds are better at language tasks than cats. Cat brains are much larger. The training rewards in the lab are the same. And, yet, cats significantly underperform parrots at every single language-related task we can come up with. Why? Because the parrots have had a greater evolutionary pressure to be good at language-style tasks—and, as a result, they have evolved task-specific neurological algorithms to make it easier.
Also, plenty of mammals, fresh out of the womb, have complex behaviors and vocalizations. Humans are something of an outlier, due to being born premature by mammal standards. If mammal brains are 99% universal learning, why can baby cows walk within minutes of birth?
Look, obviously, to some degree, both things are true. The brain is capable of general learning to some degree. Otherwise, we’d never have developed math. It also obviously has hard-coded specialized modules, to some degree, which is why (for example) all human cultures develop language and music, which isn’t something you’d expect if we were all starting from zero. The question is which aspect dominates brain performance. You’re proposing an extreme swing to one end of the possibility space that doesn’t seem even remotely plausible, and then you’re using that assumption as evidence that no non-brain-like intelligence can exist.
What about Watson? It’s the best-performing NLP system ever made, and it’s absolutely a “weird mathy program.” It uses neural networks as subroutines, but the architecture of the whole bears no resemblance to the human brain. It’s not a simple universal learning algorithm. If you gave a single deep neural network access to the same computational resources, it would underperform Watson. That seems like a pretty tough pill to swallow if ‘simple universal learner’ is all there is to intelligence.
Finally, I don’t have the background to refute your argument on the efficiency of the brain (although I know clever people who do who disagree with you). But, taking it as a given that you’re right, it sounds like you’re assuming all future AIs will draw the same amount of power as a real brain and fit in the same spatial footprint. Well… what if they didn’t? What if the AI brain is the size of a fridge and cooled with LN2 and consumes as much power as a city block? Surely at the physical limits of computation you believe in, that would be able to beat the pants off little old us.
To sum up: yes, I’ve read your thing. No, it’s not as convincing as you seem to believe.
Cat brains are much larger, but physical size is irrelevant. What matters is neuron/synapse count.
According to my ULM theory—the most likely explanation for the superior learning ability of parrots is a larger number of neurons/synapses in their general learning modules - (whatever the equivalent of the cortex is in birds) and thus more computational power available for general learning.
Stop right now, and consider this bet—I will bet that parrots have more neurons/synapses in their cortex-equivalent brain regions than cats.
A little Google searching leads to this blog article, which summarizes this recent research: “Complex brains for complex cognition - neuronal scaling rules for bird brains.”
From the abstract:
The telencephalon is believed to be the equivalent of the cortex in birds. The cortex of the smallest monkeys has about 400 million neurons, whereas the cat’s cortex has about 300 million neurons. A medium sized monkey such as a night monkey has more than 1 billion cortical neurons.
Interesting! I didn’t know that, and that makes a lot of sense.
If I were to restate my objection more strongly, I’d say that parrots also seem to exceed chimps in language capabilities (chimps having six billion cortical neurons). The reason I didn’t bring this up originally is that chimp language research is a horrible, horrible field full of a lot of bad science, so it’s difficult to be too confident in that result.
Plenty of people will tell you that signing chimps are just as capable as Alex the parrot—they just need a little bit of interpretation from the handler, and get too nervous to perform well when the handler isn’t working with them. Personally, I think that sounds a lot like why psychics suddenly stop working when James Randi shows up, but obviously the situation is a little more complicated.
I’d strongly suggest the movie Project Nim, if you haven’t seen it. In some respects chimpanzee intelligence develops faster than that of a human child, but it also plateaus much earlier. Their childhood development period is much shorter.
To a first approximation, general intelligence in animals can be predicted by the number of neurons/synapses in general learning modules, but this isn’t the only factor. I don’t have an exact figure, but that poster article suggests parrots have perhaps 1-3 billion cortical-neuron equivalents.
The next most important factor is probably degree of neoteny, or learning window. Human intelligence develops over the span of 20 years. Parrots seem exceptional in terms of lifespan and are thus perhaps more human-like, in that they maintain a childlike state for much longer. We know from machine learning that the ‘learning rate’ is a super important hyperparameter: learning faster has a huge advantage, but if you learn too fast you get inferior long-term results for your capacity. Learning slowly is obviously more costly, but it can generate more efficient circuits in the long term.
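The learning-rate tradeoff described above can be sketched with a toy example. This is a minimal illustration, assuming plain stochastic gradient descent on the one-dimensional function f(w) = 0.5 * w^2 with Gaussian noise added to each gradient. The function names, step counts, and learning rates are hypothetical choices of mine, and the toy says nothing about brains or parrots.

```python
import random


def noisy_sgd(learning_rate, steps, start=10.0, noise_std=1.0, seed=0):
    """Run SGD on f(w) = 0.5 * w**2 with noisy gradients; return every w visited."""
    rng = random.Random(seed)
    w = start
    trajectory = []
    for _ in range(steps):
        # The true gradient of f is w; the noise stands in for minibatch sampling.
        gradient = w + rng.gauss(0.0, noise_std)
        w -= learning_rate * gradient
        trajectory.append(w)
    return trajectory


def mean_squared(values):
    """Average squared distance from the optimum at w = 0."""
    return sum(v * v for v in values) / len(values)


fast = noisy_sgd(learning_rate=0.5, steps=5000)
slow = noisy_sgd(learning_rate=0.01, steps=5000)

# Early on, the fast learner is much closer to the optimum.
early_fast, early_slow = abs(fast[9]), abs(slow[9])

# Late in training, the slow learner settles closer to the optimum on average.
late_fast = mean_squared(fast[-1000:])
late_slow = mean_squared(slow[-1000:])

print(f"after 10 steps:  fast |w| = {early_fast:.3f}, slow |w| = {early_slow:.3f}")
print(f"last 1000 steps: fast mean w^2 = {late_fast:.4f}, slow mean w^2 = {late_slow:.4f}")
```

In this toy the high learning rate wins early and loses late, because the gradient noise keeps kicking it away from the optimum. Whether the same tradeoff governs biological learning windows is the conjecture being argued here, not something the sketch demonstrates.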
I inferred/guessed that parrots have very long neotenic learning windows, and the articles on Alex seem to confirm this.
Alex reached a vocabulary of about 100 words by age 29, a few years before his untimely death. The trainer, Irene Pepperberg, claims that Alex was still learning and had not reached peak capability. She rated Alex’s intelligence as roughly equivalent to that of a 5 year old. This about makes sense if the parrot has roughly 1/6th our number of cortical neurons, but has similar learning efficiency and a long learning window.
To really compare chimp vs parrot learning ability, we’d need more than a handful of samples. There is also a large selection effect here—because parrots make reasonably good pets, whereas chimps are terrible dangerous pets. So we haven’t tested chimps as much. Alex is more likely to be a very bright parrot, whereas the handful of chimps we have tested are more likely to be average.
Not much to add here, except that it’s unlikely that Alex is an exceptional example of a parrot. The researcher purchased him from a pet store at random to try to eliminate that objection.
This is curious. I wonder if bird brains are also more energy efficient as a result of the greater neuronal densities (since that implies shorter wires). According to Ratio of central nervous system to body metabolism in vertebrates: its constancy and functional basis the metabolism of the brain of Corvus sp (unknown species of genus Corvus, which includes the ravens) is 0.52 cm^3 O2/min whereas the metabolism of the brain of a macaque monkey is 3.4 cm^3 O2/min. Presumably the macaque monkey has more non-cortical neurons which account for some of the difference, but this still seems impressive if the Corvus sp and macaque monkey have a similar number of telencephalic/cortical neurons (1.4B for the macaque according to this paper). Unfortunately I can’t find the full paper for the abstract you linked, so I can’t check the details.
Yes—that seems to be the point of that poster I found earlier.
From an evolutionary point of view it makes sense—birds are under tremendous optimization pressure for mass efficiency. Hummingbirds are a great example of how far evolution can push flight and weight efficiency.
Primate/human brains also appear to have more density optimization than, say, elephants or cetaceans, but it is interesting that birds are so much more density efficient still. Presumably there are some other tradeoffs: perhaps the bird brain design is too hot to scale up to large sizes, uses too many resources, etc.
It was a recent poster, so perhaps it is still a paper in progress? They claim to have run the isotropic fractionator experiments on bird brains, so they should have estimates of the actual neuron counts to back up their general claims, but they didn’t provide those in the abstract. Perhaps the data exists somewhere as an image from the actual presentation. Oh well.
Do you actually believe that evolved modularity is a better explanation of the brain than the ULM hypothesis? Do you have evidence for this belief or is it simply that which you want to be true? Do you understand why the computational neuroscience and machine learning folks are moving away from the former towards the latter? If you do have evidence please provide it in a critique in the comments for that post where I will respond.
Make some specific predictions for the next 5 years about deep learning or ANNs. Let us see if we actually have significant differences of opinion. If so I expect to dominate you in any prediction market or bets concerning the near term future of AI.
Right off the bat, you absolutely can create an AGI that is a pure ANN. In fact the most successful early precursor AGI we have, the DeepMind Atari agent, is a pure ANN. Your claim that ANNs/Deep Learning is not the end-all of AGI research is quickly becoming a minority position.
What the Scotsman!
Done and done. Next!
I discussed this in the comments—it absolutely does explain neurotypical standardization. It’s a result of topographic/geometric wiring optimization. There is an exactly optimal location for every piece of functionality, and the brain tends to find those same optimal locations in each human. But if you significantly perturb the input sense or the brain geometry, you can get radically different results.
Consider the case of extreme hydrocephaly—where fluid fills in the center of the brain and replaces most of the brain and squeezes the remainder out to a thin surface near the skull. And yet, these patients can have above average IQs. Optimal dynamic wiring can explain this—the brain is constantly doing global optimization across the wiring structure, adapting to even extreme deformations and damage. How does evolved modularity explain this?
This is nonsense—language processing develops in general purpose cortical modules, there is no specific language circuitry.
There is a small amount of innate circuit structures—mainly in the brainstem, which can generate innate algorithms especially for walking behavior.
This is rather obvious—it depends on the ratio of pure learning structures (cortex, hippocampus, cerebellum) to innate circuit structures (brainstem, some midbrain, etc). In humans 95% or more of the circuitry is general purpose learning.
Not an AGI.
The correct thing to do here is update. Instead you are searching for ways in which you can ignore the evidence.
Obviously not—in theory given a power budget you can split it up into N AGIs or one big AGI. In practice due to parallel scaling limitations, there is always some optimal N. Even on a single GPU today, you need N about 100 or more to get good performance.
You can’t just invest all your energy into one big AGI and expect better performance—that is a mind numbingly naive strategy.
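One standard way to formalize "parallel scaling limitations" is Amdahl's law, which is my choice of model here and not something the argument above commits to. The sketch below assumes a fixed pool of processors, a fixed parallelizable fraction of each agent's workload, and hypothetical function names. It compares one big agent against the same budget split across N agents.

```python
def amdahl_speedup(processors: float, parallel_fraction: float) -> float:
    """Amdahl's law: speedup of one task given some processors."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def total_throughput(total_processors: int, n_agents: int,
                     parallel_fraction: float) -> float:
    """Combined speedup when the processor budget is split evenly across agents."""
    per_agent = total_processors / n_agents
    return n_agents * amdahl_speedup(per_agent, parallel_fraction)


BUDGET = 1024            # assumed processor budget (illustrative)
PARALLEL_FRACTION = 0.95  # assumed parallelizable share of the work (illustrative)

for n in (1, 4, 16, 64, 256):
    throughput = total_throughput(BUDGET, n, PARALLEL_FRACTION)
    print(f"N = {n:3d} agents: total throughput = {throughput:7.1f}")
```

Under these assumptions a single agent with the whole budget is capped near 1 / (1 - parallel_fraction), so pouring everything into one agent gives sharply diminishing returns. Note that this toy model keeps rewarding larger N indefinitely; it does not by itself produce an interior optimum, which would require an extra assumption such as a minimum useful size per agent.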
Update, or provide counter evidence, or stop wasting my time.
People have been using ANNs for reinforcement learning tasks since at least the TD-Gammon system, with varying success. The DeepMind Atari agent is bigger and the task is sexier, but calling it an early precursor AGI seems far-fetched.
I suppose that the network topology of these brains is essentially normal, isn’t it? If that’s the case, then all the modules are still there, they are just squeezed against the skull wall.
If I understand correctly, damage to Broca’s area or Wernicke’s area tends to cause speech impairment.
This may be more or less severe depending on the individual, which is consistent with the evolved modularity hypothesis: genetically different individuals may have small differences in the location and shape of the brain modules.
Under the universal learning machine hypothesis, instead, we would expect speech impairment following localized brain damage to heal quickly in most cases as other brain areas are recruited to the task. Note that there are large rewards for regaining linguistic ability, hence the brain would sacrifice other abilities if it could. This generally does not happen.
In fact, for most people with completely healthy brains it is difficult to learn a new language as well as a native speaker after the age of 10. This suggests that our language processing machinery is hard-wired to a significant extent.
Hardly. It can learn a wide variety of tasks—many at above human level—in a variety of environments—all with only a few million neurons. It was on the cover of Nature for a reason.
Remember a mouse brain has the same core architecture as a human brain. The main components are all there and basically the same—just smaller—and with different size allocations across modules.
From what I’ve read the topology is radically deformed, modules are lost, and timing between remaining modules is totally changed: it’s massive brain damage. It’s so weird that they can even still think that it has led some neuroscientists to seriously consider that cognition comes from something other than neurons and synapses.
Not at all. Relearning language would take at least as much time and computational power as learning it in the first place. Language is perhaps the most computationally challenging thing that humans learn; it takes roughly a decade to learn up to a high fluent adult level. Children learn faster: they have far more free cortical capacity. All of this is consistent with the ULH, and I bet it can even vaguely predict the time required for relearning language, although measuring the exact extent of damage to language centers is probably difficult.
Absolutely not—because you can look at the typical language modules in the microscope, and they are basically the same as the other cortical modules. Furthermore, there is no strong case for any mechanism that can encode any significant genetically predetermined task specific wiring complexity into the cortex. It is just like an ANN—the wiring is random. The modules are all basically the same.
The DeepMind agent has no memory, one of the problems that I noted in the first place with naive ANN systems. The DeepMind team’s solution to this is the neural Turing machine model, which is a hybrid system between a neural network and a database. It’s not a pure ANN. It isn’t even neuromorphic.
Improving its performance is going to involve giving it more structure and more specialized components, and not just throwing more neurons and training time at it.
For goodness sake: Geoffrey Hinton, the father of deep learning, believes that the future of machine vision is explicitly integrating the idea of three dimensional coordinates and geometry into the structure of the network itself, and moving away from more naive and general purpose conv-nets.
Source: https://github.com/WalnutiQ/WalnutiQ/issues/157
Your position is not as mainstream as you like to present it.
If you’d read the full sentence that I wrote, you’d appreciate that remapping existing senses doesn’t actually address my disagreement. I want a new sense, to make absolutely sure that the subjects aren’t just re-using hard coding from a different system. Snarky, but not a useful contribution to the conversation.
This is far from the mainstream linguistic perspective. Go argue with Noam Chomsky; he’s smarter than I am. Incidentally, you didn’t answer the question about birds and cats. Why can’t cats learn to do complex language tasks? Surely they also implement the universal learning algorithm just as parrots do.
AGIs literally don’t exist, so that’s hardly a useful argument. Watson is the most powerful thing in its (fairly broad) class, and it’s not a neural network.
No, it really isn’t. I don’t update based on forum posts on topics I don’t understand, because I have no way to distinguish experts from crackpots.
Yes it is a pure ANN—according to my use of the term ANN (arguing over definitions is a waste of time). ANNs are fully general circuit models, which obviously can re-implement any module from any computer—memory, database, whatever. The defining characteristics of an ANN are—simulated network circuit structure based on analog/real valued nodes, and some universal learning algorithm over the weights—such as SGD.
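As a concrete reading of that definition, here is about the smallest thing that fits it: a single real-valued sigmoid node whose weights are learned by stochastic gradient descent. This is a sketch of my own for illustration, assuming the logical OR function as the task. All names are hypothetical, and a one-node example is obviously nowhere near the systems being argued about.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Squash a real-valued activation into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def train_neuron(samples, epochs=2000, learning_rate=0.5, seed=0):
    """Fit one sigmoid node with per-sample SGD on cross-entropy loss."""
    rng = random.Random(seed)
    weights = [0.0, 0.0]
    bias = 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)  # visit samples in random order each epoch
        for inputs, target in data:
            output = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
            error = output - target  # loss gradient w.r.t. the pre-activation
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias


# Truth table for logical OR (assumed toy task).
OR_SAMPLES = [
    ((0.0, 0.0), 0.0),
    ((0.0, 1.0), 1.0),
    ((1.0, 0.0), 1.0),
    ((1.0, 1.0), 1.0),
]

weights, bias = train_neuron(OR_SAMPLES)


def predict(inputs) -> float:
    """Output of the trained node for one input pair."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


for inputs, target in OR_SAMPLES:
    print(inputs, "->", round(predict(inputs), 3), "(target", target, ")")
```

Nothing here is specific to OR: the same loop learns any linearly separable function of its inputs. The disagreement in this thread is whether stacking and scaling such general-purpose units is enough, or whether task-specific structure has to be built in.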
You don’t understand my position. I don’t believe DL as it exists today is somehow the grail of AI. And yes I’m familiar with Hinton’s ‘Capsule’ proposals. And yes I agree there is still substantial room for improvement in ANN microarchitecture, and especially for learning invariances—and unsupervised especially.
For any theory of anything the brain does—if it isn’t grounded in computational neuroscience data, it is probably wrong—mainstream or not.
You don’t update on forum posts? Really? You seem pretty familiar with MIRI and LW positions. So are you saying that you arrived at those positions all on your own somehow? Then you just showed up here, thankfully finding other people who just happened to have arrived at all the same ideas?
You could say that any machine learning system is an ANN, under a sufficiently vague definition. That’s not particularly useful in a discussion, however.
I think you misunderstood me. The current DeepMind AI that they’ve shown the public is a pure ANN. However, it has serious limitations because it’s not easy to implement long-term memory as a naive ANN. So they’re working on a successor called the “neural Turing machine” which marries an ANN to a database retrieval system—a specialized module.
The thing is, many of those improvements are dependent on the task at hand. It’s really, really hard for an off-the-shelf convnet to learn the rules of three dimensional geometry, so we have to build it into the network. Our own visual processing shows signs of having the same structure embedded in it.
The same structure would not, for example, benefit an NLP system, so we’d give it a different specialized structure, tuned to the hierarchical nature of language. The future, past a certain point, isn’t making ‘neural networks’ better. It’s making ‘machine vision’ networks better, or ‘natural language’ networks better. To make a long story short, specialized modules are an obvious place to go when you run into problems too complex to teach a naive convnet to do efficiently. Both for human engineers over the next 5-10 years, and for evolution over the last couple of billion.
I have a CS and machine learning background, and am well-read on the subject outside LW. My math is extremely spotty, and my physics is non-existent. I update on things I read that I understand, or things from people I believe to be reputable. I don’t know you well enough to judge whether you usually say things that make sense, and I don’t have the physics to understand the argument you made or judge its validity. Therefore, I’m not inclined to update much on your conclusion.
EDIT: Oh, and you still haven’t responded to the cat thing. Which, seriously, seems like a pretty big hole in the universal learner hypothesis.
So you are claiming that either you already understood AI/AGI completely when you arrived to LW, or you updated on LW/MIRI writings because they are ‘reputable’ - even though their positions are disavowed or even ridiculed by many machine learning experts.
I replied here, and as expected—it looks like you are factually mistaken in your assertion that disagreed with the ULH. Better yet, the outcome of your cat vs bird observation was correctly predicted by the ULH, so that’s yet more evidence in its favor.
Let me point out the blatant hubris:
and rudeness
No one has any obligation to manage your time. If you want to stop wasting your time, you stop wasting your time.
Hubris—perhaps, but it was a challenge. Making predictions/bets can help clarify differences in world models.
The full quote was this:
In the context that he had just claimed that he wasn’t going to update.