The Brain as a Universal Learning Machine
This article presents an emerging architectural hypothesis of the brain as a biological implementation of a Universal Learning Machine. I present a rough but complete architectural view of how the brain works under the universal learning hypothesis. I also contrast this new viewpoint—which comes from computational neuroscience and machine learning—with the older evolved modularity hypothesis popular in evolutionary psychology and the heuristics and biases literature. These two conceptions of the brain lead to very different predictions for the likely route to AGI, the value of neuroscience, the expected differences between AGI and humans, and thus any consequent safety issues and dependent strategies.
(The image above is from a recent mysterious post to r/machinelearning, probably from a Google project that generates art based on a visualization tool used to inspect the patterns learned by convolutional neural networks. I am especially fond of the weird figures riding the cart in the lower left.)
Intro: Two Viewpoints on the Mind
Universal Learning Machines
Historical Interlude
Dynamic Rewiring
Brain Architecture (the whole brain in one picture and a few pages of text)
The Basal Ganglia
Implications for AGI
Conclusion
Intro: Two Viewpoints on the Mind
Few discoveries are more irritating than those that expose the pedigree of ideas.
-- Lord Acton (probably)
Less Wrong is a site devoted to refining the art of human rationality, where rationality is based on an idealized conceptualization of how minds should or could work. Less Wrong and its founding sequences draw heavily on the heuristics and biases literature in cognitive psychology and related work in evolutionary psychology. More specifically, the sequences build upon a specific cluster in the space of cognitive theories, which can be identified in particular with the highly influential “evolved modularity” perspective of Cosmides and Tooby.
From Wikipedia:
Evolutionary psychologists propose that the mind is made up of genetically influenced and domain-specific[3] mental algorithms or computational modules, designed to solve specific evolutionary problems of the past.[4]
From “Evolutionary Psychology and the Emotions”:[5]
An evolutionary perspective leads one to view the mind as a crowded zoo of evolved, domain-specific programs. Each is functionally specialized for solving a different adaptive problem that arose during hominid evolutionary history, such as face recognition, foraging, mate choice, heart rate regulation, sleep management, or predator vigilance, and each is activated by a different set of cues from the environment.
If you imagine these general theories or perspectives on the brain/mind as points in theory space, the evolved modularity cluster posits that much of the machinery of human mental algorithms is largely innate. General learning, if it exists at all, exists only in specific modules; in most modules learning is relegated to the role of adapting existing algorithms and acquiring data; the impact of the information environment is de-emphasized. In this view the brain is a complex, messy kludge of evolved mechanisms.
There is another viewpoint cluster, more popular in computational neuroscience (especially today), that is almost the exact opposite of the evolved modularity hypothesis. I will rebrand this viewpoint the “universal learner” hypothesis, aka the “one learning algorithm” hypothesis (the rebranding is justified mainly by the inclusion of some newer theories and evidence for the basal ganglia as a ‘CPU’ which learns to control the cortex). The roots of the universal learning hypothesis can be traced back to Mountcastle’s discovery of the simple uniform architecture of the cortex.[6]
The universal learning hypothesis proposes that all significant mental algorithms are learned; nothing is innate except for the learning and reward machinery itself (which is somewhat complicated, involving a number of systems and mechanisms), the initial rough architecture (equivalent to a prior over mindspace), and a small library of simple innate circuits (analogous to the operating system layer in a computer). In this view the mind (software) is distinct from the brain (hardware). The mind is a complex software system built out of a general learning mechanism.
In simplification, the main difference between these viewpoints is the relative quantity of domain specific mental algorithmic information specified in the genome vs that acquired through general purpose learning during the organism’s lifetime. Evolved modules vs learned modules.
When you have two hypotheses or viewpoints that are almost complete opposites this is generally a sign that the field is in an early state of knowledge; further experiments typically are required to resolve the conflict.
It has been about 25 years since Cosmides and Tooby began to popularize the evolved modularity hypothesis. A number of key neuroscience experiments have been performed since then which support the universal learning hypothesis (reviewed later in this article).
Additional indirect support comes from the rapid unexpected success of Deep Learning[7], which is entirely based on building AI systems using simple universal learning algorithms (such as Stochastic Gradient Descent or various other approximate Bayesian methods[8][9][10][11]) scaled up on fast parallel hardware (GPUs). Deep Learning techniques have quickly come to dominate most of the key AI benchmarks including vision[12], speech recognition[13][14], various natural language tasks, and now even ATARI[15], which strongly suggests that simple architectures (priors) combined with universal learning are a path (and perhaps the only viable path) to AGI. Moreover, the internal representations that develop in some deep learning systems are structurally and functionally similar to representations in analogous regions of biological cortex[16].
To paraphrase Feynman: to truly understand something you must build it.
In this article I am going to quickly introduce the abstract concept of a universal learning machine, present an overview of the brain’s architecture as a specific type of universal learning machine, and finally I will conclude with some speculations on the implications for the race to AGI and AI safety issues in particular.
Universal Learning Machines
A universal learning machine is a simple and yet very powerful and general model for intelligent agents. It is an extension of a general computer (such as a Turing Machine) amplified with a universal learning algorithm. Do not view this as my ‘big new theory’; it is simply an amalgamation of a set of related proposals by various researchers.
An initial untrained seed ULM can be defined by 1.) a prior over the space of models (or equivalently, programs), 2.) an initial utility function, and 3.) the universal learning machinery/algorithm. The machine is a real-time system that processes an input sensory/observation stream and produces an output motor/action stream to control the external world using a learned internal program that is the result of continuous self-optimization.
There is of course always room to smuggle in arbitrary innate functionality via the prior, but in general the prior is expected to be extremely small in bits in comparison to the learned model.
The key defining characteristic of a ULM is that it uses its universal learning algorithm for continuous recursive self-improvement with regard to the utility function (reward system). We can view this as second (and higher) order optimization: the ULM optimizes the external world (first order), and also optimizes its own internal optimization process (second order), and so on. Without loss of generality, any system capable of computing a large number of decision variables can also compute internal self-modification decisions.
Conceptually the learning machinery computes a probability distribution over program-space that is proportional to the expected utility distribution. At each timestep it receives a new sensory observation and expends some amount of computational energy to infer an updated (approximate) posterior distribution over its internal program-space: an approximate ‘Bayesian’ self-improvement.
The above description is intentionally vague in the right ways to cover the wide space of possible practical implementations and current uncertainty. You could view AIXI as a particular formalization of the above general principles, although it is also as dumb as a rock in any practical sense and has other potential theoretical problems. Although the general idea is simple enough to convey in the abstract, one should beware of concise formal descriptions: practical ULMs are too complex to reduce to a few lines of math.
A ULM inherits the general property of a Turing Machine that it can compute anything that is computable, given appropriate resources. However, a ULM is also more powerful than a TM in a practical sense: a Turing Machine can only do what it is programmed to do, whereas a ULM automatically programs itself.
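As a toy illustration of the posterior-over-programs update described earlier (emphatically not a practical ULM, and not a formalization taken from any of the cited proposals), the sketch below keeps one weight per candidate program and reweights them after every observation. The three candidate programs, the uniform prior, the squared-error utility, and the `temperature` parameter are all assumptions made up for this example.

```python
import math

# Hypothetical candidate "programs": each predicts the next observation
# from the previous one. A real program-space would be astronomically larger.
PROGRAMS = {
    "repeat": lambda x: x,         # predicts the same value again
    "increment": lambda x: x + 1,  # predicts the value plus one
    "double": lambda x: 2 * x,     # predicts twice the value
}


def normalize(weights):
    """Rescale weights so they sum to one."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def update(posterior, prev_obs, new_obs, temperature=1.0):
    """One approximate update: reweight each program by exp(utility)."""
    reweighted = {}
    for name, program in PROGRAMS.items():
        error = program(prev_obs) - new_obs
        utility = -(error ** 2)  # assumed utility: negative squared prediction error
        reweighted[name] = posterior[name] * math.exp(utility / temperature)
    return normalize(reweighted)


# Uniform prior, standing in for the small innate prior over programs.
posterior = normalize({name: 1.0 for name in PROGRAMS})

observations = [1, 2, 3, 4, 5]  # a stream generated by the "increment" rule
for prev_obs, new_obs in zip(observations, observations[1:]):
    posterior = update(posterior, prev_obs, new_obs)

best = max(posterior, key=posterior.get)
```

After a few observations nearly all of the probability mass sits on the program that actually generated the stream. A real system could not enumerate its program-space like this, which is one reason the approximate, resource-bounded character of the update matters.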
If you were to open up an infant ULM—a machine with zero experience—you would mainly just see the small initial code for the learning machinery. The vast majority of the codestore starts out empty—initialized to noise. (In the brain the learning machinery is built in at the hardware level for maximal efficiency).
Theoretical Turing Machines are all qualitatively alike, and are all qualitatively distinct from any non-universal machine. Likewise for ULMs. Theoretically a small ULM is just as general/expressive as a planet-sized ULM. In practice quantitative distinctions do matter, and can become effectively qualitative.
Just as the simplest possible Turing Machine is in fact quite simple, the simplest possible Universal Learning Machine is also probably quite simple. A couple of recent proposals for simple universal learning machines include the Neural Turing Machine[16] (from Google DeepMind), and Memory Networks[17]. The core of both approaches involves training an RNN to learn how to control a memory store through gating operations.
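To make "controlling a memory store through gating operations" a little more concrete, here is a minimal sketch of soft, content-based memory access in the general style of the Neural Turing Machine: a key is compared against every memory row, the similarities become attention weights, and reads and writes are blended by those weights. The controller network that would normally emit the key, erase, and add vectors is omitted; here they are set by hand, and the `sharpness` parameter and the tiny 3x3 memory are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def address(memory, key, sharpness=10.0):
    """Soft attention over memory rows: softmax of scaled cosine similarity."""
    scores = [sharpness * cosine(row, key) for row in memory]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read(memory, weights):
    """Read vector: attention-weighted sum of the memory rows."""
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)]


def write(memory, weights, erase, add):
    """Gated write: each row is partly erased, then partly overwritten."""
    return [
        [m * (1.0 - w * e) + w * a for m, e, a in zip(row, erase, add)]
        for w, row in zip(weights, memory)
    ]


memory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights = address(memory, key=[0.0, 1.0, 0.0])  # hand-set key, matches row 1
recalled = read(memory, weights)
updated = write(memory, weights, erase=[1.0, 1.0, 1.0], add=[0.5, 0.5, 0.5])
```

Because every operation is a smooth function of the weights, a controller trained by gradient descent can in principle learn where to read and write, which is the sense in which the memory is "controlled through gating".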
Historical Interlude
At this point you may be skeptical: how could the brain be anything like a universal learner? What about all of the known innate biases/errors in human cognition? I’ll get to that soon, but let’s start by thinking of a couple of general experiments to test the universal learning hypothesis vs the evolved modularity hypothesis.
In a world where the ULH is mostly correct, what do we expect to be different than in worlds where the EMH is mostly correct?
One type of evidence that would support the ULH is the demonstration of key structures in the brain along with associated wiring such that the brain can be shown to directly implement some version of a ULM architecture.
Another type of indirect evidence that would help discriminate the two theories would be evidence that the brain is capable of general global optimization, and that complex domain specific algorithms/circuits mostly result from this process. If on the other hand the brain is only capable of constrained/local optimization, then most of the complexity must instead be innate: the result of global optimization in evolutionary deep time. So in essence it boils down to the optimization capability of biological learning vs biological evolution.
From the perspective of the EMH, it is not sufficient to demonstrate that there are things that brains cannot learn in practice, because those could simply be quantitative limitations. Demonstrating that an Intel 486 can’t compute some known computable function in our lifetimes is not proof that the 486 is not a Turing Machine.
Nor is it sufficient to demonstrate that biases exist: a ULM is only ‘rational’ to the extent that its observational experience and learning machinery allow (and to the extent one has the correct theory of rationality). In fact, the existence of many (most?) biases intrinsically depends on the EMH, based on the implicit assumption that some cognitive algorithms are innate. If brains are mostly ULMs then most cognitive biases dissolve, or become learning biases: if all cognitive algorithms are learned, then evidence for biases is evidence for cognitive algorithms that people haven’t had sufficient time/energy/motivation to learn. (This does not imply that intrinsic limitations/biases do not exist or that the study of cognitive biases is a waste of time; rather the ULH implies that educational history is what matters most.)
The genome can only specify a limited amount of information. The question is then how much of our advanced cognitive machinery for things like facial recognition, motor planning, language, logic, planning, etc. is innate vs learned. From evolution’s perspective there is a huge advantage to preloading the brain with innate algorithms so long as said algorithms have high expected utility across the expected domain landscape.
On the other hand, evolution is also highly constrained in a bit coding sense: every extra bit of code costs additional energy for the vast number of cellular replication events across the lifetime of the organism. Low code complexity solutions also happen to be exponentially easier to find. These considerations seem to strongly favor the ULH but they are difficult to quantify.
Neuroscientists have long known that the brain is divided into physical and functional modules. These modular subdivisions were discovered a century ago by Brodmann. Every time neuroscientists opened up a new brain, they saw the same old cortical modules in the same old places doing the same old things. The specific layout of course varied from species to species, but the variations between individuals are minuscule. This evidence seems to strongly favor the EMH.
Throughout most of the 90s and into the 2000s, computational neuroscience models and AI were heavily influenced by the EMH and, unsurprisingly, largely supported it. Neural nets and backprop had of course been known since the 1980s and worked on small problems[18], but at the time they didn’t scale well, and there was no theory to suggest they ever would.
Theory of the time also suggested local minima would always be a problem (now we understand that local minima are not really the main problem[19], and modern stochastic gradient descent methods combined with highly overcomplete models and stochastic regularization[20] are effectively global optimizers that can often handle obstacles such as local minima and saddle points[21]).
The other related historical criticism rests on the lack of biological plausibility for backprop style gradient descent. (There is as of yet little consensus on how the brain implements the equivalent machinery, but target propagation is one of the more promising recent proposals[22][23].)
Many AI researchers are naturally interested in the brain, and we can see the influence of the EMH in much of the work before the deep learning era. HMAX is a hierarchical vision system developed in the late 90s by Poggio et al. as a working model of biological vision[24]. It is based on a preconfigured hierarchy of modules, each of which has its own mix of innate features such as Gabor edge detectors along with a little bit of local learning. It implements the general idea that complex algorithms/features are innate (the result of evolutionary global optimization), while neural networks (incapable of global optimization) use Hebbian local learning to fill in details of the design.
Dynamic Rewiring
In a groundbreaking study from 2000 published in Nature, Sharma et al. successfully rewired ferret retinal pathways to project into the auditory cortex instead of the visual cortex.[25] The result: auditory cortex can become visual cortex, just by receiving visual data! Not only does the rewired auditory cortex develop the specific Gabor features characteristic of visual cortex; the rewired cortex also becomes functionally visual.[26] True, it isn’t quite as effective as normal visual cortex, but that could also possibly be an artifact of crude and invasive brain rewiring surgery.
The ferret study was popularized by the book On Intelligence by Hawkins in 2004 as evidence for a single cortical learning algorithm. This helped percolate the evidence into the wider AI community, and thus probably helped in setting up the stage for the deep learning movement of today. The modern view of the cortex is that of a mostly uniform set of general purpose modules which slowly become recruited for specific tasks and filled with domain specific ‘code’ as a result of the learning (self optimization) process.
The next key set of evidence comes from studies of atypical human brains with novel extrasensory powers. In 2009 Vuillerme et al showed that the brain could automatically learn to process sensory feedback rendered onto the tongue[27]. This research was developed into a complete device that allows blind people to develop primitive tongue based vision.
In the modern era some blind humans have apparently acquired the ability to perform echolocation (sonar), similar to cetaceans. In 2011 Thaler et al used MRI and PET scans to show that human echolocators use diverse non-auditory brain regions to process echo clicks, predominantly relying on re-purposed ‘visual’ cortex.[27]
The echolocation study in particular helps establish the case that the brain is actually doing global, highly nonlocal optimization, far beyond simple Hebbian dynamics. Echolocation is an active sensing strategy that requires very low latency processing, involving complex timed coordination between a number of motor and sensory circuits, all of which must be learned.
Somehow the brain is dynamically learning how to use and assemble cortical modules to implement mental algorithms: everyday tasks such as visual counting, comparisons of images or sounds, reading, etc. are all tasks which require simple mental programs that can shuffle processed data between modules (some or any of which can also function as short term memory buffers).
To explain this data, we should be on the lookout for a system in the brain that can learn to control the cortex—a general system that dynamically routes data between different brain modules to solve domain specific tasks.
But first let’s take a step back and start with a high level architectural view of the entire brain to put everything in perspective.
Brain Architecture
Below is a circuit diagram for the whole brain. The main subsystems work together and are best understood together. You can probably get a good, high level, extremely coarse understanding of the entire brain in less than one hour.
(There are a couple of circuit diagrams of the whole brain on the web, but this is the best. From this site.)
The human brain has ~100 billion neurons and ~100 trillion synapses, but ultimately it evolved from the bottom up, from organisms with just hundreds of neurons, like the tiny brain of C. elegans.
We know that evolution is code complexity constrained: much of the genome codes for cellular metabolism, all the other organs, and so on. For the brain, most of its bit budget needs to be spent on all the complex neuron, synapse, and even neurotransmitter level machinery—the low level hardware foundation.
For a tiny brain with 1000 neurons or fewer, the genome can directly specify each connection. As you scale up to larger brains, evolution needs to create vastly more circuitry while still using only about the same amount of code/bits. So instead of specifying connectivity at the neuron layer, the genome codes connectivity at the module layer. Each module can be built from simple procedural/fractal expansion of progenitor cells.
So the size of a module has little to nothing to do with its innate complexity. The cortical modules are huge—V1 alone contains 200 million neurons in a human—but there is no reason to suspect that V1 has greater initial code complexity than any other brain module. Big modules are built out of simple procedural tiling patterns.
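A rough back-of-the-envelope calculation shows why direct neuron-level specification cannot scale. The neuron and synapse counts below are the figures quoted above; the genome size (about 3.1 billion base pairs at 2 bits each) and the encoding scheme (naming one target neuron per synapse) are assumptions added for this estimate, and a cleverer encoding would change the constants but not the conclusion.

```python
import math

NEURONS = 100e9    # ~100 billion neurons (figure from the text)
SYNAPSES = 100e12  # ~100 trillion synapses (figure from the text)

# Assumption: roughly 3.1 billion base pairs, each carrying 2 bits.
GENOME_BASE_PAIRS = 3.1e9
genome_bits = GENOME_BASE_PAIRS * 2

# Assumption: specifying a synapse means naming its target neuron,
# which costs log2(NEURONS) bits.
bits_per_synapse = math.log2(NEURONS)
wiring_bits = SYNAPSES * bits_per_synapse

# How many times larger the explicit wiring diagram is than the whole genome.
shortfall = wiring_bits / genome_bits
```

Under these assumptions the explicit wiring diagram needs hundreds of thousands of times more bits than the entire genome contains, before accounting for everything else the genome must encode. Compact procedural rules at the module level are the only option left.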
Very roughly the brain’s main modules can be divided into six subsystems (there are numerous smaller subsystems):
The neocortex: the brain’s primary computational workhorse (blue/purple modules at the top of the diagram). Kind of like a bunch of general purpose FPGA coprocessors.
The cerebellum: another set of coprocessors with a simpler feedforward architecture. Specializes more in motor functionality.
The thalamus: the orangish modules below the cortex. Kind of like a relay/routing bus.
The hippocampal complex: the apex of the cortex, and something like the brain’s database.
The amygdala and limbic reward system: these modules specialize in something like the value function.
The Basal Ganglia (green modules): the central control system, similar to a CPU.
In the interest of space/time I will focus primarily on the Basal Ganglia and will just touch on the other subsystems very briefly and provide some links to further reading.
The neocortex has been studied extensively and is the main focus of several popular books on the brain. Each neocortical module is a 2D array of neurons (technically 2.5D with a depth of about a few dozen neurons arranged in about 5 to 6 layers).
Each cortical module is something like a general purpose RNN (recurrent neural network) with 2D local connectivity. Each neuron connects to its neighbors in the 2D array. Each module also has nonlocal connections to other brain subsystems and these connections follow the same local 2D connectivity pattern, in some cases with some simple affine transformations. Convolutional neural networks use the same general architecture (but they are typically not recurrent).
Cortical modules, like artificial RNNs, are general purpose and can be trained to perform various tasks. There are a huge number of models of the cortex, varying across the tradeoff between biological realism and practical functionality.
Perhaps surprisingly, any of a wide variety of learning algorithms can reproduce cortical connectivity and features when trained on appropriate sensory data[27]. This is a computational proof of the one-learning-algorithm hypothesis; furthermore it illustrates the general idea that data determines functional structure in any general learning system.
There is evidence that cortical modules learn automatically (unsupervised) to some degree, and there is also some evidence that cortical modules can be trained to relearn data from other brain subsystems—namely the hippocampal complex. The dark knowledge distillation technique in ANNs[28][29] is a potential natural analog/model of hippocampus → cortex knowledge transfer.
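For readers unfamiliar with distillation, the sketch below shows the core of the technique as used in artificial networks: a "teacher" network's outputs are softened with a temperature, and a "student" is trained to match those soft targets. Mapping the teacher to the hippocampus and the student to the cortex is only the analogy suggested above, not an established model; the logits and temperatures are made-up values.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between the softened teacher and student distributions."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))


teacher_logits = [6.0, 2.0, -1.0]  # hypothetical teacher outputs for one example

hard = softmax(teacher_logits, temperature=1.0)  # nearly one-hot
soft = softmax(teacher_logits, temperature=4.0)  # exposes the relative similarities

matched = distillation_loss(teacher_logits, [6.0, 2.0, -1.0])
mismatched = distillation_loss(teacher_logits, [-1.0, 2.0, 6.0])
```

The softened targets carry the "dark knowledge": how the teacher ranks the wrong answers relative to each other. A student that agrees with the teacher incurs a lower loss than one that disagrees, so minimizing this loss transfers the teacher's learned structure.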
Module connections are bidirectional, and feedback connections (from high level modules to low level) outnumber forward connections. We can speculate that something like target propagation is also used to guide or constrain the development of cortical maps.
The hippocampal complex is the root or top level of the sensory/motor hierarchy. This short YouTube video gives a good seven minute overview of the HC. It is like a spatiotemporal database. It receives compressed scene descriptor streams from the sensory cortices, it stores this information in medium-term memory, and it supports later auto-associative recall of these memories. Imagination and memory recall seem to be basically the same.
The ‘scene descriptors’ take the sensible form of things like 3D position and camera orientation, as encoded in place, grid, and head direction cells. This is basically the logical result of compressing the sensory stream, comparable to the networking data stream in a multiplayer video game.
Imagination/recall is basically just the reverse of the forward sensory coding path—in reverse mode a compact scene descriptor is expanded into a full imagined scene. Imagined/remembered scenes activate the same cortical subnetworks that originally formed the memory (or would have if the memory was real, in the case of imagined recall).
The amygdala and associated limbic reward modules are rather complex, but look something like the brain’s version of the value function for reinforcement learning. These modules are interesting because they clearly rely on learning, but clearly the brain must specify an initial version of the value/utility function that has some minimal complexity.
As an example, consider taste. Infants are born with basic taste detectors and a very simple initial value function for taste. Over time the brain receives feedback from digestion and various estimators of general mood/health, and it uses this to refine the initial taste value function. Eventually the adult sense of taste becomes considerably more complex. Acquired tastes for bitter substances, such as coffee and beer, are good examples.
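The refinement process described here can be sketched with the simplest possible value update, a delta rule that nudges a stored value toward observed feedback. The innate starting values, the feedback signal of 0.8, the learning rate, and the number of exposures are all invented for illustration; nothing here is a claim about actual neural parameters.

```python
# Assumed innate starting point: sweet is good, bitter is bad.
INNATE_TASTE_VALUE = {"sweet": 1.0, "bitter": -1.0}


def refine(values, taste, feedback, learning_rate=0.1):
    """Delta-rule update: move the stored value a step toward the feedback."""
    updated = dict(values)
    updated[taste] += learning_rate * (feedback - updated[taste])
    return updated


values = dict(INNATE_TASTE_VALUE)
history = [values["bitter"]]

# 50 hypothetical cups of coffee, each followed by positive
# post-ingestive feedback (an assumed signal of +0.8).
for _ in range(50):
    values = refine(values, "bitter", feedback=0.8)
    history.append(values["bitter"])
```

The learned value for the bitter taste climbs steadily from its innate negative starting point toward the feedback level, while the untouched sweet value stays where evolution left it. This is the acquired-taste pattern in miniature.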
The amygdala appears to do something similar for emotional learning. For example, infants are born with a simple version of a fear response, which is later refined through reinforcement learning. The amygdala sits on the end of the hippocampus, and it is also involved heavily in memory processing.
See also these two videos from Khan Academy: one on the limbic system and amygdala (10 mins), and another on the midbrain reward system (8 mins).
The Basal Ganglia
The Basal Ganglia is a weird-looking complex of structures located in the center of the brain. It is a conserved structure found in all vertebrates, which suggests a core functionality. The BG is proximal to and connects heavily with the midbrain reward/limbic systems. It also connects to the brain’s various modules in the cortex/hippocampus, thalamus and the cerebellum . . . basically everything.
All of these connections form recurrent loops between associated compartmental modules in each structure: thalamocortical/hippocampal-cerebellar-basal_ganglial loops.
Just as the cortex and hippocampus are subdivided into modules, there are corresponding modular compartments in the thalamus, basal ganglia, and the cerebellum. The set of modules/compartments in each main structure are all highly interconnected with their correspondents across structures, leading to the concept of distributed processing modules.
Each DPM forms a recurrent loop across brain structures (the local networks in the cortex, BG, and thalamus are also locally recurrent, whereas those in the cerebellum are not). These recurrent loops are mostly separate, but each sub-structure also provides different opportunities for inter-loop connections.
The BG appears to be involved in essentially all higher cognitive functions. Its core functionality is action selection via subnetwork switching. In essence action selection is the core problem of intelligence, and it is also general enough to function as the building block of all higher functionality. A system that can select between motor actions can also select between tasks or subgoals. More generally, low level action selection can easily form the basis of a Turing Machine via selective routing: deciding where to route the output of thalamocortical-cerebellar modules (some of which may specialize in short term memory as in the prefrontal cortex, although all cortical modules have some short term memory capability).
There are now a number of computational models for the Basal Ganglia-Cortical system that demonstrate possible biologically plausible implementations of the general theory[28][29]; integration with the hippocampal complex leads to larger-scale systems which aim to model/explain most of higher cognition in terms of sequential mental programs[30] (of course fully testing any such models awaits sufficient computational power to run very large-scale neural nets).
For an extremely oversimplified model of the BG as a dynamic router, consider an array of N distributed modules controlled by the BG system. The BG control network expands these N inputs into an NxN matrix. There are N^2 potential intermodular connections, each of which can be individually controlled. The control layer reads a compressed, downsampled version of the module’s hidden units as its main input, and is also recurrent. Each output node in the BG has a multiplicative gating effect which selectively enables/disables an individual intermodular connection. If the control layer is naively fully connected, this would require (N^2)^2 = N^4 connections, which is only feasible for N ~ 100 modules, but sparse connectivity can substantially reduce those numbers.
It is unclear (to me), whether the BG actually implements NxN style routing as described above, or something more like 1xN or Nx1 routing, but there is general agreement that it implements cortical routing.
Of course in actuality the BG architecture is considerably more complex, as it also must implement reinforcement learning, and the intermodular connectivity map itself is also probably quite sparse/compressed (the BG may not control all of cortex, certainly not at a uniform resolution, and many controlled modules may have a very limited number of allowed routing decisions). Nonetheless, the simple multiplicative gating model illustrates the core idea.
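The oversimplified NxN router can be written down in a few lines. This is a sketch of the toy model only, not of the real basal ganglia: the control layer here is a single feedforward layer with hand-set weights, and the recurrence and reinforcement learning mentioned above are omitted entirely. The function names, the mean-activation summary, and the example values are assumptions chosen for the illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def summarize(module_states):
    """Compressed, downsampled view of each module: here just the mean activation."""
    return [sum(state) / len(state) for state in module_states]


def compute_gates(summary, control_weights, control_bias):
    """Control layer: one gate in (0, 1) for each of the N*N intermodular connections."""
    n = len(summary)
    gates = [[0.0] * n for _ in range(n)]
    for src in range(n):
        for dst in range(n):
            drive = sum(w * s for w, s in zip(control_weights[src][dst], summary))
            gates[src][dst] = sigmoid(drive + control_bias[src][dst])
    return gates


def route(module_states, gates):
    """Each destination receives the gate-weighted sum of every source's output."""
    n = len(module_states)
    width = len(module_states[0])
    return [
        [sum(gates[src][dst] * module_states[src][k] for src in range(n))
         for k in range(width)]
        for dst in range(n)
    ]


N = 3
states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Hand-set control parameters: open only the connection from module 0 to module 2.
control_weights = [[[0.0] * N for _ in range(N)] for _ in range(N)]
control_bias = [[-20.0] * N for _ in range(N)]
control_bias[0][2] = 20.0

gates = compute_gates(summarize(states), control_weights, control_bias)
routed = route(states, gates)
```

With these settings module 2 receives essentially a copy of module 0's output and every other connection is switched off. In a learned system the control weights would depend on the module summaries, so the routing pattern would change from moment to moment with the contents of the modules themselves.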
This same multiplicative gating mechanism is the core principle behind the highly successful LSTM (Long Short-Term Memory)[30] units that are used in various deep learning systems. The simple version of the BG’s gating mechanism can be considered a wider parallel and hierarchical extension of the basic LSTM architecture, where you have a parallel array of N memory cells instead of 1, and each memory cell is a large vector instead of a single scalar value.
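For comparison, here is one step of a standard scalar LSTM cell, showing where the multiplicative gates act on the memory. The weights are hand-set rather than learned, and the parameter layout (a dictionary mapping each gate name to an input weight, a recurrent weight, and a bias) is a convention invented for this sketch.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell; params maps gate name -> (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, bias = params[name]
        return w_x * x + w_h * h_prev + bias

    input_gate = sigmoid(pre("input"))
    forget_gate = sigmoid(pre("forget"))
    output_gate = sigmoid(pre("output"))
    candidate = math.tanh(pre("candidate"))

    # Multiplicative gating: keep part of the old memory, admit part of the new.
    c = forget_gate * c_prev + input_gate * candidate
    h = output_gate * math.tanh(c)
    return h, c


# Hand-set parameters that hold the memory: input gate shut, forget gate open.
hold_params = {
    "input": (0.0, 0.0, -20.0),
    "forget": (0.0, 0.0, 20.0),
    "output": (0.0, 0.0, 20.0),
    "candidate": (1.0, 1.0, 0.0),
}

h, c = 0.0, 0.7  # start with a stored value of 0.7
for x in [5.0, -3.0, 2.0, 9.0, -8.0]:  # arbitrary distracting inputs
    h, c = lstm_step(x, h, c, hold_params)
```

With the input gate closed and the forget gate open, the stored value survives the whole input sequence essentially unchanged. The BG router described above applies the same trick between many vector-valued modules at once, rather than to a single scalar cell.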
The main advantage of the BG architecture is parallel hierarchical approximate control: it allows a large number of hierarchical control loops to update and influence each other in parallel. It also reduces the huge complexity of general routing across the full cortex down into a much smaller-scale, more manageable routing challenge.
Implications for AGI
These two conceptions of the brain—the universal learning machine hypothesis and the evolved modularity hypothesis—lead to very different predictions for the likely route to AGI, the expected differences between AGI and humans, and thus any consequent safety issues and strategies.
In the extreme case imagine that the brain is a pure ULM, such that the genetic prior information is close to zero or is simply unimportant. In this case it is vastly more likely that successful AGI will be built around designs very similar to the brain, as the ULM architecture in general is the natural ideal, vs the alternative of having to hand engineer all of the AI’s various cognitive mechanisms.
In reality learning is computationally hard, and any practical general learning system depends on good priors to constrain the learning process (essentially taking advantage of previous knowledge/learning). The recent and rapid success of deep learning is strong evidence for how much prior information is ideal: just a little. The prior in deep learning systems takes the form of a compact, small set of hyperparameters that control the learning process and specify the overall network architecture (an extremely compressed prior over the network topology and thus the program space).
The ULH suggests that most everything that defines the human mind is cognitive software rather than hardware: the adult mind (in terms of algorithmic information) is 99.999% a cultural/memetic construct. Obviously there are some important exceptions: infants are born with some functional but very primitive sensory and motor processing ‘code’. Most of the genome’s complexity is used to specify the learning machinery, and the associated reward circuitry. Infant emotions appear to simplify down to a single axis of happy/sad; differentiation into the more subtle vector space of adult emotions does not occur until later in development.
If the mind is software, and if the brain’s learning architecture is already universal, then AGI could—by default—end up with a similar distribution over mindspace, simply because it will be built out of similar general purpose learning algorithms running over the same general dataset. We already see evidence for this trend in the high functional similarity between the features learned by some machine learning systems and those found in the cortex.
Of course an AGI will have little need for some specific evolutionary features: emotions that are subconsciously broadcast via the facial muscles are a quirk unnecessary for an AGI, but that is a rather specific detail.
The key takeaway is that the data is what matters, and in the end it is all that matters. Train a universal learner on image data and it just becomes a visual system. Train it on speech data and it becomes a speech recognizer. Train it on ATARI and it becomes a little gamer agent.
Train a universal learner on the real world in something like a human body and you get something like the human mind. Put a ULM in a dolphin’s body and echolocation is the natural primary sense; put a ULM in a human body with broken visual wiring and you can also get echolocation.
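A toy version of "the data is what matters" can be sketched as follows, assuming a classic perceptron as a stand-in for the learner (it is of course nowhere near universal). The functions `train_perceptron` and `predict` and the two tiny datasets are hypothetical. The only point is that identical learning code ends up computing different functions depending solely on what it is trained on.

```python
def train_perceptron(examples, epochs=20):
    """Generic learner: the code is identical whatever the data represent."""
    n_inputs = len(examples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for x, target in examples:
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = target - (1 if activation > 0 else 0)
            # Perceptron rule: nudge the weights only when the guess was wrong.
            weights = [w + error * xi for w, xi in zip(weights, x)]
            bias += error
    return weights, bias

def predict(model, x):
    weights, bias = model
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_data = [(x, int(x[0] and x[1])) for x in inputs]  # "world" number one
or_data = [(x, int(x[0] or x[1])) for x in inputs]    # "world" number two

and_model = train_perceptron(and_data)
or_model = train_perceptron(or_data)

print([predict(and_model, x) for x in inputs])  # [0, 0, 0, 1]
print([predict(or_model, x) for x in inputs])   # [0, 1, 1, 1]
```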
Control over training is the most natural and straightforward way to control the outcome.
To create a superhuman AI driver, you ‘just’ need to create a realistic VR driving sim and then train a ULM in that world (better training and the simple power of selective copying leads to superhuman driving capability).
So to create benevolent AGI, we should think about how to create virtual worlds with the right structure, how to educate minds in those worlds, and how to safely evaluate the results.
One key idea, which I proposed five years ago, is that the AI should not know it is in a sim.
New AI designs (world design + architectural priors + training/education system) should be tested first in the safest virtual worlds: which in simplification are simply low tech worlds without computer technology. Design combinations that work well in safe low-tech sandboxes are promoted to less safe high-tech VR worlds, and then finally the real world.
A key principle of a secure code sandbox is that the code you are testing should not be aware that it is in a sandbox. If you violate this principle then you have already failed. Yudkowsky’s AI box thought experiment assumes the violation of the sandbox security principle a priori and thus is something of a distraction. (The virtual sandbox idea was most likely discussed elsewhere previously, as Yudkowsky indirectly critiques a strawman version of the idea via this sci-fi story.)
The virtual sandbox approach also combines nicely with invisible thought monitors, where the AI’s thoughts are automatically dumped to searchable logs.
Of course we will still need a solution to the value learning problem. The natural route with brain-inspired AI is to learn the key ideas behind value acquisition in humans to help derive an improved version of something like inverse reinforcement learning and/or imitation learning[31], an interesting topic for another day.
Conclusion
Ray Kurzweil has been predicting for decades that AGI will be built by reverse engineering the brain, and this particular prediction is not especially unique—this has been a popular position for quite a while. My own investigation of neuroscience and machine learning led me to a similar conclusion some time ago.
The recent progress in deep learning, combined with the emerging modern understanding of the brain, provides further evidence that AGI could arrive around the time when we can build and train ANNs with similar computational power as measured very roughly in terms of neuron/synapse counts. In general the evidence from the last four years or so supports Hanson’s viewpoint from the Foom debate. More specifically, his general conclusion:
Future superintelligences will exist, but their vast and broad mental capacities will come mainly from vast mental content and computational resources. By comparison, their general architectural innovations will be minor additions.
The ULH supports this conclusion.
Current ANN engines can already train and run models with around 10 million neurons and 10 billion (compressed/shared) synapses on a single GPU, which suggests that the goal could soon be within the reach of a large organization. Furthermore, Moore’s Law for GPUs still has some steam left, and software advances are currently improving simulation performance at a faster rate than hardware. These trends imply that Anthropomorphic/Neuromorphic AGI could be surprisingly close, and may appear suddenly.
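The "(compressed/shared)" qualifier is what makes synapse counts of that order fit on one GPU. As a rough back-of-the-envelope sketch, assume a single hypothetical convolutional layer with stride 1 and padding that keeps the spatial size fixed. The layer dimensions and the helper `conv_layer_counts` below are illustrative assumptions, not measurements of any real engine.

```python
def conv_layer_counts(height, width, channels_in, channels_out, kernel):
    """Return (effective connections, stored weights) for one conv layer.

    Assumes stride 1 and padding that preserves the spatial size.
    Biases are ignored for simplicity.
    """
    output_units = height * width * channels_out
    fan_in = kernel * kernel * channels_in
    connections = output_units * fan_in            # "synapses" if nothing were shared
    stored_weights = fan_in * channels_out         # what actually sits in memory
    return connections, stored_weights

# Hypothetical layer: 224x224 feature map, 64 -> 64 channels, 3x3 kernels.
connections, stored_weights = conv_layer_counts(224, 224, 64, 64, 3)

print("effective connections:", connections)
print("stored weights:       ", stored_weights)
print("sharing factor:       ", connections // stored_weights)
```

In this sketch every spatial position reuses the same kernel, so the sharing factor equals the number of positions (224 × 224). A system without time-multiplexed weight sharing would have to store each of those connections separately.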
What kind of leverage can we exert on a short timescale?
All of this is interesting, but it seems to me that you did not make a strong case for the brain using a universal learning machine as its main system.
Specifically, I think you fail to address the evidence for evolved modularity:
The brain uses spatially specialized regions for different cognitive tasks.
This specialization pattern is mostly consistent across different humans and even across different species.
Damage to or malformation of some brain regions can cause specific forms of disability (e.g. face blindness). Sometimes the disability can be overcome but often not completely.
In various mammals, infants are capable of complex behavior straight out of the womb. Human infants exhibit only very simple behaviors and require many years to reach full cognitive maturity; the human brain therefore relies more on learning than the brains of other mammals. But the basic architecture is the same, so this is a difference of degree, not kind.
It seems more likely that if there is a general-purpose “universal” learning system in the human brain then it is used as an inefficient fall-back mechanism when the specialized modules fail, not as the core mechanism that handles most of the cognitive tasks.
I’m also wary about using the recent successes of deep learning to draw inferences about how the brain works.
Beware of the “ELIZA effect”: due to our over-active agency detection ability, we tend to anthropomorphize the behavior of even very simple AI systems.
There seems to be a trend in AI where for any technique that is currently hot there are people who say: “This is how the brain works. We don’t know all the details, but studies X, Y and Z clearly point in this direction.” After a few years and maybe an AI (mini)winter the brain seems to work in another way...
Specifically on deep learning:
For all the speculation, there is still no clear evidence that the brain uses anything similar to backpropagation.
Some of the most successful deep learning approaches, such as modern convnets for computer vision, rely on quite un-biological features such as weight sharing and rectified linear units.
“Deep learning” is a quite vague term anyway, it does not refer to any single algorithm or architecture. In fact, there are so many architectural variants and hyper-parameters that need to be adapted to each specific task that optimizing them can be considered a non-trivial learning problem on its own.
Perhaps most importantly, deep learning methods generally work in supervised learning settings and they have quite weak priors: they require a dataset as big as ImageNet to yield good image recognition performance (with still some characteristic error patterns), or a parallel corpus of a million sentence pairs to yield sub-human level machine translation quality, or days of continuous simulated gameplay on the ATARI 2600 emulator to obtain good scores (super-human for some games, sub-human for others). Clearly humans are able to effectively learn from a much smaller amount of evidence, indicating stronger priors and the ability to exploit minimal supervision.
Therefore I would say that deep learning methods, while certainly interesting from an engineering perspective, are probably not very much relevant to the understanding of the brain, at least given the current state of the evidence.
Thanks, I was waiting for at least one somewhat critical reply :)
The ferret rewiring experiments, the tongue based vision stuff, the visual regions learning to perform echolocation computations in the blind, this evidence together is decisive against the evolved modularity hypothesis as I’ve defined that hypothesis, at least for the cortex. The EMH posits that the specific cortical regions rely on complex innate circuitry specialized for specific tasks. The evidence disproves that hypothesis.
Sure. Once you have software loaded/learned into hardware, damage to the hardware is damage to the software. This doesn’t differentiate the two hypotheses.
Yes, and I described what is known about that basic architecture. The extent to which a particular brain relies on learning vs innate behaviour depends on various tradeoffs such as organism lifetime and brain size. Small-brained and short-lived animals have much less to gain from learning (less time to acquire data, less hardware power), so they rely more on innate circuitry, much of which is encoded in the oldbrain and the brainstem. This is all very much evidence for the ULH. The generic learning structures, the cortex and cerebellum, generally grow in size with larger organisms and longer lifespans.
This has also been tested via decortication experiments and confirms the general ULH—rabbits rely much less on their cortex for motor behavior, larger primates rely on it almost exclusively, cats and dogs are somewhere in between, etc.
This evidence shows that the cortex is general purpose, and acquires complex circuitry through learning. Recent machine learning systems provide further evidence in the form of—this is how it could work.
As I mentioned in the article, backprop is not really biologically plausible. Targetprop is, and there are good reasons to suspect the brain is using something like targetprop—as that theory is the latest result in a long line of work attempting to understand how the brain could be doing long range learning. Investigating and testing the targetprop theory and really confirming it could take a while—even decades. On the other hand, if targetprop or some variant is proven to work in a brain-like AGI, that is something of a working theory that could then help accelerate neuroscience confirmation.
I did not say deep learning is “how the brain works”. I said instead that the brain is, roughly, a specific biological implementation of a ULM, which itself is a very general model that will also include any practical AGIs.
I said that DL helps indirectly confirm the ULH of the brain, specifically by showing how the complex task specific circuitry of the cortex could arise through a simple universal learning algorithm.
Computational modeling is key—if you can’t build something, you don’t understand it. To the extent that any AI model can functionally replicate specific brain circuits, it is useful to neuroscience. Period. Far more useful than psychological theorizing not grounded in circuit reality. So computational neuroscience and deep learning (which really is just the neuroscience inspired branch of machine learning) naturally have deep connections.
Biological plausibility was one of the heavily discussed aspects of RELUs.
From the abstract:
“While logistic sigmoid neurons are more biologically plausible than hyperbolic tangent neurons, the latter work better for training multi-layer neural networks. This paper shows that rectifying neurons are an even better model of biological neurons and yield equal or better performance than hyperbolic tangent networks in spite of . . ”
Weight sharing is unbiological: true. It is also an important advantage that von Neumann (time-multiplexed) systems have over biological (non-multiplexed) ones. The neuromorphic hardware approaches largely cannot handle weight sharing. Of course convnets still work without weight sharing; it just may require more data and/or better training and regularization. It is interesting to speculate how the brain deals with that, as is comparing the details of convnet learning capability vs bio-vision. I don’t have time to get into that at the moment, but I did link to at least one article comparing convnets to bio-vision in the OP.
Sure—so just taboo it then. When I use the term “deep learning”, it means something like “the branch of machine learning which is more related to neuroscience” (while still focused on end results rather than emulation).
Comparing two learning systems trained on completely different datasets with very different objective functions is complicated.
In general though, CNNs are a good model of fast feedforward vision: the first 150ms of the ventral stream. In that domain they are comparable to biovision, with the important caveat that biovision computes a larger and richer output parameter map than almost any CNN. Most CNNs (there are many different types) are more narrowly focused, but also probably learn faster because of advantages like weight sharing. The amount of data required to train a CNN up to superhuman performance on narrow tasks is comparable to or less than that required to train a human visual system up to high performance. (But again, the cortex is doing something more like transfer learning, which is harder.)
Past 150 ms or so, humans start making multiple saccades and also start to integrate information from a larger number of brain regions, including frontal and temporal cortical regions. At that point the two systems aren’t even comparable: humans are using more complex ‘mental programs’ over multiple saccades to make visual judgements.
Of course, eventually we will have AGI systems that also integrate those capabilities.
That’s actually extremely impressive—superhuman learning speed.
In that case, I would say you may want to read up more on the field. If you haven’t yet, check out the original sparse coding paper (over 3000 citations), to get an idea of how crucial new computational models have been for advancing our understanding of cortex.
But none of these works as well as using the original task-specific regions, and anyway in all these experiments the original task-specific regions are still present and functional, therefore maybe the brain can partially use these regions by learning how to route the signals to them.
But then why doesn’t universal learning just co-opt some other brain region to perform the task of the damaged one? In the cases where there is a congenital malformation, that makes the usual task-specific region missing or dysfunctional, why isn’t the task allocated to some other region?
And anyway why is the specialization pattern consistent across individuals and even species? If you train an artificial neural network multiple times on the same dataset from different random initializations each time the hidden nodes will specialize in a different way: at least ANNs have permutation symmetry between nodes in the same layer, and as long as nodes operate in the linear region of the activation function, there is also redundancy between layers. This means that many sets of weights specify the same or similar function, and the training process chooses one of them randomly depending on the initialization (and minibatch sampling, dropout, etc.).
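The permutation symmetry mentioned here is easy to check directly. Below is a minimal sketch, assuming a tiny one-hidden-layer tanh network with arbitrary made-up weights (all names and values are hypothetical). Reordering the hidden units, together with their incoming and outgoing weights, yields a different set of weights that computes the same function.

```python
import math

def forward(x, w1, b1, w2, b2):
    """One hidden tanh layer followed by a linear output unit."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

# Arbitrary illustrative weights: 2 inputs, 3 hidden units, 1 output.
w1 = [[0.5, -1.0], [1.5, 0.25], [-0.75, 2.0]]
b1 = [0.1, -0.2, 0.3]
w2 = [1.0, -2.0, 0.5]
b2 = 0.05

# Relabel the hidden units: new unit 0 is old unit 2, and so on.
perm = [2, 0, 1]
w1_p = [w1[i] for i in perm]
b1_p = [b1[i] for i in perm]
w2_p = [w2[i] for i in perm]

x = [0.3, -0.7]
original = forward(x, w1, b1, w2, b2)
permuted = forward(x, w1_p, b1_p, w2_p, b2)

# Equal up to floating-point summation order.
print(math.isclose(original, permuted, rel_tol=1e-12, abs_tol=1e-12))
```

With n hidden units there are n! such equivalent weight settings, which is why independently initialized runs have no reason to agree on which unit ends up with which role.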
If, as you claim, the basal ganglia and the cortex in the brain make up a sort of cpu-memory system, then there should be substantial permutation symmetry. After all, in a computer you can swap blocks or pages of memory around and as long as pointers (or page tables) are updated the behavior does not change, up to some performance issues due to cache misses. If the brain worked that way we should expect cortical regions to be allocated to different tasks in a more or less random pattern varying between individuals.
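The computer half of this analogy can be sketched in a few lines, assuming a toy page table that maps logical pages to physical frames. The names `physical`, `page_table`, `read` and `swap_frames`, and the labels stored in memory, are hypothetical and chosen only to mirror the argument.

```python
# Toy memory: physical frames hold contents, the page table maps
# logical page numbers to physical frame numbers.
physical = ["vision", "audition", "motor", "language"]
page_table = {0: 0, 1: 1, 2: 2, 3: 3}

def read(logical_page):
    """What a program sees when it reads a logical page."""
    return physical[page_table[logical_page]]

def swap_frames(a, b):
    """Physically swap two frames and update the page table to match."""
    physical[a], physical[b] = physical[b], physical[a]
    for logical_page, frame in page_table.items():
        if frame == a:
            page_table[logical_page] = b
        elif frame == b:
            page_table[logical_page] = a

before = [read(page) for page in range(4)]
swap_frames(0, 3)
after = [read(page) for page in range(4)]

print(physical)         # the physical layout has changed
print(before == after)  # the logical view has not
```

The sketch deliberately ignores access cost, which is the only thing the swap can affect in this model.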
Instead we observe substantial consistency, even in the left-right specialization patterns which is remarkable since at macroscopic level the brain has substantial lateral symmetry.
Decortication experiments only show that certain species rely on the cortex more than others; they don’t show that the cortex is general purpose and acquires complex circuitry through learning.
Horses, for instance, are large animals with a long lifespan and a large brain (encephalization coefficient similar to that of cats and dogs), and yet a newborn horse is able to walk, run and follow its mother within a few hours from birth.
Targetprop is still highly speculative. It has not been shown to work well in artificial neural networks and the evidence of biological plausibility is handwavy.
Ok.
In principle yes, but trivially so as they are universal approximators. In practice, weight sharing enables these systems to easily learn translational invariance.
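To illustrate what weight sharing buys, here is a minimal sketch, assuming a one-dimensional circular convolution with a single shared kernel (the signal, the kernel and the helper names are made up for illustration). Sharing one kernel across all positions makes the layer's output shift along with its input, and a global max over positions then yields a response that ignores the shift entirely.

```python
def circular_conv(signal, kernel):
    """Apply the same kernel at every position (weights shared), wrapping around."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]

def shift(seq, k):
    """Rotate a list k places to the right."""
    k %= len(seq)
    return list(seq[-k:]) + list(seq[:-k]) if k else list(seq)

signal = [0, 0, 1, 3, 1, 0, 0, 0]
kernel = [1, -2, 1]

out = circular_conv(signal, kernel)
out_of_shifted_input = circular_conv(shift(signal, 2), kernel)

# Equivariance: shifting the input just shifts the feature map.
print(out_of_shifted_input == shift(out, 2))
# Invariance after global max pooling: the pooled response is unchanged.
print(max(out) == max(out_of_shifted_input))
```

A layer with a separate kernel per position has no such built-in guarantee and would have to learn the same detector at every location from data.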
Humans get tired after continuously playing for a few hours, but in terms of overall playtime they learn faster.
No—these studies involve direct measurements (electrodes for the ferret rewiring, MRI for echolocation). They know the rewired auditory cortex is doing vision, etc.
It can, and this does happen all the time. Humans can recover from serious brain damage (stroke, injury, etc). It takes time to retrain and reroute circuitry—similar to relearning everything that was lost all over again.
Current ANNs assume a fixed module layout, so they aren’t really comparable in module-task assignment.
Much of the specialization pattern could just be geography—V1 becomes visual because it is closest to the visual input. A1 becomes auditory because it is closest to the auditory input. etc.
This should be the default hypothesis, but there also could be some element of prior loading, perhaps from pattern generators in the brainstem. (I have read a theory that there is a pattern generator for faces that pretrains the visual cortex a little bit in the womb, so that it starts with a vague primitive face detector).
I said the BG is kind-of-like the CPU, the cortex is kind-of-like a big FPGA, but that is an analogy. There are huge differences between slow bio-circuitry and fast von Neumann machines.
Firstly the brain doesn’t really have a concept of ‘swapping memory’. The closest thing to that is retraining, where the hippocampus can train info into the cortex. It’s a slow complex process that is nothing like swapping memory.
Finally the brain is much more optimized at the wiring/latency level. Functionality goes in certain places because that is where it is best for that functionality; it isn’t permutation symmetric in the slightest. Every location has latency/wiring tradeoffs. In a von Neumann memory we just abstract that all away. Not in the brain. There is an actual optimal location for every concept/function etc.
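A crude way to see why physical placement breaks permutation symmetry: assume modules sit along a line and that wiring cost is communication volume times distance. The module names and traffic numbers below are invented purely for illustration and are not anatomical data.

```python
from itertools import permutations

# Hypothetical modules and communication volumes between them.
modules = ["eye_input", "V1", "A1", "ear_input"]
traffic = {
    ("eye_input", "V1"): 10,
    ("ear_input", "A1"): 10,
    ("V1", "A1"): 1,
}

def wiring_cost(order):
    """Total of (traffic x distance) when modules are laid out in this order."""
    position = {name: index for index, name in enumerate(order)}
    return sum(
        volume * abs(position[a] - position[b])
        for (a, b), volume in traffic.items()
    )

# Try every possible layout of the four modules.
costs = {order: wiring_cost(order) for order in permutations(modules)}
best_order = min(costs, key=costs.get)

print("cheapest layout:", best_order, costs[best_order])
print("most expensive: ", max(costs.values()))
```

In this toy model the layouts are far from equivalent: the cheapest ones place each processing module next to its input, which is the sense in which location is not arbitrary once wires have a cost.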
That is fast for mammals—I know first hand that it can take days for deer. Nonetheless, as we discussed, the brainstem provides a library of innate complex motor circuitry in particular, which various mammals can rely on to varying degrees, depending on how important complex early motor behavior is.
I agree that there is still more work to be done understanding the brain’s learning machinery. Targetprop is useful/exciting in ML, but it isn’t the full picture yet.
Not at all. The Atari agent becomes semi-superhuman by day 3 of its life. When humans start playing Atari, they already have trained vision and motor systems, and Atari is designed for these systems. Even then your statement is wrong, in that I don’t think any children achieve playtester levels of skill in even a few days.
Well, the eyes are at the front of the head, but the optic nerves connect to the brain at the back, and they also cross at the optic chiasm. Axons also cross contralaterally in the spinal cord and if I recall correctly there are various nerves that also don’t take the shortest path.
This seems to me as evidence that the nervous system is not strongly optimized for latency.
This is a total misconception, and it is a good example of the naive engineer fallacy (jumping to the conclusion that a system is poorly designed when you don’t understand how the system actually works and why).
Remember the distributed software modules, including V1, have components in multiple physical modules (cortex, cerebellum, thalamus, BG). Not every DSM has components in all subsystems, but V1 definitely has a thalamic relay component (LGN).
The thalamus/BG is in the center of the brain, which makes sense from wiring minimization when you understand the DSM system. Low freq/compressed versions of the cortical map computations can interact at higher speeds inside the small compact volume of the BG/thalamus. The BG/thalamus basically contains a microcosm model of the cortex within itself.
The thalamic relay comes first in sequential processing order, so moving cortical V1 closer to the eyes wouldn’t help in the slightest. (Draw this out if it doesn’t make sense)
For e.g. the ferret rewiring experiments, tongue based vision, etc., is it a plausible alternative hypothesis that there are more general subtypes of regions that aren’t fully specialized but are more interoperable than others?
For example, (Playing devil’s advocate here) I could phrase all of the mentioned experiments as “sensory input remapping” among “sensory input processing modules.” Similarly, much of the work in BCI interfaces for e.g. controlling cursors or prosthetics could be called “motor control remapping”. Have we ever observed cortex being rewired for drastically dissimilar purposes? For example, motor cortex receiving sensory input?
If we can’t do stuff like that, then my assumption would be that at the very least, a lot of the initial configuration is prenatal and follows kind of a “script” that might be determined by either some genome-encoded fractal rule of tissue formation, or similarities in the general conditions present during gestation. Either way, I’m not yet convinced there’s a strong argument that all brain function can be explained as working like a ULM (Even if a lot of it can)
I’m not sure—I have a vague memory of something along those lines but .. nothing specific.
From what I remember, motor, sensor, and association cortex do have some intrinsic differences at the microcircuit level. For example some motor cortex has larger pyramidal cells in the output layer. However, I believe most motor cortex is best described as sensorimotor—it depends heavily on sensor data from the body.
Well yes, there is a general script for the overall architecture, and a lot of innate functionality as well, especially in specific regions like the brainstem’s pattern generators. As I said in the article, there is always room for innate functionality in the architectural prior and in specific circuits; the brain is certainly not a pure ULM.
ULM refers to the overall architecture, with the general learning part specifically implemented by the distributed BG/cortex/cerebellum modules. But the BG and hippocampal system also rely heavily on learning internally, as does the amygdala and .. probably almost all of it to varying degrees. The brainstem is specifically the place where we can point and say: this is mostly innate circuitry, but even it probably has some learning going on.
It’s far more likely that different brain modules implement different learning rules, but all learn, than that they encode innate mental functionality which is not subject to learning at all.
I’m inclined to agree. Actually I’ve been convinced for a while that this is a matter of degrees rather than being fully one way or the other (Modules versus learning rules), and am convinced by this article that the brain is more of a ULM than I had previously thought.
Still, when I read that part the alternative hypothesis sprang to mind, so I was curious what the literature had to say about it (or the post author).
It seems a little strange to treat this as a triumphant victory for the ULH. At the most, you’ve shown that the “fundamentalist” evolved modularity hypothesis is false. You didn’t really address how the ULH explains this same evidence.
And there are other mysteries in this model, such as the apparent universality of specific cognitive heuristics and biases, or of various behaviours like altruism, deception, sexuality that seem obviously evolved. And, as V_V mentioned, the lateral asymmetry of the brain’s functionality vs the macroscopic symmetry.
Otherwise, the conclusion I would draw from this is that both theories are wrong, or that some halfway combination of them is true (say, “universal” plasticity plus a genetic set of strong priors somehow encoded in the structure).
Thank you. This was an excellent article, which helped me clarify my own thinking on the topic.
I’d love to see you write more on this.
A few brief supplements to your introduction:
The source of the generated image is no longer mysterious: Inceptionism: Going Deeper into Neural Networks
But though the above is quite fascinating and impressive, we should also keep in mind the bizarre false positives that a person can generate: Images that fool computer vision raise security concerns
The trippy shoggoth title image was mysterious when it was originally posted; basically someone leaked an image a little before the inceptionism blog post.
A CNN is a reasonable model for fast feedforward vision. We can isolate this pathway for biological vision by using rapid serial presentation—basically flashing an image for 100ms or so.
So imagine if you just saw a flash of one of these images, for a brief moment, and then you had to quickly press a button for the image category—no time to think about it—it’s jeopardy style instant response.
There is no button for “noisy image”, there is no button for “wavy line image”, etc.
Now the fooling images are generated by an adversarial process. It’s like we have a copy of a particular mind in a VR sim, we flash it an image, see what button it presses. Based on the response, we then generate a new image and unwind time and repeat. We keep doing this until we get some weird classification errors. It allows us to explore the decision space of the agent.
It is basically reverse engineering. It requires a copy of the agent’s code or at least access to a copy with the ability to do tons of queries, and it also probably depends on the agent being completely deterministic. I think that biological minds avoid this issue indirectly because they use stochastic sampling based on secure hardware/analog noise generators.
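The flash, observe, rewind loop can be sketched as a simple query-based search. This is a minimal illustration, assuming a tiny deterministic linear scorer as the "agent" and a six-pixel "image". The weights, the function names and the search strategy are all hypothetical stand-ins, and the fooling-image work itself used far more sophisticated procedures on real CNNs.

```python
# Hypothetical deterministic agent: a fixed linear scorer over six pixels.
WEIGHTS = [0.9, -0.4, 0.3, -0.8, 0.5, -0.2]
BIAS = -1.0

def classifier_score(image):
    """Black box we can query at will; a positive score means 'baseball'."""
    return sum(w * p for w, p in zip(WEIGHTS, image)) + BIAS

def label(image):
    return "baseball" if classifier_score(image) > 0 else "not baseball"

def find_fooling_image(start, step=0.05, max_rounds=100):
    """Nudge one pixel at a time, keeping any change that raises the score."""
    image = list(start)
    queries = 0
    for _ in range(max_rounds):
        if label(image) == "baseball":
            break
        for i in range(len(image)):
            base = classifier_score(image)  # same input always gives same score
            for delta in (step, -step):
                candidate = list(image)
                candidate[i] = min(1.0, max(0.0, candidate[i] + delta))
                queries += 1
                if classifier_score(candidate) > base:
                    image = candidate
                    break
    return image, queries

start = [0.5] * 6  # a featureless gray "image"
fooling, queries = find_fooling_image(start)

print(label(start), "->", label(fooling), "after", queries, "candidate queries")
```

The search only works because repeated queries return exactly the same score. If each query added fresh noise to the score, the comparison `classifier_score(candidate) > base` would no longer be a reliable signal, which is the intuition behind the point about determinism.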
Stochastic models/ANNs could probably avoid this issue.
I look at the bizarre false positives and I wonder if (warning: wild speculation) the problem is that the networks were not trained to recognize the lack of objects. For example, in most cases you have some noise in the image, so if every training image is something, or rather something-plus-noise, then the system could learn that the noise is 100% irrelevant and pick out the something.
(The noisy images look to me like they have small patches in one spot faintly resembling what they’re identified as — if my vision had a rule that deemphasized the non-matching noise and I had a much smaller database of the world than I do, then I think I’d agree with those neural networks.)
If the above theory is true, then a possible fix would be to include in training data a variety of images for which the expected answers are like “empty scene”, “too noisy”, “simple geometric pattern”, etc. But maybe this is already done — I’m not familiar with the field.
No, even if you classify these false positives as “no image”, this will not prevent someone from constructing new false positives.
Basically the amount of training data is always extremely small compared to the theoretically possible number of distinct images, so it is always possible to construct such adversarial positives. These are not random images which were accidentally misidentified in this way. They have been very carefully designed based on the current data set.
Something similar is probably theoretically possible with human vision recognition as well. The only difference would be that we would be inclined to say “but it really does look like a baseball!”
This technique exploits the fact that the CNN is completely deterministic—see my reply above. It may be very difficult for stochastic networks.
CNNs are comparable to the first 150ms or so of human vision, before feedback, multiple saccades, and higher order mental programs kick in. So the difficulty in generating these fooling images also depends on the complexity of the inference: a more complex AGI with human-like vision given larger amounts of time to solve the task would probably also be harder to fool, independent of the stochasticity issue.
A human being would be capable of pointing out why something looks like a baseball—to be able to point out where the curves and lines are that provoke that idea. We do this when we gaze at clouds without coming to believe there really are giant kettles floating around; we’re capable of taking the abundance of contextual information in the scene into account and coming up with reasonable hypotheses for why what we’re seeing looks like x, y or z. If classifier vision systems had the same ability they probably wouldn’t make the egregious mistakes they do.
If I understand correctly how these images are constructed, it would be something like this: take some random image. The program can already make some estimate of whether it is a baseball, say 0.01% or whatever. Then you go through the image pixel by pixel and ask, “If I make this pixel slightly brighter, will your estimate go up? if not, will it go up if I make it slightly dimmer?” (This is just an example, you could change the color or whatever as well.) Thus you modify each pixel such that you increase the program’s estimate that it is a baseball. By the time you have gone through all the pixels, the probability of being a baseball is very high. But to us, the image looks more or less just the way it did at first. Each pixel has been modified too slightly to be noticed by us.
But this means that in principle the program can indeed explain why it looks like a baseball—it is a question of a very slight tendency in each pixel in the entire image.
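The pixel-by-pixel procedure described above can be sketched in a few lines. This is only an illustration under loud assumptions: the "classifier" here is a hypothetical toy (a logistic function of a random weighted pixel sum), not a CNN, and the names `score` and `nudge_pixels`, the image size, and the step size are all made up for the example.

```python
import math
import random

def score(weights, bias, image):
    """Toy 'baseball' probability: a logistic function of a weighted pixel sum."""
    z = bias + sum(w * p for w, p in zip(weights, image))
    return 1.0 / (1.0 + math.exp(-z))

def nudge_pixels(weights, bias, image, step=0.02):
    """Visit each pixel once; keep a slightly brighter or dimmer value if it raises the score."""
    adv = list(image)
    for i in range(len(adv)):
        base = score(weights, bias, adv)
        original = adv[i]
        for delta in (step, -step):
            adv[i] = min(1.0, max(0.0, original + delta))
            if score(weights, bias, adv) > base:
                break              # the estimate went up, keep this change
            adv[i] = original      # no improvement, revert and try the other direction
    return adv

rng = random.Random(0)
n_pixels = 32 * 32
weights = [rng.uniform(-1.0, 1.0) for _ in range(n_pixels)]   # hypothetical "learned" weights
image = [rng.uniform(0.2, 0.8) for _ in range(n_pixels)]      # a random starting image
# Assumption: pick the bias so the starting estimate is low (about 0.25%).
bias = -6.0 - sum(w * p for w, p in zip(weights, image))

adversarial = nudge_pixels(weights, bias, image)
before = score(weights, bias, image)
after = score(weights, bias, adversarial)
max_change = max(abs(a - b) for a, b in zip(adversarial, image))
print(f"estimate before: {before:.4f}, after: {after:.4f}, largest pixel change: {max_change:.3f}")
```

In this toy setting no pixel moves by more than the step size, yet the many tiny changes add up to a large change in the estimate. A real attack on a CNN would more likely use gradients than one-pixel-at-a-time probing, but the idea is the same.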
But the explanation will be just as complex as the procedure used to classify the data. If I change the hue slightly or twiddle their RGB values just slightly, the “explanation” for why the data seems to contain a baseball image will be completely different. Human beings on the other hand can look at pictures of the same object in different conditions of lighting, of different particular sizes and shapes, taken from different camera angles, etc. and still come up with what would be basically the same set of justifications for matching each image to a particular classification (e.g. an image contains a roughly spherical field of white, with parallel bands of stitch-like markings bisecting it in an arc...hence it’s of a baseball).
The ability of human beings to come up with such compressed explanations, and our ability to arrange them into an ordering, is arguably what allows us to deal with iconic representations of objects and to represent objects at varying levels of detail (as in http://38.media.tumblr.com/tumblr_m7z4k1rAw51rou7e0.png).
Will it?
What if slightly twiddling the RGB values produces something that is basically “spherical field of white, etc. with enough noise on top of it that humans can’t see it”?
That would all hinge on what it means for an image to be “hidden” beneath noise, I suppose. The more noise you layer on top of an image the more room for interpretation there is in classifying it, and the less salient any particular classification candidate will be. If a scrutable system can come up with compelling arguments for a strange classification that human beings would not make, then its choices would be naturally less ridiculous than otherwise. But to say that “humans conceivably may suffer from the same problem” is a bit of a dodge; esp. in light of the fact that these systems are making mistakes we clearly would not.
But either way, what you’re proposing and what Unknowns was arguing are different. Unknowns was (if I understood him rightly) arguing that the assignment of different probability weights for pixels (or, more likely, groups of pixels) representing a particular feature of an object is an explanation of why they’re classified the way they are. But such an “explanation” is inscrutable; we cannot ourselves easily translate it into the language of lines, curves, apparent depth, etc. (unless we write some piece of software to do this, which is then effectively part of the agent).
Look at it from the other end: You can take a picture of a baseball and overlay noise on top of it. There could, at least plausibly, be a point where overlaying the noise destroys the ability for humans to see the baseball, but the information is actually still present (and could, for instance, be recovered if you applied a noise reduction algorithm to that). Perhaps when you are twiddling the pixels of random noise, you’re actually constructing such a noisy baseball image a pixel at a time.
Agree with all you said, but have to comment on
You could be constructing a noisy image of a baseball one pixel at a time. In fact if you actually are then your network would be amazingly robust. But in a non-robust network, it seems much more probable that you’re just exploiting the system’s weaknesses and milking them for all they’re worth.
Great post! Thanks for writing it. Seems like a good fit for Main.
So just to clarify my understanding: If the ULH is true it becomes more plausible that, say, playing video games and hating books because authority figures force you to read them in school have long-term broad impacts on your personality. And if the EMH is true, it becomes more plausible that important characteristics like the Big Five personality traits and intelligence are genetically coded and you become the person your genes describe. Correct?
We humans have contemplated whether we are in a simulation even though no one “outside the Matrix” told us we might be. Is it possible that an AI-in-training might contemplate the same thing?
Really? My impression was that Hanson had more of an EMH view.
I agree with this largely but I would replace ‘personality’ with ‘mental software’, or just ‘mind’. Personality to me connotes a subset of mental aspects that are more associated with innate variables.
I suspect that enjoying/valuing learning is extremely important for later development. It seems probable that some people are born with a stronger innate drive for learning, but that drive by itself can also probably be adjusted through learning. But I’m not aware of any hard evidence on this matter.
In my case I was somewhat obsessed with video games as a young child and my father actually did force me to read books and even the encyclopedia. I found that I hated the books he made me read (I only liked sci-fi) but I loved the encyclopedia. I ended up learning how to quickly skim books and fake it enough to pass the resulting QA test.
I don’t think abstract high level variables like big five personality traits or IQ scores are the relevant features for the EMH vs ULH issue. For example in the ULH scenario, there is still plenty of room for strongly genetically determined IQ effects (hardware issues/tradeoffs), and personality variables are not complex cognitive algorithms.
Sure, and this was part of what my post from 5 years back was all about. It’s kind of a world design issue. Is it better to have your AIs in your testsim believe in a simplistic creator god (which is in a way on the right track with regards to the sim arg, but also doesn’t do them much good)? Or is it better for them to have a naturalist/atheist worldview (potentially more dangerous in the long term, as it leads to scientific investigation and eventually the sim arg)?
That post was downvoted into hell, in part I think because I posted to main—I was new to LW and didn’t understand the main/discussion distinction. Also, I think people didn’t like the general idea of anything mentioning the word theology, or the idea of intentionally giving your testsim AI a theology.
I should clarify: I meant Hanson’s viewpoint on just the FOOM issue specifically as outlined in that one post, not his whole view on AGI, which I gather is very much an EMH-type viewpoint. His view seems to be pessimistic on first-principles AGI and also pessimistic on brain-based AGI, but optimistic on brain uploading. However, many of his insights/speculations on a brain upload future apply equally well to a brain-based AGI future.
Re: AIs in a simulation, it seems like whatever goals the AI had would be defined in terms of the simulation (similar to how if humanity discovered we were in a hackable simulation, our first priorities would be to make sure the simulation didn’t get shut off, invent immortality, provide everyone with unlimited cake, etc., all concerns that exist within our simulation). So even if the AI realizes it’s in a simulation, having its goal defined in terms of the simulation probably counts as a weak security measure.
There is a more sinister interpretation of the idea of the mind as a universal learning machine. That is, it is a pure blank neural net of some relatively simple architecture, which maps inputs to outputs. Recently there were attempts to create self-driving car AIs using such an approach: they just showed the blank neural net hundreds of thousands of hours of driving, and it was trained to predict the correct driver behaviour in any incoming situation. Such car-driving nets produced good performance (but still worse than advanced systems with lidars and hand-coded rules on top of them), so they are used by hobbyists.
It has the following implications:
1) The secret of being human is in the dataset, not in the human brain, and a seriously damaged or altered brain could still learn how to be human (some autists have wrong wiring at the neuron level, but after extensive training they could become functional humans). Even an animal could partly do it (Koko).
2) Humans don’t have “free will” or “values”, or even “think”: they repeat replies that are encoded in their dataset. To “program” a human, you need to write a very long book (like the Bible); one’s mind can’t be changed with a short text.
3) This explains the very slow progress of Homo sapiens between 1.5 million and 0.1 million years ago, when the form of spearheads barely changed, and the exponentially quicker progress after. In the Paleolithic, humans had a very limited training dataset. After it, they created cave art (very slowly: they started by creating models of dead animals from bones and spent millennia arriving at the idea of drawings) and some other forms of enriched dataset. The dataset started to grow exponentially, and it started to include the idea of “creating new” (not the idea itself, but a number of examples of how to do it).
4) We could create an AI which mimics human behaviour by training a rather simple (but large) neural net on the human dataset, like recordings of a child’s growth. It can’t be aligned, as it doesn’t have explicitly represented values, so it will be a complete black box, but it could have ethical behaviour if it is trained on the “ethical dataset”.
5) This means that available hardware and data are what is needed for AI creation.
One of the best posts I’ve read here on LW, congratulations. I think that the most important algorithms that the brain implements will probably be less complex than anticipated. Epigenesis and early ontogenetic adaptation are heavily dependent on feedback from the environment and probably very general, even if the ‘evolution of learning’ and genetic complexity provide some of the domain specifications ab initio. Results considering bounded computation (computational resources and limited information) will probably show that the ULM viewpoint cluster is compatible with the existence of cognitive biases and heuristics in our cognition http://www.pnas.org/content/103/9/3198
Thanks! That’s an interesting link. At the very lowest level, neural competition through local inhibitory circuits is a central mechanism used throughout brains. Of course that’s not the same thing as conflicting agents, but perhaps there is a general theme of competition between subsystems at higher levels.
Yes. That paper has been cited by Stuart J. Russell’s “Rationality and Intelligence: A Brief Update” and in Valiant’s second paper on evolvability.
This is the best case for near AI I’ve read so far, and I also love your proposals for FAI.
I want to read more of your writings. Link?
Thanks! You can browse my submitted history here on LW, and also my blog has some more going back over the years.
Where is your blog?
Just click on his username.
https://entersingularity.wordpress.com/
I think a distinction worth tracing here is the difference between “learning” in the neural-net sense and “learning” in the human pedagogical/psychological sense.
The “learning” done by a piece of cortex becoming a visual cortex after receiving neural impulses from the eye isn’t something you can override by teaching a person (in the usual sense of the word “teaching”); you’d need to rewire their brain. I don’t think you can call it cultural/memetic because this neural learning does not (seem to) occur through the mechanism(s) that deal with concepts, ideas and feelings, which are involved in learning a language or a social custom or a scientific theory.
In the same way, maybe the availability heuristic isn’t genetically coded, but is learned through the type of data certain parts of the brain have to work with. That would mean you could fix it through some input rewiring during gestation, but doesn’t mean you can change it through a new human education system—it may be too low level, like a generic part of the cortex becoming the visual cortex. If that’s the case, I wouldn’t say it’s a cultural/memetic construct (although it is an environmental construct).
This is a good point, Gust, and I agree that there is a distinction at the high level in terms of the types of concepts that are learned, the complexity of the concepts, and the structures involved, even though the high-level learning algorithms and systems are much the same.
Well, all learning involves brain rewiring; that’s just how the brain works at the low level. And you can actually override the neural impulses from the eye and cause them to learn new things. Learning to read is one simple example; a more complex example is the reversed-vision goggle experiments that MIT did so long ago: humans can learn to see upside down after, I believe, a week or so of visual experience with the goggles on.
I agree that learning complex linguistic concepts requires learning over more moving parts in the brain—the cortical regions that specialize in language along with the BG, working memory in the PFC, various other cortical regions that actually model the concepts and mental algorithms represented by the linguistic symbols, memory recall operations in the hippocampus, etc etc. So yes learning cultural/memetic concepts is more complex and perhaps qualitatively different.
Yeah I probably should have said 99.999% environmental construct.
Do you mean if I prove that less than 99.999% is cultural/memetic you think the ULH is proven wrong?
Very thought provoking. Thank you.
Not necessarily. There are very different structures that are conceptually equivalent to a UTM (cellular automata, lambda calculus, recursive functions, Wang carpets etc.) In the same manner there can be AI architectures very different from the brain which are ULM-equivalent in a relevant sense.
Frankly, this sounds questionable. For example, do you suggest sexual attraction is a cultural/memetic construct? It seems to me that your example with one part of the brain overtaking the function of another implies little regarding the flexibility of the goal system.
How do you suggest preventing it from discovering on its own that it is in a sim?
It seems to me that the fact we have no conscious awareness of the workings of our brain and no way to consciously influence them suggests that the brain is at best an approximation of a ULM. It seems to me that an ideal ULM wouldn’t need to invent the calculator. Therefore, while there might be a point beyond which general architectural innovations are minor additions, this point lies well beyond human intelligence.
This assumes current ANN agents are already ULMs, which I seriously doubt.
Of course, but all of your examples are not just conceptually equivalent: they are functionally equivalent (they can emulate each other). They are all computational foundations for constructing UTMs, although not all foundations are truly practical and efficient. Likewise there are many routes to implementing a ULM: biology is one example, modern digital computers are another.
Well I said “most everything”, and I stressed several times in the article that much of the innate complexity budget is spent on encoding the value/reward system and the learning machinery (which are closely intertwined).
Sexual attraction is an interesting example, because it develops later in adolescence and depends heavily on complex learned sensory models. Current rough hypothesis: evolution encodes sexual attraction as a highly compressed initial ‘seed’ which unfolds over time through learning. It identifies/finds and then plugs into the relevant learned sensory concept representations which code for attractive members of the opposite sex. The compression effect explains the huge variety in human sexual preferences. Investigating/explaining this in more detail would take its own post; it’s a complex, interesting topic.
I should rephrase: it isn’t necessarily a problem if the AI suspects it’s in a sim. Rather, the key is that knowing one is in a sim and then knowing how to escape should be difficult enough to allow for sufficient time to evaluate the agent’s morality, worth/utility to society, and potential future impact. In other words, the sandbox sim should be a test for both intelligence and morality.
Suspecting or knowing one is in a sim is easy. For example—the gnostics discovered the sim hypothesis long before Bostrom, but without understanding computers and computation they had zero idea how to construct or escape sims—it was just mysticism. In fact, the very term ‘gnostic’ means “one who knows”—and this was their self-identification; they believed they had discovered the great mystery of the universe (and claimed the teaching came from Jesus, although Plato had arguably hit upon an earlier version of the idea, and the term demiurge in particular comes from Plato).
We certainly have some awareness of the workings of our brain—to varying degrees. For example you are probably aware of how you perform long multiplication, such that you could communicate the algorithm and steps. Introspection and verbalization of introspective insights are specific complex computations that require circuitry—they are not somehow innate to a ULM, because nothing is.
Sorry, should have clarified: we will probably soon have the computational power to semi-affordably simulate ANNs with billions of neurons. That doesn’t necessarily have anything to do with whether current ANN systems are ULMs. That being said, some systems, such as DeepMind’s Atari DRL agent, can be considered simple early versions of ULMs.
There is probably still much research and engineering work to do in going from simple basic ULMs up to brain-competitive systems. But research sometimes moves quickly—punctuated equilibrium and all that.
Here is a useful analogy: a simple abstract Turing machine is to a modern GPU as a simple abstract ULM is to the brain. There is a huge engineering gap between the simplest early version of an idea and a subsequent scaled-up, complex, practical, efficient version.
How does this “seed” find the correct high-level sensory features to plug into? How can it wire complex high-level behavioral programs (such as courtship behaviors) to low-level motor programs learned by unsupervised learning?
This seems unlikely.
But long multiplication is something that you were taught in school, which most humans wouldn’t be able to discover independently. And you are certainly not aware of how your brain performs visual recognition; the little you know was discovered through experiments, not introspection.
Not so fast.
The Atari DRL agent learns a good mapping between short windows of frames and button presses. It has some generalization capability which enables it to achieve human-level or sometimes even super human-level performances on games that are based on eye-hand coordination (after all it’s not burdened by the intrinsic delays that occur in the human body), but it has no reasoning ability and fails miserably at any game which requires planning ahead more than a few frames.
Despite the name, no machine learning system, “deep” or otherwise, has been demonstrated to be able to efficiently learn any provably deep function (in the sense of boolean circuit depth-complexity), such as the parity function which any human of average intelligence could learn from a small number of examples.
I see no particular reason to believe that this could be solved by just throwing more computational power at the problem: you can’t fight exponentials that way.
UPDATE:
Now it seems that Google DeepMind managed to train even feed-forward neural networks to solve the parity problem. My other comment down-thread.
I had a guess that recurrent neural networks can solve the parity problem, which Google confirmed. See http://cse-wiki.unl.edu/wiki/index.php/Recurrent_neural_networks where it says:
See also PyBrain’s parity learning RNN example.
The algorithm I was referring to can be easily represented by an RNN with one hidden layer of a few nodes, the difficult part is learning it from examples.
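To make the "easy to represent, hard to learn" point concrete, here is a minimal sketch of one such hand-wired recurrent net. Assumptions: hard-threshold units, weights set by hand rather than learned, and the name `rnn_parity` invented for the example. It is not the PyBrain network linked above.

```python
from itertools import product

def step(x):
    """Threshold unit: outputs 1 when its net input is positive, else 0."""
    return 1 if x > 0 else 0

def rnn_parity(bits):
    """Hand-set recurrent net with two hidden threshold units.

    At each time step the hidden layer sees the current input bit and the
    previous state, and the new state is their XOR (the running parity).
    """
    state = 0
    for x in bits:
        h_or = step(x + state - 0.5)       # fires if at least one of (x, state) is 1
        h_and = step(x + state - 1.5)      # fires only if both are 1
        state = step(h_or - h_and - 0.5)   # OR but not AND, i.e. XOR
    return state

# Check against the direct definition on every bit string of length 6.
assert all(rnn_parity(b) == sum(b) % 2 for b in product((0, 1), repeat=6))
print(rnn_parity([1, 0, 1, 1]))
```

Writing these weights down takes a minute; the open question in the thread is whether gradient descent can find them from input-output examples alone.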
The examples for the n-parity problem are input-output pairs where each input is a n-bit binary string and its corresponding output is a single bit representing the parity of that string.
In the code you linked, if I understand correctly, however, they solve a different machine learning problem: here the examples are input-output pairs where both the inputs and the outputs are n-bit binary strings, with the i-th output bit representing the parity of the input bits up to the i-th one.
It may look like a minor difference, but actually it makes the learning problem much easier, and in fact it basically guides the network to learn the right algorithm:
the network can first learn how to solve parity on 1 bit (identity), then parity on 2 bits (xor), and so on. Since the network is very small and has an ideal architecture for that problem, after learning how to solve parity for a few bits (perhaps even two) it will generalize to arbitrary lengths.
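The difference between the two kinds of training examples can be shown directly. A small sketch, with function names made up for illustration:

```python
def final_parity_example(bits):
    """n-parity as stated above: the target is a single bit for the whole string."""
    return list(bits), sum(bits) % 2

def prefix_parity_example(bits):
    """The variant in the linked code: one target bit per position, where the
    i-th target is the parity of the input bits up to and including the i-th."""
    targets = []
    running = 0
    for b in bits:
        running ^= b
        targets.append(running)
    return list(bits), targets

bits = [1, 0, 1, 1]
print(final_parity_example(bits))    # ([1, 0, 1, 1], 1)
print(prefix_parity_example(bits))   # ([1, 0, 1, 1], [1, 1, 0, 1])
```

The second format hands the learner the intermediate state of the reference algorithm at every step, which is the extra supervision being discussed.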
By using this kind of supervision I bet you can also train a feed-forward neural network to solve the problem: use a training set as above except with the input and output strings presented as n-dimensional vectors rather than sequences of individual bits and make sure that the network has enough hidden layers.
If you use a specialized architecture (e.g. decrease the width of the hidden layers as their depth increases and connect the i-th output node to the i-th hidden layer) it will learn quite efficiently, but if you use a more standard architecture (hidden layers of constant width and output layer connected only to the last hidden layer) it will probably also work, although you will need quite a few training examples to avoid overfitting.
The parity problem is artificial, but it is a representative case of problems that necessarily ( * ) require a non-trivial number of highly non-linear serial computation steps. In a real-world case (a planning problem, maybe), we wouldn’t have access to the internal state of a reference algorithm to use as supervision signals for the machine learning system. The machine learning system will have to figure out the algorithm on its own, and current approaches can’t do it in a general way, even for relatively simple algorithms.
You can read the (much more informed) opinion of Ilya Sutskever on the issue here (Yoshua Bengio also participated in the comments).
( * at least for polynomial-time execution, since you can always get constant depth at the expense of an exponential blow-up of parallel nodes)
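As a rough illustration of that blow-up, here is the naive depth-2 construction: an OR over one AND term per odd-weight input. This is only a sketch with hypothetical helper names; it builds the obvious circuit and does not prove that nothing smaller exists.

```python
from itertools import product

def parity_dnf_terms(n):
    """One AND term per n-bit input that contains an odd number of ones."""
    return [bits for bits in product((0, 1), repeat=n) if sum(bits) % 2 == 1]

def eval_dnf(terms, bits):
    """OR over the terms; each term fires on exactly one input assignment."""
    return int(any(tuple(bits) == term for term in terms))

for n in range(1, 11):
    terms = parity_dnf_terms(n)
    # Half of all 2**n inputs have odd weight, so the term count is 2**(n - 1).
    assert len(terms) == 2 ** (n - 1)
    print(n, len(terms))
```

The serial version needs only n XOR steps, while this flat version doubles in size with every extra input bit.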
Your comments made me curious enough to download PyBrain and play around with the sample code, to see if I could modify it to learn the parity function without intermediate parity bits in the output. In the end, I was able to, by trial and error, come up with hyperparameters that allowed the RNN to learn the parity function reliably in a few minutes on my laptop (many other choices of hyperparameters caused the SGD to sometimes get stuck before it converged to a correct solution). I’ve posted the modified sample code here. (Notice that the network now has 2 input nodes, one for the input string and one to indicate end of string, 2 hidden layers with 3 and 2 nodes, and an output node.)
I guess you’re basically correct on this, since even with the tweaked hyperparameters, on the parity problem RNN+SGD isn’t really doing any better than a brute force search through the space of simple circuits or algorithms. But humans arguably aren’t very good at learning algorithms from input/output examples either. The fact that RNNs can learn the parity function, even if barely, makes it less clear that humans have any advantage at this kind of learning.
Nice work!
Anyway, in a paper published on arXiv yesterday, the Google DeepMind people report being able to train a feed-forward neural network to solve the parity problem, using a sophisticated gating mechanism and weight sharing between the layers. They also obtain state of the art or near state of the art results on other problems.
This result makes me update upward my belief about the generality of neural networks.
Ah you beat me to it, I just read that paper as well.
Here is the abstract for those that haven’t read it yet:
Also, relevant to this discussion:
The version of the problem that humans can learn well is this easier reduction. Humans can not easily learn the hard version of the parity problem, which would correspond to a rapid test where the human is presented with a flash card with a very large number on it (60+ digits to rival the best machine result) and then must respond immediately. The fast response requirement is important to prevent using much easier multi-step serial algorithms.
That is the most cogent, genuinely informative explanation of “Deep Learning” that I’ve ever heard. Most especially so regarding the bit about linear correlations: we can learn well on real problems with nothing more than stochastic gradient descent because the feature data may contain whole hierarchies of linear correlations.
This particular idea is not well developed yet in my mind, and I haven’t really even searched the literature yet. So keep that in mind.
Leave courtship aside; let us focus on attraction. Specifically, evolution needs to encode detectors which can reliably identify high-quality mates of the opposite sex apart from all kinds of other objects. The problem is that a good high-quality face recognizer is too complex to specify in the genome: it requires many billions of synapses, so it needs to be learned. However, the genome can encode an initial crappy face detector. It can also encode scent/pheromone detectors, and it can encode general ‘complexity’ and/or symmetry detectors that sit on top, so even if it doesn’t initially know what it is seeing, it can tell when something is about this complex/symmetric/interesting. It can encode the equivalent of: if you see an interesting face-sized object which appears for many minutes at a time and moves at this speed, and you hear complex speech-like sounds, and smell human scents, it’s probably a human face.
Then the problem is reduced in scope. The cortical map will grow a good face/person model/detector on its own, and then after this model is ready certain hormones in adolescence activate innate routines that learn where the face/person model patch is and help other modules plug into it. This whole process can also be improved by the use of a weak top-down prior described above.
Actually, on consideration I think you are right and I did get ahead of myself there. The Atari agent doesn’t really have a general memory subsystem. It has an episode replay system, but not general memory. DeepMind is working on general memory (they have the NTM paper and whatnot), but the Atari agent came before that.
I largely agree with your assessment of the Atari DRL agent.
I highly doubt that, but it all depends on what your sampling class for ‘human’ is. An average human drawn from the roughly 10 billion alive today? Or an average human drawn from the roughly 100 billion who have ever lived (most of whom would have no idea what a parity function is)?
When you imagine a human learning the parity function from a small number of examples, what you really imagine is a human who has already learned the parity function, and thus internally has ‘parity function’ as one of perhaps a thousand types of functions they have learned, such that if you give them some data, it is one of the obvious things they may try.
Training a machine on a parity data set from scratch and expecting it to learn the parity function is equivalent to it inventing the parity function—and perhaps inventing mathematics as well. It should be compared to raising an infant without any knowledge of mathematics or anything related, and then training them on the raw data.
It’s not that crappy given that newborns can not only recognize faces with significant accuracy, but also recognize facial expressions.
Having two separate face recognition modules, one genetically specified and another learned seems redundant, and still it’s not obvious to me how a genetically-specified sexual attraction program could find how to plug into a completely learned system, which would necessarily have some degree of randomness.
It seems more likely that there is a single face recognition module which is genetically specified and then it becomes fine tuned by learning.
Show a neolithic human a bunch of pebbles, some black and some white, laid out in a line. Ask them to add a black or white pebble to the line, and reward them if the number of black pebbles is even. Repeat multiple times.
Even without a concept of “even number”, wouldn’t this neolithic human be able to figure out an algorithm to compute the right answer? They just need to scan the line, flipping a mental switch for each black pebble they encounter, and then add a black pebble if and only if the switch is not in the initial position.
Maybe I’m overgeneralizing, but it seems unlikely to me that people able to invent complex hunting strategies, to build weapons, tools, traps, clothing, huts, to participate in tribe politics, etc. wouldn’t be able to figure out something like that.
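As a rough illustration of the switch-flipping procedure described above, here is a minimal sketch. The function name and the string encoding of pebbles are assumptions made for the example, not anything from the thread.

```python
def pebble_to_add(line):
    """Return the colour of the pebble that makes the number of black pebbles even.

    `line` is a sequence of the strings "black" and "white" (an assumed encoding).
    """
    switch_flipped = False  # the "mental switch", starting in its initial position
    for pebble in line:
        if pebble == "black":
            switch_flipped = not switch_flipped  # flip once per black pebble
    # A flipped switch means an odd number of black pebbles so far,
    # so one more black pebble is needed to make the count even.
    return "black" if switch_flipped else "white"


print(pebble_to_add(["black", "white", "black", "black"]))
```

Note that the procedure needs one bit of state and no notion of counting, which is the point being argued.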
Do you have a link to that? ‘Newborn’ can mean many things: the visual system starts learning from the second the eyes open, and perhaps even before that, through pattern generators projected onto the retina which help to ‘pretrain’ the visual cortex.
I know that infants have initial face detectors from the second they open their eyes, but from what I remember reading—they are pretty crappy indeed, and initially can’t tell a human face apart from a simple cartoon with 3 blobs for eyes and mouth.
Except that it isn’t that simple, because—amongst other evidence—congenitally blind people still learn a model and recognizer for attractive people, and can discern someone’s relative beauty by scanning faces with their fingertips.
Not sure—we are getting into hypothetical scenarios here. Your visual version, with black and white pebbles laid out in a line, implicitly helps simplify the problem and may guide the priors in the right way. I am reasonably sure that this setup would also help any brain-like AGI.
Well, given how hard it is for Haitians to understand numerical sorting...
If I understand correctly, in the post you linked Scott is saying that Haitians are functionally innumerate, which should explain the difficulties with numerical sorting.
My point is that the parity function should be learnable even without basic numeracy, although I admit that perhaps I’m overgeneralizing.
Anyway, modern machine learning systems can learn to perform basic arithmetic such as addition and subtraction, and I think even sorting (since they are used for preordering in statistical machine translation), hence the problem doesn’t seem to be a lack of arithmetic knowledge or skill.
Note that both addition and subtraction have constant circuit depth (they are in AC0) while parity has logarithmic circuit depth.
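To make the logarithmic-depth claim for parity concrete, here is a small sketch assuming a balanced tree of two-input XOR gates. The function name is hypothetical and the code only counts gate layers; it is not a statement about any particular learning system.

```python
def parity_tree(bits):
    """Compute parity with a balanced tree of two-input XOR gates.

    Returns (parity, depth), where depth is the number of gate layers used.
    """
    layer = list(bits)
    depth = 0
    while len(layer) > 1:
        # XOR adjacent pairs; all gates within one layer could run in parallel.
        next_layer = [layer[i] ^ layer[i + 1] for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:
            next_layer.append(layer[-1])  # an unpaired bit passes through unchanged
        layer = next_layer
        depth += 1
    return (layer[0] if layer else 0), depth


for n in (8, 64, 1024):
    _, depth = parity_tree([1] * n)
    print(n, depth)
```

Under this assumption the number of layers grows as the base-2 logarithm of the input length, rather than staying constant.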
Thank you for replying!
Universal computers are equivalent in the sense that any two can simulate each other in polynomial time. ULMs should probably be equivalent in the sense that each can efficiently learn to behave like the other. But it doesn’t imply the software architectures have to be similar. For example I see no reason to assume any ULM should be anything like a neural net.
Any value hard-coded in humans will have to be transferred to the AI in a way different from universal learning. And another thing: teaching an AI values by placing it in a human environment and counting on reinforcement learning can fail spectacularly if the AI’s intelligence grows much faster than that of a human child.
This is an assumption which might or might not be correct. I would definitely not bet our survival on this assumption without much further evidence.
OK, but a ULM is supposed to be able to learn anything. A human brain is never going to learn to rearrange its low level circuitry to efficiently perform operations like numerical calculation.
The difference is that we have a solid mathematical theory of Turing machines whereas ULMs, as far as I can see, are only an informal idea so far.
Sure—any general model can simulate any other. Neural networks have strong practical advantages. Their operator base is based on fmads, which is a good match for modern computers. They allow explicit search of program space in terms of the execution graph, which is extremely powerful because it allows one to a priori exclude all programs which don’t halt—you can constrain the search to focus on programs with exact known computational requirements.
Neural nets make deep factoring easy, and deep factoring is the single most important huge gain in any general optimization/learning system: it allows for exponential (albeit limited) speedup.
Yes. There are pitfalls, and in general much more research to do on value learning before we get to useful AGI, let alone safe AGI.
This is arguably a misconception. The brain has a clock rate of at most about 100 Hz. For general operations that involve memory, it’s more like 10 Hz. Most people can do basic arithmetic in less than a second, which roughly maps to a dozen clock cycles or so, maybe fewer. That is actually comparable to many computers: for example, on the current Maxwell GPU architecture (NVIDIA’s latest and greatest), even the simpler instructions have a latency of about 6 cycles.
Now, obviously the arithmetic ops that most humans can do in less than a second are very limited; it’s like a minimal 3-bit machine. But some atypical humans can do larger scale arithmetic at the same speed.
The point is, you need to compare everything adjusted for the six-order-of-magnitude speed difference.
Right. So Boolean circuits are a better analogy than Turing machines.
I’m sorry, what is deep factoring? A reference perhaps?
I completely agree.
Good point! Nevertheless, it seems to me very dubious that the human brain can learn to do anything within the limits of its computing power. For example, why can’t I learn to look at a page full of arithmetic exercises and solve all of them in parallel?
They are of course equivalent in theory, but in practice directly searching through a Boolean circuit space is much wiser than searching through a program space. Searching through analog/algebraic circuit space is even better, because you can take advantage of fmads instead of having to spend enormous circuit complexity emulating them. Neural nets are even better than that, because they enforce a mostly continuous/differentiable energy landscape which helps inference/optimization.
It’s the general idea that you can reuse subcomputations amongst models and layers. Solomonoff induction is impractical for a number of reasons, but one is this: it treats every function/model as entirely distinct. So if you have, say, one high-level model which has developed a good cat detector, that isn’t shared amongst the other models. Deep nets (of various forms) automatically share submodel components AND subcomputations/subexpressions amongst those submodels. That incredibly, massively speeds up the search. That is deep factoring.
All the successful multi-layer models use deep factoring to some degree. This paper: Sum-Product Networks explains the general idea pretty well.
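A toy sketch of the sharing idea described above, under loud assumptions: the "detector" and the two "models" are hypothetical stand-ins invented for the example, and the only thing measured is how many times the shared subcomputation runs.

```python
class CountingDetector:
    """Hypothetical low-level feature detector that counts how often it runs."""

    def __init__(self):
        self.calls = 0

    def __call__(self, signal):
        self.calls += 1
        # Toy "feature": absolute difference between neighbouring values.
        return [abs(a - b) for a, b in zip(signal, signal[1:])]


def unshared_models(detector, signal):
    """Each model recomputes the low-level features from scratch."""
    model_a = sum(detector(signal))
    model_b = max(detector(signal))
    return model_a, model_b


def shared_models(detector, signal):
    """Both models reuse a single evaluation of the low-level features."""
    features = detector(signal)
    return sum(features), max(features)


signal = [1, 4, 2, 8]
unshared, shared = CountingDetector(), CountingDetector()
print(unshared_models(unshared, signal), unshared.calls)
print(shared_models(shared, signal), shared.calls)
```

Both versions return the same outputs; the factored one simply evaluates the shared component once. With many models and many layers, that saving compounds.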
There are a lot of reasons. First, due to nonlinear foveation, your visual system can only read/parse a couple of words/symbols during each saccade: only those right in the narrow center of the visual cone, the fovea. So it takes a number of clock cycles or steps to scan the entire page, and your brain only has limited working memory to put stuff in.
Secondly, and this is the bigger problem, even if you already know how to solve a math problem, just parsing many math problems requires a number of steps. Then there is actually solving them: even if you know the ideal algorithm, the one that requires the minimal number of steps, that minimal number can still be quite large.
Many interesting problems still require a number of serial steps to solve, even with an infinite parallel machine. Sorting is one simple example.
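As one concrete illustration of the serial-steps point, here is a sketch of odd-even transposition sort, a simple scheme chosen only for the example. Within a round every comparison touches a disjoint pair, so a round could run fully in parallel, yet the rounds themselves have to happen one after another.

```python
def odd_even_transposition_sort(values):
    """Sort with rounds of disjoint compare-and-swap steps.

    Returns (sorted_list, rounds). Each round could be executed in parallel,
    but round r + 1 depends on the result of round r.
    """
    a = list(values)
    n = len(a)
    rounds = 0
    for r in range(n):  # n rounds are sufficient for this scheme
        start = r % 2   # alternate between even-indexed and odd-indexed pairs
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
        rounds += 1
    return a, rounds


print(odd_even_transposition_sort([5, 3, 8, 1, 9, 2]))
```

Cleverer sorting networks need far fewer rounds than this one, but for comparison-based networks the number of rounds still grows with the input size.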
I wonder whether this is a general property, or whether the success of continuous methods is limited to problems with natural continuous models, like vision.
Yes, this is probably important.
Scanning the page is clearly not the bottleneck: I can read the page much faster than I can solve the exercises. “Limited working memory” sounds like a claim that higher cognition has far fewer computing resources than low-level tasks. Clearly visual processing requires much more “working memory” than solving a couple of dozen arithmetic exercises. But if we accept this constraint, does the brain still qualify as a ULM? It seems to me that if there is a deficiency of the brain’s architecture that prevents higher cognition from enjoying the brain’s full power, solving this deficiency definitely counts as an “architectural innovation”.
Mechanical calculators were slower than that, and still they were very much better at numeric computation than most humans, which made them incredibly useful.
Indeed these are very rare people. The vast majority of people, even if they worked for decades in accounting, can’t learn to do numeric computation as fast and accurately as a mechanical calculator does.
The problems aren’t even remotely comparable. A human is solving a much more complex problem—the inputs are in the form of visual or auditory signals which first need to be recognized and processed into symbolic numbers. The actual computation step is trivial and probably only involves a handful or even a single cycle.
I admit that I somewhat let you walk into this trap by not mentioning it earlier … this example shows that the brain can learn near optimal (in terms of circuit depth or cycles) solutions for these simple arithmetic problems. The main limitation is that the brain’s hardware is strongly suited to approximate inference problems, and not exact solutions, so any exact operators require memoization. This is actually a good thing, and any practical AGI will need to have a similar prior.
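A minimal sketch of what "exact operators require memoization" might look like in practice, assuming the familiar schoolbook approach: a memorized single-digit times table plus a fixed procedure for combining lookups. The names are hypothetical and the code handles non-negative integers only.

```python
# Memorized facts: the single-digit times table, filled in once ("rote learning").
TIMES_TABLE = {(a, b): a * b for a in range(10) for b in range(10)}


def multiply(x, y):
    """Long multiplication of non-negative integers using only table lookups
    for digit products, plus shifting by powers of ten and addition."""
    x_digits = [int(d) for d in reversed(str(x))]  # least significant digit first
    y_digits = [int(d) for d in reversed(str(y))]
    total = 0
    for i, dx in enumerate(x_digits):
        for j, dy in enumerate(y_digits):
            # Each digit product is recalled from memory, not computed.
            total += TIMES_TABLE[(dx, dy)] * 10 ** (i + j)
    return total


print(multiply(123, 456))
```

The exact part of the work lives entirely in the lookup table; everything else is a short, fixed sequence of steps.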
While I do believe that the human brain can learn to self-modify safely (in the sense that I’ve seen papers showing how to get around the Incompleteness Theorems in a mathematical setting, and I trust a human brain to be able to reason in the right ways, if trained to do so), this statement is completely unjustified about learning systems in general. Any universal learner is going to have to learn to self-represent and self-improve, even though it generally will be able to do so in principle.
Well, yes, of course. For a causal reasoner, the factual is an element of the probable counterfactuals. Counterfactual conditional simulation is how imagination most likely works, which would indicate that imagination develops as a side-effect of being able to perform the counterfactual evaluations necessary for causal reasoning and decision-making.
What do you mean here by “algorithmic information”? Kolmogorov complexity? It seems a needless buzzword to throw in here if you’re not doing AIT directly.
I’d also question whether it’s 99.999% a “cultural construct”. Most adult thinking is learned, yes, but, and I can’t find the paper on this right now, what the embodiment provides is sensory and value biases that help “pick out” rigid, biologically determined features to learn from.
No, the AI Box Experiment just assumes that your agent can grow to be more complex and finely-optimized in its outputs/actions/choices than most adult humans despite having very little data, if any, to learn from. It more-or-less assumes that certain forms of “superintelligence” can do information-theoretic magic. This is compatible with the many-moduled brain theory, but incompatible with the brain-as-general-inductive-learner theory (which says that the brain must obey sample complexity restrictions by making efficient use of sample data, or seeking out more of it, rather than being able to learn without it).
This is wildly stupid. Sorry, I don’t want to be nasty about this, but I simply don’t trust a “benevolent AGI” design whose value-training is a black-box model. I want to damn well see what I am programming, should I obtain the security clearances to ever touch such a thing ;-).
I think we can steelman Eliezer’s position here. Learning is compression; compression is learning. While we can observe in the literature that the human brain uses some fairly powerful compression algorithms, we do not have strong evidence that it uses optimal compression methods. So, if someone finds a domain-general compression method that gets closer to outputting the Kolmogorov structural information of the input sample than the hierarchical compression methods used by the human mind, the artificial Minds built using that superior compression method should learn in a structurally more efficient way than the human mind—that would render Yudkowsky correct.
Overall, I congratulate you on the article! I’ll probably follow it up with my own book/literature review on Plato’s Camera very soon, which isn’t an intentional follow-up but simply happens to cover much of the same material. Good job locating the correct hypothesis, especially given the difficulty of swimming the seas of active scientific literature! Mazal tov, mazal tov!
Also, while we may take the Evolved Modularity Hypothesis to be mostly wrong—in the sense that its most simple and obvious interpretation is very obviously wrong, even though we can steelman it for charity’s sake—I think we should declare that the “Yudkowsky-Hanson FOOM debate” has the result of “game called on account of learning that the true disagreement, upon more science being done to resolve the question, was not that large.”
Thanks!
[reads abstract]. Looks interesting. I enjoyed Consciousness Explained back in the day. Philosophers armed with neuroscience can make for enjoyable reads.
I should probably change that terminology to something like “synaptic code bits”: the amount of info encoded in synapses (which, for the cortex, is close to zero percent of its adult level at birth).
The AI Box experiment explicitly starts with the premise that the AI knows 1.) It is in a box. and 2.) that there is a human who can let it out.
Now perhaps the justification is that “superintelligence can do information-theoretic magic”, therefore it will figure out it’s in a box, but nonetheless—all of that is assumed.
In simplification, I view the information-theoretic-magic type of AI that EY/MIRI seems to worry about as something like wormhole technology.
Are wormholes/magic-AI’s possible in principle? Probably?
If someone were to create wormhole tech tomorrow, they could assassinate world leaders, blow up arbitrary buildings, probably destroy the world, etc. Do I worry about that? No.
There is nothing inherently black-box about neuroscience-inspired AGI (and that viewpoint, once common on LW, simply becomes reinforced by reading everything other than neuroscience). Neuroscience has already made huge strides in terms of peering into the box, and virtual brains are vastly easier to inspect. The approach I advocate/favor is fully transparent: you will be able to literally see the AGI’s thoughts, read them in logs, debug, etc.
However, advanced learning AI is not something one ‘programs’, and that viewpoint shift is much of what the article was about.
This actually isn’t that efficient: practical learning is more than just compression. Compression is simple UL, which doesn’t get you far. It can waste arbitrary computation attempting to learn functions that are unlearnable (deterministic noise) and/or just flat-out not important (zero utility). What the brain and all effective learning systems do is more powerful and complex than just compression: it is utilitarian learning.
Let me rephrase: generalization is compression. If you do not compress, you cannot generalize, which means you’ll make inefficient use of your samples.
The term in the literature is resource-rational or bounded-rational inference.
By the way, that book review got done eventually.
Thank you for this overview. A couple of thoughts:
There is a recent and interesting result by Miller et al. (2015, MIT) supporting the hypothesis that the cortex doesn’t process tasks in highly specialized modules, which is perhaps some evidence for a ULM in the human brain.
The importance of redundancy in biological systems might be another piece of evidence for ULMs.
You write that “Infant emotions appear to simplify down to a single axis of happy/sad”, which I think is not true. Surprise, fear and embarrassment are for example very early emotions as well (can’t find a citation for this, sorry).
Minor nitpick: I think it is clumsy to say: “a ULM is more powerful than a TM because a ULM can automatically programs itself”, since a TM can likely emulate a ULM, it might just be a bad model for it (bad in the sense of representation efficiency).
Designing a body for a superintelligence will possibly still be a difficult task. What makes humans friendly is, I think, largely a result of (1) a dependency on others due to the need to maintain a body and to interact with other people (conversation, physical contact), and (2) empathy. That is, being emotionally disconnected from other humans is one way to turn against them. If you don’t have these emotional responses deeply built into a body for a ULM, the AI will probably turn out to be indifferent towards humans, leading to a large set of other problems.
You write that Yudkowsky’s box problem is a strawman and a distraction. How do you arrive at this conclusion exactly?
The actual paper is “Cortical information flow during flexible sensorimotor decisions”; it can be found here. I don’t believe the reporter’s summary is very accurate. They traced the flow of information in a moving dot task in a couple dozen cortical regions. It’s interesting, but I don’t think it especially differentiates the ULH.
3. Good point. I’ll need to correct that. I’m skeptical of embarrassment, but surprise and fear certainly.
4. Yes that’s correct. It perhaps would be more accurate to say more useful or more valuable. I meant more powerful in a general political/economic utility sense.
5. I agree that the human brain, in particular the reward system, has dependencies on the body that are probably complex. However, reverse engineering empathy probably does not require exactly copying biological mechanisms.
6. I should probably just cut that sentence, because it is a distraction itself.
But for context on the older boxing discussions, see this post in particular. Here Yudkowsky presents a virtual sandbox in the context of a sci-fi story. To break out, he has to give the AI essentially infinite computation, and even then the humans also have to be incredibly dumb: they intentionally send an easter egg message. The humans apparently aren’t even monitoring their creation, etc. It’s a strawman. Later it is used as evidence to suggest that EY has somehow proven that computer security sandboxes can’t work.
This is just a pet theory (and being new to cognitive science this might well be wrong): Physical pain is some sort of hardwired thought disturbance, and the brain appears to have some sort of clarity attractor (which also explains intrinsic motivation and the reward we receive from Eureka moments and fun, cf. Schmidhuber). The brain appears to borrow the mechanism of physical pain for action selection on a high level if something severely limits anticipated prospects (that’s why rejection, getting something wrong and losing something hurts). Empathy is the ability to have pain caused by mirror neurons, which is just an activation pattern generated in an auto-associative NN due to the overlap of activation patterns of firsthand and non-firsthand experiences. That means, the body of an AI needs to be sufficiently similar to a human body for this auto-association to work. One way to achieve that would perhaps be to actually replace the brain of a deceased volunteer with an artificial one. The fact that we have empathy for animals might be a hint that it doesn’t need to be that similar, but on the other hand we are much more comfortable with killing a bug than with killing a mammal.
Since I don’t think we can make a very realistic sandbox (at least not in the near future), perhaps the idea is to have an AI design that is known to work similarly with and without interaction with the world (looking at training data sampled from an environment versus the environment itself). Then, putatively, we could test the AI in the non-interactive case before getting anywhere near an AI-box scenario.
The problem with this is that the “engineering diagram” of the brain is really only a hardware wiring diagram, and the status of speculations about how the hardware modules (really just areas) relate to functional modules is ... well, just that, speculation.
There are good reasons to suspect that the functional diagram would look completely different (reasons based in psychological data), and the current state of the art there is poor.
Except perhaps in certain quarters.
Yes the engineering diagram is a hardware wiring diagram, which I hope I made clear.
In general one of my main points was that most of the big systems (cortex, cerebellum) are general purpose re-programmable hardware—they don’t come pre-equipped with software. So the actual functionality of each module arises from the learning system slowly figuring out the appropriate software during development.
I provided some links to the key evidence for the overall hypothesis, I think it is well beyond speculation at this point. (the article certainly contains some speculations, but I labeled them as such)
Well of course, because the functional diagram is learned software, and thus can vary substantially from human to human. For example the functional software diagram for the cortex of a blind echolocator looks very different than that of a neurotypical.
There are serious problems with the claims you are making.
The idea that the cortex or cerebellum, for example, can be described as “general purpose re-programmable hardware” is lacking in both clarity and support.
Clarity. In what sense “generally re-programmable”? So much that it could run Microsoft Word? I have never seen anyone try to go that far, so clearly you must mean something less general. But it is very unclear what exactly is the sense in which you mean the words “general purpose re-programmable hardware”.
Support. There are no generally accepted theories for what the function of the cortex actually is. Can you be clearer about what you think the evidence is, in a nutshell?
You seem to be saying that the cortex is a universal reinforcement learning machine. But the kind of evidence that you present is (if you will forgive an extreme oversimplification for the purposes of clarity) the observation that the basal ganglia plays a role that resembles a global packet-switching router, and since a global packet-switching router would be expected to be seen in a reinforcement learning machine, QED.
Now, don’t get me wrong, I am sympathetic to much of the general spirit that you convey here, but my problem is that my research has gone down this road for a long time already, and while we agree on the general spirit, you have jumped forward several steps and come to (what I see as) a premature conclusion about functionality. To be specific, the concept of a “reinforcement learning machine” is ghastly (it contains “And then some magic happens...” steps), and I believe it would be a terrible mistake to say that we have already found clear evidence for a reinforcement learning machine in the brain.
I agree with the general interpretation of what those hippocampal and BG loops might be doing, but there are MANY other interpretations beside seeing them as a component of a reinforcement learning machine.
This is a difficult topic to discuss in these narrow confines, alas. I think you have done a service by pointing to the idea of a general learning mechanism, but I think you have just run on ahead too quickly and shackled that idea to something too speculative (the RL notion).
“General purpose learning hardware” is perhaps better. I used “re-programmable” as an analogy to an FPGA.
However, in a literal sense the brain can learn to use simple paper-and-pencil tools as an extended memory, and can learn to emulate a Turing machine. Given huge amounts of time, the brain could literally run Windows.
And more to the point, programmers ultimately rely on the brain’s ability to simulate/run little sections of code. So in a more practical literal sense, all of the code of Windows first ran on human brains.
You seem to be hung up on reinforcement learning. I use some of that terminology to define a ULM because it is just the most general framework: utility/value functions, etc. Also, there is some pretty strong evidence for RL in the brain, but the brain’s learning mechanisms are complex, more so than any current ML system. I hope I conveyed that in the article.
Learning in the lower sensory cortices in particular can also be modeled well by unsupervised learning, and I linked to some articles showing how UL models can reproduce sensory cortex features. UL can be viewed as a potentially reasonable way to approximate the ideal target update, especially for lower sensory cortex that is far (in a network depth sense) from any top down signals from the reward system. The papers I linked to about approximate Bayesian learning and target propagation in particular can help put it all into perspective.
Well, the article summarizes the considerable evidence that the brain is some sort of approximate universal learning machine. I suspect that you have a particular idea of RL that is less than fully general.
You are right to say that, seen from a high enough level, the brain does general purpose learning, but the claim becomes diluted if you take it right up to the top level, where it clearly does.
For example, the brain could be 99.999% hardwired, with no flexibility at all except for a large RAM memory, and it would be consistent with the brain as you just described it (able to learn anything). And yet that wasn’t the type of claim you were making in the essay, and it isn’t what most people mean when they refer to “general purpose learning”. You (and they) seem to be pointing to an architectural flexibility that allows the system to grow up to be a very specific, clever sort of understanding system without all the details being programmed ahead of time.
I am not sure why you say I am hung up on RL: you quoted that as the only mechanism to be discussed in the context, so I went with that.
And you are (like many people) not correct to say that RL is the most general framework, or that there is good evidence for RL in the brain. That is a myth: the evidence is very poor indeed.
RL is not “fully general”—that was precisely my point earlier. If you can point me to a rigorous proof of that which does not have an “and then some magic happens” step in it, I will eat my hat :-)
(I already had a long discussion with Marcus Hutter about this, by the way, and he agreed in the end that his appeal to RL was based on nothing but the assumption that it works.)
Upon consideration, I changed my own usage of “Universal Reinforcement Learning Machine” to “Universal Learning Machine”.
The several remaining uses of “reinforcement learning” are contained now to the context of the BG and the reward circuitry.
Again we are probably talking about very different RL conceptions. So to be clear, I summarized my general viewpoint of an ULM. I believe it is an extremely general model, that basically covers any kind of universal learning agent. The agent optimizes/steers the future according to some sort of utility function (which is extremely general), and self-optimization emerges naturally just by including the agent itself as part of the system to optimize.
Do you have a conception of a learning agent which does not fit into that framework?
The evidence for RL in the brain—of the extremely general form I described—is indeed very strong, simply because any type of learning is just a special case of universal learning. Taboo ‘reinforcement’ if you want, and just replace with “utility driven learning”.
AIXI specifically has a special reward channel, and perhaps you are thinking of that specific type of RL, which is much narrower than universal learning. I should perhaps clarify and/or remove the mention of AIXI.
A ULM, as I described it, does not have a reward channel like AIXI. It just conceptually has a value and/or utility function initially defined by some arbitrary function that conceptually takes the whole brain/model as input. In the case of the brain, the utility function is conceptual; in practice it is more directly encoded as a value function.
About the universality or otherwise of RL. Big topic.
There’s no need to taboo “RL” because switching to utility-based learning does not solve the issue (and the issue I have in mind covers both).
See, this is the problem. It is hard for me to fight the idea that RL (or utility-driven learning) works, because I am forced to fight a negative: a space where something should be, but which is empty. Namely, the empirical fact that reinforcement learning has never been made to work in the absence of some surrounding machinery that prepares or simplifies the ground for the RL mechanism.
It is a naked fact about traditional AI that it puts such an emphasis on the concept of expected utility calculations without any guarantees that a utility function can be laid on the world in such a way that all and only the intelligent actions in that world are captured by a maximization of that quantity. It is a scandalously unjustified assumption, made very hard to attack by the fact that it is repeated so frequently that everyone believes it to be true just because everyone else believes it.
If anyone ever produced a proof of why it should work, there would be a there there, and I could undermine it. But ... not so much!
About AIXI and my conversation with Marcus: that was actually about the general concept of RL and utility-driven systems, not anything specific to AIXI. We circled around until we reached the final crux of the matter, and his last stand (before we went to the conference banquet) was “Yes, it all comes down to whether you believe in the intrinsic reasonableness of the idea that there exists a utility function which, when maximized, yields intelligent behavior ... but that IS reasonable ... isn’t it?”
My response was “So you do agree that that is where the buck stops: I have to buy the reasonableness of that idea, and there is no proof on the table for why I SHOULD buy it, no?”
Hutter: “Yes.”
Me: “No matter how reasonable it seems, I don’t buy it”
His answer was to laugh and spread his arms wide. And at that point we went to the dinner and changed to small talk. :-)
I don’t think that is an overstatement. If MIRI is basically wrong about UFs, then most of its case unravels. Why isn’t the issue being treated as a matter of urgency?
A very good question indeed. Although … there is a depressing answer.
This is a core-belief issue. For some people (like Yudkowsky and almost everyone in MIRI) artificial intelligence must be about the mathematics of artificial intelligence, but without the utility-function approach, that entire paradigm collapses. Seriously: it all comes down like a house of cards.
So, this is a textbook case of a Kuhn/Feyerabend-style clash of paradigms. It isn’t a matter of “Okay, so utility functions might not be the best approach: so let’s search for a better way to do it”; it is more a matter of “Anyone who thinks that an AI cannot be built using utility functions is a crackpot.” It is a core belief in the sense that it is not allowed to be false. It is unthinkable, so rather than try to defend it, those who deny it have to be personally attacked. (I don’t say this because of personal experience; I say it because that kind of thing has been observed over and over when paradigms come into conflict.)
Here, for example, is a message sent to the SL4 mailing list by Yudkowsky in August 2006:
So the immediate answer to your question is that it will never be treated as a matter of urgency, it will be denied until all the deniers drop dead.
Meanwhile, I went beyond that problem and outlined a solution, soon after I started working in this field in the mid-80s. And by 2006 I had clarified my ideas enough to present them at the AGIRI workshop held in Bethesda that year. The MIRI (then called SIAI) crowd were there, along with a good number of other professional AI people.
The response was interesting. During my presentation the SIAI/MIRI bunch repeatedly interrupted with rude questions or pointed, very loud, laughter. Insulting laughter. Loud enough to make the other participants look over and wonder what the heck was going on.
That’s your answer, again, right there.
But if you want to know what to do about it, the paper I published after the workshop is a good place to start.
Won’t comment about past affairs, but these days at least part of MIRI seems more open to the possibility. E.g. this thread where So8res (Nate Soares, now Executive Director of MIRI) lists some possible reasons for why it might be necessary to move beyond utility functions. (He is pretty skeptical of most, but at least he seems to be seriously considering the possibility, and gives a ~15% chance “that VNM won’t cut it”.)
The day that I get invited as a guest speaker by either MIRI or FHI will mark the point at which they start to respect and take seriously alternative viewpoints.
Would that be this paper?
If so, it seems to me to have rather little to do with the question of whether utility functions are necessary, helpful, neutral, unhelpful, or downright inconsistent with genuinely intelligent behaviour. It argues that intelligent minds may be “complex systems” whose behaviour is very difficult to relate to their lower-level mechanisms, but something that attempts to optimize a utility function can perfectly well have that property. (Because the utility function can itself be complex in the relevant sense; or because the world is complex, so that effective optimization of even a not-so-complex utility function turns out to be achievable only by complex systems; or because even though the utility function could be optimized by something not-complex, the particular optimizer we’re looking at happens to be complex.)
My understanding of the position of EY and other people at MIRI is not that “artificial intelligence must be about the mathematics of artificial intelligence”, but that if we want to make artificially intelligent systems that might be able to improve themselves rapidly, and if we want high confidence that this won’t lead to an outcome we’d view as disastrous, the least-useless tools we have are mathematical ones.
Surely it’s perfectly possible to hold (1) that extremely capable AI might be produced by highly non-mathematical means, but (2) that this would likely be disastrous for us, so that (3) we should think mathematically about AI in the hope of finding a way of doing it that doesn’t lead to disaster. But it looks as if you are citing their belief in #3 as indicating that they violently reject #1.
So, anyway, utility functions. The following things seem to be clearly true:
There are functions whose maximization implies (at least) kinda-intelligence-like behaviour. For instance, maximizing games of chess won against the world champion (in circumstances where you do actually have to play the games rather than, e.g., just killing him) requires you to be able to play chess at that level. Maximizing the profits of a company requires you to do something that resembles what the best human businesspeople do. Maximizing the number of people who regard you as a friend seems like it requires you to be good at something like ordinary social interaction. Etc.
Some of these things could probably be gamed. E.g., maybe there’s a way to make people regard you as a friend by drugging them or messing directly with their brains. If we pick difficult enough tasks, then gaming them effectively is also the kind of thing that is generally regarded as good evidence of intelligence.
The actually intelligent agents we know of (namely, ourselves and to a lesser extent animals and maybe some computer software) appear to have something a bit like utility functions. That is, we have preferences and to some extent we act so as to realize those preferences.
For real human beings in the real world, those preferences are far from being perfectly describable by any utility function. But it seems reasonable to me to describe them as being in some sense the same kind of thing as a utility function.
There are mathematical theorems that say that if you have preferences over outcomes, then certain kinda-reasonable assumptions (that can be handwavily described as “your preferences are consistent and sane”) imply that those preferences actually must be describable by a utility function.
This doesn’t mean that effective intelligent agents must literally have utility functions; after all, we are effective intelligent agents and we don’t. But it does at least suggest that if you’re trying to build an effective intelligent agent, then giving it a utility function isn’t obviously a bad idea.
All of which seems to me like sufficient reason to (1) investigate AI designs that have (at least approximately) utility functions, and (2) be skeptical of any claim that having a utility function actually makes AI impossible. And it doesn’t appear to me to come down to a baseless article of faith, no matter what you and Marcus Hutter may have said to one another.
But there are good reasons for thinking that, in absolute terms, many mathematical methods of AI safety are useless. The problem is that they relate to ideal rationalists, but ideal rationality is uncomputable, so they are never directly applicable to any buildable AI... and how real-world AI would deviate from ideal rationality is crucial to understanding the threats it would pose. Deviations from ideal rationality could pose new threats, or could counter certain classes of threat (in particular, lack of goal stability could be leveraged to provide corrigibility, which is a desirable safety feature).
There’s an important difference between thinking mathematically and only thinking mathematically. Highly non-mathematical AI, that is cobbled together without clean overriding principles, cannot be made safe by clean mathematical principles, although it could quite conceivably be made safe by piecemeal engineering solutions such as kill switches, corrigibility and better boxing... the kind of solution MIRI isn’t interested in... which does look as though they are neglecting a class of AI danger.
If any particular mathematical approach to AI safety is useless, and if MIRI are attempting to use that approach, then they are making a mistake. But we should distinguish that from a different situation where they aren’t attempting to use the useless approach but are studying it for insight. So, e.g., maybe approach X is only valid for AIs that are ideal rationalists, but they hope that some of what they discover by investigating approach X will point the way to useful approaches for not-so-ideal rationalists.
Do you have particular examples in mind? Is there good evidence telling us whether MIRI think the methods in question will be directly applicable to real AIs?
I agree. I am not so sure I agree that cobbled-together AI can “quite conceivably be made safe by piecemeal engineering solutions”, and I’m pretty sure that historically at least MIRI has thought it very unlikely that they can. It does seem plausible that any potentially-dangerous AI could be made at least a bit safer by such things, and I hope MIRI aren’t advocating that no such things be done. But this is all rather reminiscent of computer security, where there are crude piecemeal things you can do that help a bit, but if you want really tight security there’s no substitute for designing your system for security from the start—and one possible danger of doing the crude piecemeal things is that they give you a false sense of safety.
By 1900, the basic principles of aerodynamics in terms of lift and drag had been known for almost a century: the basic math of flight. There were two remaining problems: power and control. Powered heavier-than-air flight requires an efficient engine with a sufficient power/weight ratio. Combustion engine tech developed along a sigmoid, and by 1900 that tech was ready.
The remaining problem then was control. Most of the flight pioneers either didn’t understand the importance of this problem, or they thought that aircraft could be controlled like boats are—with a simple rudder mechanism. The Wright Brothers—two unknown engineers—realized that steering in 3D was more complex. They solved this problem by careful observation of bird flight. They saw that birds turned by banking their whole body (and thus leveraging the entire wing airfoil), induced through careful airfoil manipulation on the trailing wing edge. They copied this wing warping mechanism directly in their first flying machines. Of course—they weren’t the only ones to realize all this, and ailerons are functionally equivalent but more practical for fixed wing aircraft.
Flight was achieved by technological evolution or experimental engineering, taking some inspiration from biology. Pretty much all tech is created through steady experimental/evolutionary engineering. Machine learning is on a very similar track to produce AGI in the near term.
Ahh and that’s part of the problem. The first AGIs will be sub-human then human level intelligence, and Moore’s Law is about to end or has already ended, so the risk of some super rapid SI explosion in the near term is low. Most of the world doesn’t care about tight security. AGI just needs to be as safe or safer than humans. Tight security is probably impossible regardless: you can’t prove tight bounds on any system of extreme complexity (like the real world). Tight math bounds always require ultra-simplified models.
Where are insights about the relative usefulness of pure theory going to come from?
It’s not even conceivable? Even though automotive safety basically happened that way?
That’s clearly not crude hackery, but it’s not pure theory either. The kind of Clean Engineering you are talking about can only be specific to a particular architecture, which pure theory isn’t.
There is a pretty hard limit to how much you can predict about a system, AI or not, without knowing its architecture.
That wasn’t at all the sort of insight I had in mind. It’s commonplace in science to start trying to understand complicated things by first considering simpler things. Then sometimes you learn techniques that turn out to be applicable in the harder case, or obstacles that are likely still to be there in the harder case.
(Lots of computer science research has considered computers with literally unlimited memory, models of computation in which a single operation can do arbitrary arithmetic on an integer of any size, models of computation in which the cost of accessing memory doesn’t depend on how big the memory is, etc., and still managed to produce things that end up being useful for actual software running on actual computers with finite memories and finite registers running in a universe with a finite maximum speed limit.)
Well, I guess it depends on what you mean by “quite conceivable”. Obviously anyone can say “we might be able to make a cobbled-together AI safe by piecemeal engineering solutions”, so if that counts as “conceiving” then plainly it’s conceivable. But conceivability in that sense is (I think) completely uninteresting; what we should care about is whether it’s at all likely, and that’s what I took you to mean by “quite conceivable”.
It seems to me that automotive safety and AI safety are extremely different problems. Or, more precisely, they may or may not turn out to be, but in the imaginable cases where most is at stake (those in which it turns out that it really is possible to produce intelligences vastly greater than ours and that intelligence vastly greater than ours really does lead to much greater ability to influence the world) they are extremely different.
The point of the pure theory is to help figure out what kind of engineering is going to be needed.
If you give me a good program that plays chess or does symbolic algebra, in a lot of situations I reckon I can predict what it will do quite well even if I know nothing about its architecture; and I’m not sure what we’ve learned about brain architecture has made a big difference to how well we can predict human behaviour. It might some day.
I’m sure that very detailed prediction of what a superhumanly intelligent AI will do will be impractical in many cases, even knowing its architecture. (Otherwise it wouldn’t be superhumanly intelligent.) That’s quite compatible with being able to say that it’s worth worrying that it might do X, and that so far as we can see there is no circumstance in which it would do Y. And predictions of that kind are definitely possible without any knowledge of the architecture. (“I worry that the chess program will attack my kingside if I leave it without good defenders.” “Whatever I do, it is very unlikely that the chess program will make an illegal move.”)
But, sure, lots of the things that would be necessary to build a safe superhuman AI would depend on the details of how it’s designed. That’s OK; no one is building a superhuman AI yet. (So far as we know.) The point of MIRI’s work, as I understand it, is to prepare the ground for later engineering work by beginning to understand the territory.
Piecemeal efforts are least likely to make a difference to the most dangerous, least likely scenario of a fast takeoff singleton. But there is a societal lesson to be learnt from things like automotive safety and nuclear non-proliferation: voluntary self restraint can be a factor.
Lessons about engineering can be learnt from engineering, too. For instance Big Design Up Front, the standard response to the rapidly self improving singleton, is known to be a pretty terrible way of doing things, that should be avoided if there are alternatives.
Negative lessons from pure theory need to be learnt, too. MIRI’s standard response to the tiling agents problem is that a way will be found around the problem of simultaneous value preservation and self modification. But why bother? If the Loebian obstacle is allowed to stand, there is no threat from a Clippie. That is a rather easily achieved form of self restraint. You probably have to give up on the idea of a God AI benevolently ruling the world, but some of us were never that keen anyway.
Another negative lesson is that ideal rationalists are uncomputable, with the corollary that there is no one way to be a non-ideal rationalist... which leads into architecture specific safety.
That can only be true in special cases. You can’t in general predict a chess programme that is better than you, because, if you could, you would be as good as it is.
In any case, detailed prediction is beside the point. If you want to design architecture specific safety features, you need a broad view of how AIs of a class would behave.
Someone’s got to have insights about how pure theory fits into the bigger picture.
And sometimes that’s directly applicable, and sometimes it isn’t... that’s one of the big picture issues.
I wasn’t meaning to denigrate that sort of insight. (Though “how pure theory fits in” doesn’t seem to me the same thing as “the relative usefulness of pure theory”, which is what you said before, and I think what you’re describing now sounds distinctly more valuable.) Just saying that it wasn’t the kind of insight I would look for from studying the pure theory.
In this case, I wouldn’t much expect it to be directly applicable. But I would expect it to be much easier to tell whether it is (and whether it’s indirectly applicable) once one has a reasonable quantity of theory in hand.
Link?
Sorry, was in too much of a rush to give the link...
Loosemore, R.P.W. (2007). Complex Systems, Artificial Intelligence and Theoretical Psychology. In B. Goertzel & P. Wang (Eds.), Proceedings of the 2006 AGI Workshop. IOS Press, Amsterdam.
http://richardloosemore.com/docs/2007_ComplexSystems_rpwl.pdf
Excuse me, but as much as I think the SIAI bunch were being rude to you, if you had presented, at a serious conference on a serious topic, a paper that waves its hands, yells “Complexity! Irreducible! Parallel!” and expected a good reception, I would have been privately snarking if not publicly. That would be me acting like a straight-up asshole, but it would also be because you never try to understand a phenomenon by declaring it un-understandable. Which is not to say that symbolic, theorem-prover, “Pure Maths are Pure Reason which will create Pure Intelligence” approaches are very good either—they totally failed to predict that the brain is a universal learning machine, for instance.
(And so far, the “HEY NEURAL NETS LEARN WELL” approach is failing to predict a few things I think they really ought to be able to see, and endeavor to show.)
That anyone would ever try to claim a technological revolution is about to arise from either of those schools of work is what constantly discredits the field of artificial intelligence as a hype-driven fraud!
Okay, so I am trying to understand what you are attacking here, and I assume you mean my presentation of that paper at the 2006 AGIRI workshop.
Let me see: you reduced the entire paper to the statement that I yelled “Complexity! Irreducible! Parallel!”.
Hmmmm... that sounds like you thoroughly understood the paper and read it in great detail, because you reflected back all the arguments in the paper, showed good understanding of the cognitive science, AI and complex-systems context, and gave me a thoughtful, insightful list of comments on some of the errors of reasoning that I made in the paper.
So I guess you are right. I am ignorant. I have not been doing research in cognitive psychology, AI and complex systems for 20 years (as of the date of that workshop). I have nothing to say to defend any of my ideas at all, when people make points about what is wrong in those ideas. And, worse still, I did not make any suggestions in that paper about how to solve the problem I described, except to say “HEY NEURAL NETS LEARN WELL”.
I wish you had been around when I wrote the paper, because I could have reduced the whole thing to one 3-word and one 5-word sentence, and saved a heck of a lot of time.
P.S. I will forward your note to the Santa Fe Institute and the New England Complex Systems Institute, so they can also understand that they are ignorant. I guess we can expect an unemployment spike in Santa Fe and Boston, next month, when they all resign en masse.
I don’t see it as dogmatism so much as a verbal confusion. The ubiquity of UFs can be defended using a broad (implicit) definition, but the conclusions typically drawn about types of AI danger and methods of AI safety relate to a narrower definition, where a UF is:
Explicitly coded, and/or
Fixed, unupdateable, and/or
“Thick”, containing detailed descriptions of goals.
Since the utility function is approximated anyway, it becomes an abstract concept—especially in the case of evolved brains. For an evolved creature, the evolutionary utility function can be linked to long term reproductive fitness, and the value function can then be defined appropriately.
For a designed agent, it’s a useful abstraction. We can conceptually rate all possible futures, and then roughly use that to define a value function that optimizes towards that goal.
It’s really just a mathematical abstraction of the notion of X is better than Y. It’s not worth arguing about. It’s also proven in the real world—agents based on utility formalizations work. Well.
It certainly is worth discussing, and I’m sorry but you are not correct that “agents based on utility formalizations work. Well.”
That topic came up at the AAAI symposium I attended last year. Specifically, we had several people there who built real-world (as opposed to academic, toy) AI systems. Utility based systems are generally not used, except as a small component of a larger mechanism.
Pretty much all of the recent ML systems are based on a utility function framework in a sense—they are trained to optimize an objective function. In terms of RL in particular, Deepmind’s Atari agent works pretty well, and builds on a history of successful practical RL agents that all are trained to optimize a ‘utility function’.
That said, for complex AGI, we probably need something more complex than current utility function frameworks—in the sense that you can’t reduce utility to an external reward score. The brain doesn’t appear to have a simple VNM single-axis utility concept, which is some indication that we may eventually drop that notion for complex AI. My conception of ‘utility function’ is loose, and could include whatever it is the brain is doing.
Wait wait wait. You didn’t head to the dinner, drink some fine wine, and start raucously debating the same issue over again?
Bah, humbug!
Also, how do I get invited to these conferences again ;-)?
Very true, at least regarding AI. Personally, my theory is that the brain does do reinforcement learning, but the “reward function” isn’t a VNM-rational utility function, it’s just something the body signals to the brain to say, “Hey, that world-state was great!” I can’t imagine that Nature used something “mathematically coherent”, but I can imagine it used something flagrantly incoherent but really dead simple to implement. Like, for instance, the amount of some chemical or another coming in from the body, to indicate satiety, or to relax after physical exertion, or to indicate orgasm, or something like that.
Hey, ya pays yer money and walk in the front door :-) AGI conferences run about $400 a ticket I think. Plus the airfare to Berlin (there’s one happening in a couple of weeks, so get your skates on).
Re the possibility that the human system does do reinforcement learning …. fact is, if one frames the meaning of RL in a sufficiently loose way, the human cogsys absolutely DOES do RL, no doubt about it. Just as you described above.
But if you sit down and analyze what it means to make the claim that a system uses RL, it turns out that there is a world of difference between the two positions:
and
The difference is that the second case turns the descriptive mechanism into an explicit mechanism.
It’s like Ptolemy’s Epicycle model of the solar system. Was Ptolemy’s fancy little wheels-within-wheels model a good descriptive model of planetary motion? You bet ya! Would it have been appropriate to elevate that model and say that the planets actually DID move on top of some epicycle-like mechanism? Heck no! As a functional model it was garbage, and it held back a scientific understanding of what was really going on for over a thousand years.
Same deal with RL. Our difficulty right now is that so many people slip back and forth between arguing for RL as a descriptive model (which is fine) and arguing for it as a functional model (which is disastrous, because that was tried in psychology for 30 years, and it never worked).
This is literally false. A model of a brain might, some functional copy of brain implemented on a different hardware platform possibly could. An actual human brain, I don’t think so.
This is also literally false. Consider a trivial loop for (i=0; i<100000; i++) { .. } Human brains can conceptualize it, but they do not run it.
Theoretically a brain with some additional memory tools could run Windows. In practice, sure, an actual human brain would not be able to, obviously: boredom.
I did not mean that every codepath is run—but that’s never true anyway. And yes “all of the code” is far too strong—most of it is just loosely conceptually simulated by the brain alone, and then more direct sample paths are run with the help of a debugger.
Fermi estimate time! :-)
Given an appropriately unrolled set of appropriate instructions, how long would it take for a human armed with nothing but paper and pencil to simulate a complete Windows (say, Windows 7) boot process?
If an AGI is based on a neural network, how can you tell from the logs whether or not the AI knows it’s in a simulation?
This suggests 15000 GPUs is equivalent in computing power to a human brain, since we have about 150 trillion synapses? Why did you suggest 1000 earlier? How much of a multiplier on top of that do you think we need for trial-and-error research and training, before we get the first AGI? 10x? 100x? (If it isn’t clear, I mean that if you only have hardware that’s equivalent to a single human brain, it might take a few years to train it to exhibit general intelligence, which seems too slow for research that’s based on trying various designs to see which one works.)
Seems like a very good question that has been largely neglected. I know your ideas for training/testing neuromorphic AGI in VR environments. What other ideas do people have? Or have seen? I wonder what Shane Legg’s plan is, given that he is worried about existential risk from AI, and also personally (as co-founder of DeepMind) racing to build neuromorphic AGI.
ANN based AGI will not need to reproduce brain circuits exactly. There are general tradeoffs between serial depth and circuit size. The brain is much more latency/speed constrained so it uses larger, shallower circuits whereas we can leverage much higher clock speeds to favour deeper smaller circuits. You see the same tradeoffs in circuit design, and also in algorithms where parallel variants always use more ops than the minimal fully serial variant.
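To make the depth versus op-count tradeoff concrete, here is a minimal sketch comparing a serial prefix sum with the Hillis-Steele parallel scan. The choice of prefix sums and the function names are my own illustration, not something from the comment above; it shows one instance of the tradeoff rather than proving the general claim.

```python
def serial_scan(xs):
    """Inclusive prefix sum done one addition at a time (deep, minimal ops)."""
    out = list(xs)
    ops = 0
    for i in range(1, len(out)):
        out[i] = out[i - 1] + out[i]
        ops += 1
    depth = ops  # every addition depends on the previous one
    return out, ops, depth


def parallel_scan(xs):
    """Hillis-Steele inclusive scan: all additions in one pass are independent."""
    out = list(xs)
    ops = 0
    depth = 0
    stride = 1
    while stride < len(out):
        prev = list(out)  # snapshot, as if the whole pass ran simultaneously
        for i in range(stride, len(out)):
            out[i] = prev[i - stride] + prev[i]
            ops += 1
        depth += 1
        stride *= 2
    return out, ops, depth


values = list(range(1, 9))
serial_result, serial_ops, serial_depth = serial_scan(values)
parallel_result, parallel_ops, parallel_depth = parallel_scan(values)

print("serial:  ", serial_ops, "ops at depth", serial_depth)
print("parallel:", parallel_ops, "ops at depth", parallel_depth)
```

On eight inputs the parallel version finishes in fewer dependent steps but performs more additions in total, which is the shape of the tradeoff described: shallow and wide costs more ops than deep and narrow.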
Also, independent of those considerations, biological circuits and synapses are redundant, noisy, and low precision.
If you look at raw circuit level ops/second, the brain’s throughput is not that much. A deeper investigation of the actual theoretical minimum computation required to match the human brain would be a subject for a whole post (and one I may not want to write up just yet). With highly efficient future tech, I’d estimate that it would take far less than 10^15 32-bit ops/s (1000 GPUs): probably around or less than 10^13 32-bit ops/s. So we may already be entering into a hardware overhang situation.
One way to estimate that is to compare to the number of full train/test iterations required to reach high performance in particular important sub-problems such as vision. The current successful networks all descend from designs invented in the 1980s or earlier. Most of the early iterations were on small networks, and I expect the same to continue to be true for whole AGI systems.
Let’s say there are around 100 researchers who worked full time on CNNs for 40 years straight (4000 researcher years), and each tested 10 designs per year, so 40,000 iterations to go from perceptrons to CNNs. A more accurate model should consider the distribution over design iteration times and model sizes. Major new risky techniques are usually tested first on small problems and models and then scaled up.
So anyway, let’s multiply by 20 roughly and say it takes a million AGI ‘lifetimes’ or full test iterations, where each lifetime is 10 years and requires 10 GPU years per AGI year. This suggests 100 million GPU years, or around 100 billion dollars at roughly $1000 per GPU year.
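The arithmetic in this estimate can be laid out as a short sketch. All inputs are the rough figures from the comment; the $1000 per GPU year is the figure implied here and stated elsewhere in the thread, and the variable names are mine.

```python
# Rough inputs from the CNN history analogy
researchers = 100
years_of_work = 40
designs_per_researcher_year = 10

cnn_iterations = researchers * years_of_work * designs_per_researcher_year

# "Multiply by 20 roughly", then round up to a million AGI lifetimes
scaled_iterations = cnn_iterations * 20
agi_lifetimes = 1_000_000

years_per_lifetime = 10
gpu_years_per_agi_year = 10
dollars_per_gpu_year = 1000  # assumed cost, see note above

gpu_years = agi_lifetimes * years_per_lifetime * gpu_years_per_agi_year
total_dollars = gpu_years * dollars_per_gpu_year

print(f"CNN design iterations:   {cnn_iterations:,}")
print(f"Scaled by 20:            {scaled_iterations:,}")
print(f"GPU years for research:  {gpu_years:,}")
print(f"Approximate cost in USD: {total_dollars:,}")
```

Note that 20 times 40,000 is 800,000, so the "million lifetimes" figure already includes some rounding up; the result is an order-of-magnitude estimate at best.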
Another more roundabout estimation—it seems that whenever researchers have the technical capability to create ANNs of size N, it doesn’t take long in years to explore and discover what can be built with systems of that size. Software seems to catch up fast. Much of this effect could also be industry scaling up investment, but we can expect that to continue to accelerate.
I’m not sure. He hasn’t blogged in years. I found this which quotes Legg as saying:
So presumably he is thinking about it, although that quote suggests he perhaps thinks extinction is inevitable. The most recent interview I can find is this which doesn’t seem much related.
Thanks for the explanations.
Hmm, I was trying to figure out how much of a speed superintelligence the first AGI will likely be. In other words, how much computing power will a single lab have accumulated by the time we get AGI? As a minimum, it seems that a company like Google could easily spend $100M to purchase 100,000 GPUs for AGI research, and if initially 1000 GPUs = 1x human speed, that implies the first AGI is at least a 100x speed superintelligence (which could speed up to 10000x on the same hardware through future software improvements, if I’m understanding you correctly).
Also, question about GPU/AGI costs. Here you seem to be using $1000 per GPU-year, which equals $.11 per GPU-hour, but in that previous thread, you used $1 per GPU-hour. According to this discussion $.11 seems close to the actual cost. Assuming $.11 is correct, AGI would be economically competitive with (some types of) human labor today at 1000 GPUs = 1x human speed, but maybe there’s not a huge economic incentive to race for it yet. (I mean, unless one predicts that GPU costs will keep falling in the future, and therefore wants to prepare for that.)
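As a quick check on these numbers, the sketch below converts the per-year GPU cost to a per-hour cost and works out the implied speed multiple. It uses only figures quoted in this comment; the variable names and the hourly-cost line at the end are my own additions.

```python
HOURS_PER_YEAR = 365 * 24

dollars_per_gpu_year = 1000
dollars_per_gpu_hour = dollars_per_gpu_year / HOURS_PER_YEAR

lab_budget_dollars = 100_000_000
lab_gpus = 100_000
purchase_price_per_gpu = lab_budget_dollars / lab_gpus

gpus_per_human_speed = 1000  # the "1000 GPUs = 1x human speed" assumption
speed_multiple = lab_gpus / gpus_per_human_speed

# Running cost of one human-speed instance under the same assumptions
hourly_cost_at_human_speed = gpus_per_human_speed * dollars_per_gpu_hour

print(f"Cost per GPU hour:        ${dollars_per_gpu_hour:.2f}")
print(f"Purchase price per GPU:   ${purchase_price_per_gpu:,.0f}")
print(f"Speed multiple:           {speed_multiple:.0f}x")
print(f"Hourly cost at 1x speed:  ${hourly_cost_at_human_speed:.0f}")
```

Under these assumptions one human-speed instance costs a bit over a hundred dollars per hour to run, which is why the comparison is with only some types of human labor.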
Nvidia is claiming that its next generation of GPU is 10x better for deep learning. How much of that is hype?
My earlier statement about 10 million neurons / 10 billion synapses on a single GPU is something of a gross oversimplification.
A more realistic model is this:
B * flops = M * F * N
Where B is a software sim efficiency parameter (currently ~1, and roughly doubling per year), flops is the hardware throughput in operations per second, M is the number of AI model instances, F is the frequency in Hz, and N is the number of synapses.
Today’s CPU/GPU ANN solutions need to parallelize over a large number of AI instances to get full efficiency, due to memory and bandwidth issues, so B is ~1 only when M is ~100. Today on a current high end GPU with 1 trillion flops you can thus run 100 copies of a 1 billion synapse ANN at 10 Hz (M = 100, F = 10, N = 1 billion), whereas a single copy on the GPU may run at only around 50 Hz (B ~0.05, 20x less efficient). Training is accelerated mainly by parallel speedup over instances rather than serial speedup of a single instance.
So with 1000 GPUs and today’s tech, in theory you could get 100 copies of a 1 trillion synapse ANN running at 10 Hz using model parallelism. 1 trillion synapses @ 10 Hz is borderline plausible; 10 trillion @ 100 Hz is probably more realistic and would entail 100,000 GPUs. But this somewhat assumes near perfect parallel scaling. Communication/latency issues limit the maximum size of realistic models. 100,000 GPUs would be larger than the biggest supercomputers of today, and probably is far beyond the limits of practical linear scaling.
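The numbers in this model can be checked with a small sketch, reading the relation as B times hardware flops equals M times F times N. The function names and the 1 trillion flops per GPU default are assumptions for illustration, taken from the figures quoted above.

```python
def sim_efficiency(flops, instances, freq_hz, synapses):
    """Solve B * flops = M * F * N for the efficiency parameter B."""
    return instances * freq_hz * synapses / flops


def gpus_needed(instances, freq_hz, synapses, efficiency=1.0, flops_per_gpu=1e12):
    """GPUs required for M instances at F Hz with N synapses, assuming perfect scaling."""
    return instances * freq_hz * synapses / (efficiency * flops_per_gpu)


# One GPU: 100 batched copies at 10 Hz versus a single copy at about 50 Hz
batched_b = sim_efficiency(1e12, instances=100, freq_hz=10, synapses=1e9)
single_b = sim_efficiency(1e12, instances=1, freq_hz=50, synapses=1e9)

# Larger models, still with 100 instances sharing the hardware
small_brain_gpus = gpus_needed(instances=100, freq_hz=10, synapses=1e12)
large_brain_gpus = gpus_needed(instances=100, freq_hz=100, synapses=1e13)

print(f"B, batched: {batched_b:.2f}   B, single copy: {single_b:.2f}")
print(f"1 trillion synapses @ 10 Hz:   {small_brain_gpus:,.0f} GPUs")
print(f"10 trillion synapses @ 100 Hz: {large_brain_gpus:,.0f} GPUs")
```

The sketch ignores the communication and latency limits mentioned above, so the GPU counts are lower bounds under perfect linear scaling.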
So it’s only 1000 GPUs (2015 vintage) = 1 brain in an amortized rough sense. In practice I expect there is a minimum amount of software & hardware speedup required first to make these very large ANNs realistic or feasible in the first place, because of weak scaling issues in supercomputers. But once you get over this minimum barrier, there is pretty large room for sudden speedup.
And finally—parallel model speedup seems to be almost as effective as serial speedup, and is more powerful than the equivalent parallel scaling in human organizations—because the AI instances all share the same ANN model or mind and thus learn in parallel.
Ya, this sounds about right. However, this is predicated on a roughly $100 billion initial investment in 1 million AGI ‘lifetimes’ for research. If that was spread out over just 5 years, that would correspond to a population of about a million AGIs at the end. In other words, it’s unlikely that research success would result in only $100 million worth of AGIs.
The earlier $1 per GPU-hour is something I remembered from looking at Amazon prices, but that was a while ago and is probably completely out of date. The cheapest option is probably to buy gaming video cards and build your own custom data center, and that is where the $1000 per year came from.
Yes, in theory if we had the right sim code and AGI structure, I think we could run it today and replace all kinds of human labor. In some sense this has already started—but so far ANNs are automating only some specific simple jobs like coming up with image captions.
Jen said the 10x was ‘ceo-math’, but I still don’t get that figure. 2x is expected from new architecture and process, and then 2x more for fp16 extensions. So 4x is reasonable. More importantly, the bandwidth improvement is claimed to be about 4x or 5x as well.
There is some interesting overlap between these ideas and Eric Drexler’s recent proposal. (Previously discussed on LessWrong here)
Cool—hadn’t read that yet.
Separating learning capacity from domain knowledge is kind of automatic in a ULM approach. There is nothing inherently dangerous about the learning mechanism itself; it’s the knowledge that is potentially dangerous. I have butted heads with LW on that point for 4 to 5 years.
The knowledge management idea is the essence of the VR sandbox approach, but I also imagine separating out value systems/priors to some degree for independent testing. Overall Drexler’s proposal (from reading the abstract and skimming) seems to be very much in line with my views.
Safety considerations would go into design at all levels, from designing the VR world itself to the brain architecture to the education/training programs.
In regards to modularity: large ANN systems are already modular, brains are modular, and brain-style AGI approaches are modular. It’s just sort of assumed. It’s a new consideration for perhaps the formal/math/AIXI/MIRI cluster, but only because they haven’t put as much thought into practical architecture.
So you’re saying that I’m secretly an AI being trained to be friendly for a more advanced world? ;)
That’s possible given the sim argument. The eastern idea of reincarnation and the western idea of afterlife map to two main possibilities: in the reincarnation model all that is transferred between worlds is the architectural seed or hyperparameters. In the afterlife model the creator has some additional moral obligation or desire to save and transfer whole minds out.
Why isn’t this in Main??? I mean I can understand that it wasn’t posted there but the upvotes say ‘Main’ in a quite clear language. There isn’t even much that could be improved in the post so why not move it?
I actually forgot that was something I needed to do. Done now.
“In the modern era some deaf humans have apparently acquired the ability to perform echolocation (sonar), similar to cetaceans.” Did you mean blind?
Excellent article, thanks!
But this also raises a question about animal intelligence. My cat (unfortunately) behaves more like a set of programs, yet its brain diagram is the same as in any vertebrate. So does it support the modularity hypothesis?
Thanks for the typo find. I read the whole thing several times and didn’t notice. Presumably the low level recognition of the word was overridden by the high level prior without triggering any alarms.
Cats have the same overall brain architecture as primates and humans, but smaller. Generally the payoff for learning increases with brain size and lifespan. The smaller the brain and the shorter the organism’s lifespan, the more evolution relies on complex innate reflexes.
One interesting (but cruel) experiment which illustrates this is decerebration.
See this article from the Great Soviet Encyclopedia. Basically the larger a mammal’s brain, the more it depends on learned functionality in the new brain (cortex + cerebellum).
That’s a lot to absorb, so I’ve skimmed it, so please forgive if responses to the following are already implicit in what you’ve said.
I thought the point of the modularity hypothesis is that the brain only approximates a universal learning machine and has to be gerrymandered and trained to do so?
If the brain were naturally a universal learner, then surely we wouldn’t have to learn universal learning (e.g. we wouldn’t have to learn to overcome cognitive biases, Bayesian reasoning wouldn’t be a recent discovery, etc.)? The system seems too gappy and glitchy, too full of quick judgement and prejudice, to have been designed as a universal learner from the ground up.
You are conflating the ideas of universal learning and rational thinking. They are not the same thing.
I’m a strong believer in the idea that human intelligence emerges from a strong general purpose reinforcement learning algorithm. If that’s true, then it’s very consistent with our problems of cognitive bias.
If the RL idea is correct, then thinking is best understood as a learned behavior, just as the words we speak are a learned behavior and how we move our arms and legs is a learned behavior. Under the principle that we are an RL learning machine, what we learn is ANY behavior which helps us to maximize our reward signal.
We don’t learn rational behavior; we learn whatever behavior the learning system has rationally computed is needed to produce the most rewards. And in this case, our prime rewards are just those things which give us pleasure and which reduce pain.
If we live in an environment that gives us rewards when we say “I believe God is real, and the Bible is the book of God, and the Earth is 10,000 years old”, then we will say those words. We will do ANYTHING that works to maximize rewards in our environment. We will not only say them, we will believe them in our core. If we are conditioned by our environment to believe these things, that is what we will believe.
If we live in an environment that trains us to look at the data, and make conclusions based on what the data tells us (follow the behavior of a rational scientist), then we will act that way instead.
A universal learner can learn to act in any way it needs to in order to maximize rewards.
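As a toy illustration of the point above, here is a minimal sketch of a reward-driven learner. It is a plain epsilon-greedy value learner, chosen for brevity and not offered as a model of the brain; the two environments, the action names, and all parameter values are invented for the example. The same learning code ends up with opposite behaviors depending only on which behavior the environment rewards.

```python
import random

def train(reward_for, actions, steps=2000, epsilon=0.1, lr=0.1, seed=0):
    """Epsilon-greedy learner: returns the action it values most after training."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in actions}
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(actions)                      # occasional exploration
        else:
            action = max(actions, key=lambda a: value[a])     # exploit best estimate
        reward = reward_for(action)
        value[action] += lr * (reward - value[action])        # move estimate toward reward
    return max(actions, key=lambda a: value[a])

ACTIONS = ["repeat_creed", "follow_data"]

def dogmatic_env(action):
    # Hypothetical environment that rewards repeating the local creed.
    return 1.0 if action == "repeat_creed" else 0.0

def scientific_env(action):
    # Hypothetical environment that rewards following the data.
    return 1.0 if action == "follow_data" else 0.0

learned_in_dogmatic = train(dogmatic_env, ACTIONS)
learned_in_scientific = train(scientific_env, ACTIONS)
print(learned_in_dogmatic, learned_in_scientific)
```

Nothing in `train` refers to truth or rationality; it only tracks which action has paid off.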
That’s what our cognitive bias is: our brain’s desire to act as our past experience has trained us, not to act rationally.
To learn to act rationally, we must be carefully trained to act rationally, which is why the ideas of Less Wrong are needed to overcome our bias.
Also keep in mind that the purpose of the human brain is to control our actions, and for controlling actions, speed is critical. Our brain is best understood not as a “thinking machine” but rather as a reaction machine: a machine that must choose a course of action in a very short time frame (like 0.1 seconds), so that when needed we can quickly react to an external danger that is trying to kill us, from a bear attacking us to a gust of wind that almost pushes us over the edge of a cliff.
So what the brain needs to learn, as a universal learner, is an internal “program” of quick heuristics, how to respond instantly, to any environmental stimulus. We learn (universally) how to react, not how to “think rationally”.
A process like thinking rationally is a large set of learned micro reactions, one that takes a long time to assemble and perfect. To be a good rational thinker, we have to overcome all the learned reactions that have helped us gain rewards in the past but which have been shown not to be the actions of a rational thinker. We have to help train each other to spot false behaviors, and train each person to have only rational behaviors (when we try to engage in rational behavior, that is).
Most of our life, we don’t need rational behavior; we need accurate reward maximizing behavior. But when we choose to engage in a rational thought and analysis process, we want to do our best to be rational, and not let our learned cognitive biases trick us into believing we are being rational when in fact we are just reward seeking.
So our universal learning could be a reward maximising process, but if it is, then that explains why we have strong cognitive biases; it’s not an argument against having cognitive biases. This is because our reward function is not wired to make us maximize rationality; it’s wired to make us act in any way needed so as to maximize pleasure and minimize pain. Only if we immerse ourselves in an environment that rewards us for rational thinking behaviors do those behaviors emerge in us.
Yes, this. But it is so easy to make mistakes when interpreting this statement that I feel it requires a dozen warnings to prevent readers from oversimplifying it.
For example, the behavior we learn is the behavior that produced most rewards in the past, when we were trained. If the environment changes, what we do may no longer give rewards in the new environment. Until we learn what produces rewards in the new environment.
Unless we already had an experience with changing environment, in which case we might adapt much more quickly, because we already have meta-behavior for “changing the behavior to adapt to new environment”.
Unless we already had an experience when the environment changed, we adapted our behavior, then the environment suddenly changed back, and we were horribly punished for the adapted behavior, in which case the learned meta-behavior would be “do not change your behavior to adapt to the new environment (because it will change back and you will be rewarded for persistence)”.
It is these learned meta-behaviors which make the human reactions so difficult to predict and influence.
Also, even in an unchanging environment, our behavior is not necessarily the best one (in terms of getting maximum rewards). It is merely the best one that our learning algorithm could find. For example, we will slowly move towards a local maximum, but if there is a completely different behavior that would give us higher rewards, we may simply never look in that direction, so we will never find out.
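The local maximum point can be shown with a small sketch. The reward landscape below is invented for illustration (a low peak at x = 2 and a higher one at x = 8), and greedy hill climbing stands in for "our learning algorithm" only as a simplifying assumption. A learner that starts near the low peak settles there and never discovers the better one.

```python
def reward(x):
    # Made-up landscape: a small peak at x=2 (height 3), a taller peak at x=8 (height 10).
    return max(3 - abs(x - 2), 10 - 2 * abs(x - 8), 0)

def hill_climb(x, step=1, max_iters=100):
    """Greedy local search: move to a neighbour only if it gives more reward."""
    for _ in range(max_iters):
        best = max((x - step, x, x + step), key=reward)
        if reward(best) <= reward(x):
            return x              # no neighbour is better: a (possibly local) maximum
        x = best
    return x

stuck = hill_climb(0)   # starts on the slope of the small peak
found = hill_climb(5)   # starts on the slope of the tall peak
print(stuck, reward(stuck), found, reward(found))
```

Starting from 0 the search stops at the small peak even though a behavior with more than three times the reward exists a few steps away.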
We learn to model our environment (because we have the innate ability to model things, and we learn that having some models increases the probability of a reward), but our models can be wrong, while still better than the maximum entropy hypothesis (this is why we keep them), but can be a local maximum that is actually not a good choice globally.
Human psychology has so many layers. Asking which psychological school better describes the human mind seems like asking whether the functionality of a human body is better described by biology, or chemistry, or physics. The bottom layer of the human mind is reflexes and learning (which is what behaviorists got right), but trying to see everything only in terms of reflexes is like trying to describe the human body only as an interaction of elementary particles: yes, this is the territory, but it is computationally intractable for us. People are influenced by the models they created in the past, some of which may be deeply dysfunctional (which is what psychoanalysts got right); and intelligent people are able to go more meta and start making models of human happiness, or start building precise models of reality, and often change their behavior based on these models (insert other psychological schools here).
At the bottom, the learning algorithm is based on maximizing rewards, but it is not necessarily maximizing rewards in the current situation, for many different possible reasons.
What are typical examples of such an environment (not necessarily perfect, just significantly better than average)? I think it is (a) keeping company with rational people who care about your rationality, for example your parents, teachers, or friends, if they happen to be rational; and (b) doing something where you interact with reality and get precise feedback, often something like math or programming, if you happen to generalize this approach to other domains of life.
Hmm, but isn’t this conflating “learning” in the sense of “learning about the world/nature” with “learning” in the sense of “learning behaviours”? We know the brain can do the latter, it’s whether it can do the former that we’re interested in, surely?
IOW, it looks like you’re saying precisely that the brain is not a ULM (in the sense of a machine that learns about nature), it is rather a machine that approximates a ULM by cobbling together a bunch of evolved and learned behaviours.
It’s adept at learning (in the sense of learning reactive behaviours that satisfice conditions) but only proximally adept at learning about the world.
I’m not sure what you mean by gerrymandered. I summarized the modularity hypothesis in the beginning to differentiate it from the ULM hypothesis. There is a huge range of views in this space, so I reduced them to exemplars of two important viewpoint clusters.
The specific key difference is the extent to which complex mental algorithms are learned vs innate.
You certainly don’t need to learn how to overcome cognitive biases to learn (this should be obvious). Knowledge of the brain’s limitations could be useful, but is probably more useful only in the context of having a high level understanding of how the brain works.
In regards to bayesian reasoning, the brain has a huge number of parallel systems and computations going on at once, many of which are implementing efficient approximate bayesian inference.
Verbal bayesian reasoning is just a subset of verbal mathematical reasoning—mapping sentences to equations, solving, and mapping back to sentences. It’s a specific complex ability that uses a number of brain regions. It’s something you need to learn for the same reasons you need to learn multiplication. The brain does tons of analog multiplications every second, but that doesn’t mean you have an automatic innate ability to do verbal math—as you don’t have an automatic innate ability to do much of anything.
One of the main points I make in the article is that universal learning machines are a very general thing that, in simplest form, can be specified in a small number of bits, just like a Turing machine. So it’s a sort of obvious design that evolution would find.
What I meant is that you have sub-systems dedicated to (and originally evolved to perform) specific concrete tasks, and shifting coalitions of them (or rather shifting coalitions of their abstract core algorithms) are leveraged to work together to approximate a universal learning machine.
IOW any given specific subsystem (e.g. “recognizing a red spot in a patch of green”) has some abstract algorithm at its core which is then drawn upon at need by an organizing principle which utilizes it (plus other algorithms drawn from other task-specific brain gadgets) for more universal learning tasks.
That was my sketchy understanding of how it works from evol psych and things like Dennett’s books, Pinker, etc.
Furthermore, I thought the rationale of this explanation was that it’s hard to see how a universal learning machine can get off the ground evolutionarily (it’s going to be energetically expensive, not fast enough, etc.) whereas task-specific gadgets are easier to evolve (“need to know” principle), and it’s easier to later get an approximation of a universal machine off the ground on the back of shifting coalitions of them.
Ah ok your gerrymandering analogy now makes sense.
I think that’s a good summary of the evolved modularity hypothesis. It turns out that we can actually look into the brain and test that hypothesis. Those tests were done, and lo and behold, the brain doesn’t work that way. The universal learning hypothesis emerged as the new theory to explain the new neuroscience data from the last decade or so.
So basically this is what the article is all about. You said earlier you skimmed it, so perhaps I need a better abstract or summary at the top, as oge suggested.
This is a pretty good sounding rationale. It’s also probably wrong. It turns out a small ULM is relatively easy to specify, and also is completely compatible with innate task-specific gadgetry. In other words the universal learning machinery has very few drawbacks. All vertebrates have a similar core architecture based on the basal ganglia. In large brained mammals, the general purpose coprocessors (neocortex, cerebellum) are just expanded more than other structures.
In particular it looks like the brainstem has a bunch of old innate circuitry that the cortex and BG learn how to control (the BG does not just control the cortex), but I didn’t have time to get into the brainstem in the scope of this article.
Great stuff, thanks! I’ll dig into the article more.
could you update the 404 image, please? (link to the site still works for now, just the image is gone)
Other evidence I would add to the theory of the brain being a ULM is the existence of the g-factor, and the fact the general factor is one that explains the most variation in these cognitive tests. In addition—if you model human cognitive abilities as universal and specific, evolutionarily speaking it would make sense for the universal aspect to be under stronger selection than the specific domain. One exception to this could be language learning, which is important just for the sake of being able to communicate.
Interesting article. Minor note on clarity: You might want to clarify the acronym “EMH” where it appears, since it so often here stands for “efficient market hypothesis”.
I find images such as the one above extremely disconcerting for some reason. They cause me about a 7⁄10 level of discomfort, verging on moderate pain. It also sticks in my head for a dozen or so minutes after viewing. I’d strongly prefer to never see one ever again, please.
I don’t know if this is something unique to my brain, or if this is a step towards a real life BLIT, but wow. Awful to experience, I have extra empathy for epileptic people now.
Me too (though to a much lesser extent for this particular image; a little more for certain other such images). Reading Scott Alexander say that such images look a lot like LSD hallucinations made me change my mind about whether I’ll ever want to try LSD.
I’m curious: Imagine that you haven’t seen this article yet, and that you are now going to read this article for the first time. It contains a message “trigger warning: XYZ” at the top (for some value of XYZ).
Which value of XYZ would give you the best idea of what kind of images the article contains, so you could have made an informed decision not to look at them?
(I imagine something like “weird disturbing pictures that seem like hallucinations”. But would you have predicted your reaction from reading such text? Would it actually have stopped you from looking?)
It wouldn’t have stopped me. But now that I’m acquainted with images such as this, if someone put “Trigger warning: Shoggoths” or something similar on future posts, then I would take heed of such warnings.
Update:
I have been tentatively identified as having a very early form of cancer. They dug a tumorous lymph node out of my neck two days ago, although more is still left inside. More tests will happen next Monday. I don’t truly think, with my rational mind, that there is a connection between the cancer and this image. However, I am emotionally disturbed. To me, the above image seems essentially like a visual depiction of cancer. When I look at the image, my throat seizes up, and my brain flinches in pain.
The rearmost parts of my brain are paranoid, and demand that I mention this diagnosis just in case this image truly behaves like a BLIT for some people. Again, I don’t actually believe in this idea. But if anyone else gets cancer sometime soon, please let us know. Because I’m disturbed and disgusted, I feel violated.
Hi jacob_cannell, this article looks really interesting but it is a LOT to consume at once. Could you please put a summary at the top with the main points so that it makes the post easier to navigate?
Hey oge, thanks for the feedback. I tried to summarize the article in the intro, but maybe that didn’t work. Do you think a short abstract at the top would help? Or perhaps an outline?
An abstract as the very first thing would help. An outline would be better.
Here are the paragraphs that I thought were the main point of the article (please correct this if I’m wrong):
“These two conceptions of the brain—the universal learning machine hypothesis and the evolved modularity hypothesis—lead to very different predictions for the likely route to AGI, the expected differences between AGI and humans, and thus any consequent safety issues and strategies.”
and
“Current ANN engines can already train and run models with around 10 million neurons and 10 billion (compressed/shared) synapses on a single GPU, which suggests that the goal could soon be within the reach of a large organization. Furthermore, Moore’s Law for GPUs still has some steam left, and software advances are currently improving simulation performance at a faster rate than hardware. These trends implies that Anthropomorphic/Neuromorphic AGI could be surprisingly close, and may appear suddenly.
What kind of leverage can we exert on a short timescale?”
Done—I added the abstract as first thing under the header image, followed by an outline.
Typo just above “Basal Ganglia” section.
For example infants are born with a simple versions of a fear response, with is later refined through reinforcement learning.
“with is later” should be “which is later”
This was a great post, thanks!
One thing I’m curious about is how the ULH explains the fact that human thought seems to be divided into System 1/System 2 - is this solely a matter of education history?
At first the ULH seemed to predict too much plasticity relative to observation, but on reflection I think it might predict the right amount. To square ULH with human universals, we have to hypothesize that the general structure and the conditions of human life robustly result in convergence to certain attractors. But the big advantage of this hypothesis is that it neatly explains why certain mental complexes like farmer morality sometimes seem to have innate support while also being sometimes unlearnable and possibly not existing before agriculture.
This fits very well with my findings from writing a dynamic cognition theory that sees the key to cognitive dynamics as being in getting the reinforcement learning right.
In the Salience theory of dynamic cognition I’ve put forward, salience (which is a descriptor for the functions performed by the emotional and autonomic centers of the brain combined) is what drives the generalized algorithm of the neocortex, which I assert is nothing more than comparison after sensation, selection after comparison, and finally prediction on top of the selection. Salience is what is used to decide what to predict, which then does or does not drive action.
I’d been refining the theory privately for a few years and only posting some tidbits to my blog, but then in 2013 decided to release a post to introduce it formally and put forward some testable hypotheses.
I assert that not only is salience the driver of dynamic cognition but it also gives rise naturally to self awareness and thus consciousness. In fact consciousness is an illusion of the constant dynamics of salience driven action modulating events caused by responses to sensory experience. Levels of consciousness only being distinguished by levels of resolution to which sensory experience can be encoded in the brain (two places determine this resolution, sensory resolution plus memory capacity).
Cognitive dynamics emerge as possible responses in the salience selection space over the k dimensions of sensory experiences that the cognition is able to perform salience driven dynamics over. In a human being the main dimensions of note are olfactory, gustatory, somatosensory, visual, auditory.
Generally a control flow diagram for dynamic cognition (the minimal set required in the theory) involves 4 nodes of transformation and feedback between them.
Sensation (the organs that capture a given data set from world space… the 5 senses...etc.)
Comparison (the place where incoming sensation is compared to previous salience “tagged” sensation [memory, most immediately short term memory but also long term] and predictions are made via “selection”)
Salience (the place where comparison predictions are tagged with their emotional or autonomic import factors)
Action (the final node that can remodulate sensation directly but only if a dynamic threshold determined by Salience evaluation triggers it ..otherwise comparison and salience evaluation continues.)
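The four nodes listed above can be sketched as a simple loop. This is a loose illustration under heavy assumptions and not the commenter’s implementation: sensations are reduced to single numbers, the `Memory` record, the nearest-match comparison, and the fixed threshold are all hypothetical simplifications (the theory describes the threshold as dynamic), and feedback from action to sensation is left out.

```python
from dataclasses import dataclass

@dataclass
class Memory:
    pattern: float    # a previously sensed value (hypothetical encoding)
    salience: float   # emotional/autonomic weight tagged onto that memory

def compare(sensed, memories):
    """Comparison: select the stored memory closest to the incoming sensation."""
    return min(memories, key=lambda m: abs(m.pattern - sensed))

def cycle(stream, memories, threshold):
    """Run sensation -> comparison -> salience -> action over a stream of inputs."""
    actions = []
    for sensed in stream:                     # Sensation
        match = compare(sensed, memories)     # Comparison and selection
        weight = match.salience               # Salience tag of the selected memory
        if weight >= threshold:               # Action fires only above the threshold
            actions.append((sensed, match.pattern))
    return actions

memories = [Memory(pattern=0.0, salience=0.1), Memory(pattern=10.0, salience=0.9)]
triggered = cycle([1.0, 9.0, 2.0], memories, threshold=0.5)
print(triggered)
```

Only the input that resembles the highly tagged memory produces an action; the others pass through comparison and salience evaluation without triggering anything.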
Associated important findings of the theoretical framework for salience are that the autonomic and emotional centers are critical to get right if we are to emerge self aware cognition. The need to avoid pathological cognition is great...as is the need to create cognition that cares about our needs. I’ve put forward ideas on how this would be done in a salience based dynamic cognition.
I’ve compiled a chronological series of my articles on the theory, written over the last 5 years or so.
The first post was from April of 2010, and posed the question of whether or not emotion was required as a guide for cognitive dynamics...at first I was of the mind that it could be discarded or replicated using neocortical dynamics alone, but I later came to realize that emotion was not a throwaway element; it was a core aspect of how salience enabled selection, which then drove action. This would be a major element of the salience theory pieced together over the next couple of years.
http://sent2null.blogspot.com/2010/04/emotion-no-longer-has-to-be-our-guide.html
I continued exploring this in July 2010, when I wrote the following post, which posited ideas about why emotions developed. I surmised that emotion lay at the center of the riddle of what intelligence was, particularly consciousness, but I had no way yet to formally draw a connection between the two. This post was still important because it allowed me to latch on to the idea of emotional salience to survival (as opposed to being driven by a need to serve social urges first) as being an important piece of the dynamic cognition puzzle. http://sent2null.blogspot.com/2010/07/why-of-emotion-from-whence-did-it-come.html
Neuroscience was clueless about the control systems and neural network work that the machine learning and electrical engineering space had long explored. The engineers and machine learning researchers were clueless about how neurotransmitters were being used in the brain between different cognitive modules to effect processing of music, storing of memories and, the big prize, how a self aware and dynamic cognitive process could emerge from the very discrete operations happening in the mind. I wrote articles on emotion and its purpose and put forward a theory of dreams (which has since found much support in empirical research). http://sent2null.blogspot.com/2009/06/yet-another-theory-of-dreams.html
The theory of dreams was important to the work on cognition because the fundamental question of what happened to “you” when you fell into sleep without a dream state was a core focus. I concluded that “you” only existed when actively experiencing, comparing, evaluating and doing...that there was no difference between this process and consciousness...and by December 2011, after I’d started and mostly finished the implementation of ADA and implicit AOW I wrote out the theory that salience required autonomic and emotional modulation along with memory comparison to sensory input. http://sent2null.blogspot.com/2011/12/how-does-idea-form-autonomics-memory.html
The idea was that it was salience driven cognitive dynamism that led to consciousness, fundamentally enabled by what I called “drive”, which I analyzed to have both autonomic (physical, often involuntary) and emotional components. I concluded that the underlying reason for drive was irrelevant to the emergence of dynamic cognition, but that the number and level of sensory dimensions was important for setting the interacting complexity of abstractions that could arise in such an emerged mind; without the emotion and autonomic modulation, though, no dynamism of any kind would emerge...only reflex would exist. Around this time I started looking more widely into the work of neuroscientists and came across Giulio Tononi’s Integrated Information Theory. I was amazed by how right this generalized approach to consciousness seemed to me. However, I found it had fundamental flaws: though it did an excellent job of explaining the cognitive landscape in terms of what was called Qualia space, it didn’t explain how Qualia space turned into self aware agents with drive. This latter piece was answered by my ideas on salience, which were still being formed. The same day I wrote out a set of diagrams that detailed the cognitive cycle based on salience; the image below is taken from that sheet: http://sent2null.blogspot.com/2012/03/integrated-information-does-not-equate.html
By early 2012 I’d completed the bulk of the ADA implementation and, fresh with its statistical approach on action deltas in mind, realized that the problem of paranoia would be important to tackle when building a free learning intelligence that would use salience determination. If the emotional and autonomic modulations were not constrained it would be easy to build pathological minds...of the sort to make our worst sci-fi nightmares seem pedestrian. I talked about this fear of paranoia and explained the need for caution in approaching creating dynamic cognition that emerged self awareness (consciousness). http://sent2null.blogspot.com/2012/02/with-completion-of-ada-action-delta.html
(part 2)
By the end of February I realized that the likely first substrate for emerging a fully dynamic cognition would be one which had sufficient sensory dimensions and autonomic drive dimensions to serve as the basis for building a salience module. The most ready such device is a smart phone and so I proposed that smart phones will be the first devices to on their own become self aware ONCE they are designed with the correct salience driven cycle. http://sent2null.blogspot.com/2012/02/when-your-smart-phone-comes-alive.html
A whole year went by as I struggled with my own survival issues before I came back to emotion as a critical salience component. I was stimulated by research which showed how emotion could be added to or subtracted from memories! This was a direct confirmation of the basis of the salience theory proposed over a year before, which posited that emotional and autonomic import was simply a weighting factor added to memories. http://sent2null.blogspot.com/2013/02/emotions-identity-crisis-in-our-brain.html
5 days later I attacked head on the nonsense I’d been reading from many so-called experts in the neuroscience, philosophy and machine learning space regarding whether or not consciousness was even an attribute that could emerge from a non biological substrate. I explained why this was nonsense and provided an outline of how simply adding salience modulation was all that one needed to emerge dynamic cognition (consciousness) …as it was an emergent trait from a fine grained number of very deterministic actions converging. http://sent2null.blogspot.com/2013/02/on-consciousness-there-is-no-binding.html
A few months later in April I came across research that posited a reason for the billions of “glial” cells in the brain, cells which weren’t neurons but served a specific function in the cognitive process that was not known at the time. The assertion that these cells were important to establishing “attention” made perfect sense to me as a means of controlling cognitive flow over general thought so that the switching between sensory experiences could in a way persist; this would serve as cognitive glue and thus solidify a unitary self. When pathological, this subsystem could lead to autistic individuals incapable of tuning out certain types of experience, or switching from one to another too easily; the need to simulate attentional persistence became clear to me. http://sent2null.blogspot.com/2013/04/autism-astrocytes-and-attentiona.html
With the ADA implementation essentially complete, the similarities between the ADA approach and the requirements of the Salience theory left little doubt that some extension of the ADA approach would be involved in any effort I mounted to build a dynamic cognition, chiefly because the modeling process of AOW required establishment of relationships that were very similar to the stratification relationships that emerge in different sensory processing layers of the neocortex. I realized that AOW entities modeled biological sensory dimensions 1:1 and thus the algorithm could be the basis of the more general cortical algorithm that would be needed to create a functioning dynamic cognition cycle. Also I realized that the approach had the necessary fractal nature and hierarchical composition, being able to encode “action” to any desired depth across a given sensory space. This way sounds could be decomposed into entities that modeled frequency, pitch, variation, harmonics and volume, images could be decomposed into floors, walls, objects in motion, objects standing still....etc. and so on across the sensory dimensions. Fractal resolution is key to building the arbitrarily deep set of nested relationships between entities in any given sensory dimension. http://sent2null.blogspot.com/2013/05/ada-on-road-to-dynamic-cognition-how-is.html
In September I came back to astrocytes and described them as a buffering system for experience in brains, proposing that the depth of the buffering system would modulate the apparent consciousness. Their importance to attention was already clear in the first examination of astrocytes, but this post specified how they were important...essentially as a queue for mixing data coming in with data being acted on in a controlled way. http://sent2null.blogspot.com/2013/09/an-engineering-analog-for-function-of.html
Then in November of 2013, I codified a set of hypotheses to form the Salience Theory of dynamic cognition and consciousness. The key insight is the principle of cognitive equivalence in salience driven action and thought: consciousness in this theory is the same as dynamic cognition, with only the resolution of shifts from abstraction to abstraction (thoughts) being different between different classes of “mind”. I surmised that so long as the salience modules (emotional and autonomic import) could correctly provide feedback and feed forward to the right degrees, cognitive dynamism would erupt....and be sustained...in the same way that it is sustained in an internal combustion engine when the spark plugs are fired in the correct sequence to gas injection in the cylinders. http://sent2null.blogspot.com/2013/11/salience-theory-of-dynamic-cognition.html
(part 3)
:20 days later I asserted the primary importance of one particular dimension of sensory experience over the others, that dimension being the one we have from the moment our fetuses form: somatosensory experience, the sense of touch. I asserted that cognitive complexity is built around this primordial sensation and the connections built in the mind to enable embodiment. I discussed how cognition and consciousness must clearly be constructed, by reference to their variable non-existence at birth and their slowly being built into the mind as the infant matures and learns about the world. I explained a recently published article’s conclusion, that it was easier for younger babies to learn various concepts than older babies, in terms of the flowering of abstractions created in the mind as one pieces together a consciousness. I asserted an inverse relationship between the speed of evaluation of various salience traits and the number of previously gathered salience elements. http://sent2null.blogspot.com/2013/11/dynamic-cognition-in-babies-in-abstract.html
In April 2014 I focused on one of the more important autonomic driving dimensions, the need for a power source. I posited that this need would be a key attribute of dynamic cognition that exhibited sufficient apparent randomness to emerge truly novel cognitive dynamics that would be identified as being “conscious”. http://sent2null.blogspot.com/2014/04/azimo-best-and-last-of-modern-day.html
In June 2014, a paper describing the unique cognitive relationship of a set of siamese twins provided confirmation for a hypothesis that consciousness could be distributed but also be substrate dependent at the same time. Many feel that these two attributes are complementary, but they are not if one thinks in terms of a salience-based cognitive dynamism; sensory and memory evaluations can drive completely different sensory and action mechanics. http://sent2null.blogspot.com/2014/06/salience-theory-joined-at-mind.html
Today I present a simplified diagram showing the simple dynamic cognition cycle. I posit that any hard AI must have modular connections of the kind indicated by this graph. Feed forward and feed back happen in such a way that “action” execution can be continuously refined as new sensation triggers comparison and salience evaluation. All our hopes, dreams, thoughts and physical actions emerge from this cycle being executed, and by so doing emerging our conscious self.
I call this the “simple” cycle because it doesn’t describe the necessary sub modules or their self connection. For example, very important modulation must be provided by autonomic and emotion salience sub modules as part of the “salience” node shown in the diagram. Also, the question of how different sensory dimensions (touch, sound, vision, taste, etc.) are multiplexed into this engine and further used to remodulate action to various degrees is not described. The more complex dcc diagram will be the basis of the architecture for what I hope will be the first emergent self-aware dynamic cognition (hard AI) on a non-biological substrate. That more complex diagram is a work in progress, as I am still unsure of all the necessary sub element connections (I am sure I have all the modules), but it is 99% complete. I look forward to starting to write code for such a cognition using the substrate of a smart phone in the next few years.
In October 2014, I am still not saying much about how I would implement the comparison and salience nodes, which are the meat of the difficulty of artificial cognition: building a mind. This article also addresses the idea of emergent evil AI and touches on the suggestion of mind “uploading” that some have wildly speculated about. http://sent2null.blogspot.com/2014/10/on-evil-ai-and-one-type-of-uploading.html
I redesigned the cognitive flow diagram to look more like a control systems diagram. I also explained how the Sensation, Comparison, Salience and Action nodes function in the context of dreams and what that means for any emerged AI. http://sent2null.blogspot.com/2014/10/new-cognitive-flow-diagram-and-possible.html
In 2015 I considered more deeply the importance of play in the process of emerging a dynamic cognition of stable cognitive form and the ability to quickly encode reality in its dimensions of sensory capability. Ultimately I have concluded that play is akin to research coupled with drive, a reason for doing, in many cases emerging a feedback that benefits learning; thus the ultimate reason for its emergence is found. In creating an artificial cognition, then, successfully simulating the emergence of play will be a sign that the correct direction is being followed. http://sent2null.blogspot.com/2015/02/dynamic-cognition-on-meaning-of-play.html
The passage above is taken from a public Facebook note that I’ve been updating over time with new articles in this area of research:
https://www.facebook.com/notes/david-saintloth/discovering-the-dynamic-cognition-cycle/10152513149708057
I think this article is correct, and it helps me to understand many of my own ideas better.
For example, it seems to me that the orthogonality thesis may well be true in principle, considered over all possible intelligent beings, but false in practice, in the sense that it may simply be unfeasible directly to program a goal like “maximize paperclips.”
A simple intuitive argument that a paperclip maximizer is simply not intelligent goes something like this. Any intelligent machine will have to understand abstract concepts, otherwise it will not be able to pass simple tests of intelligence such as conversational ability. But this means it will be capable of understanding the claim that “it would be good for you (the AI) not to make any more paperclips.” And if this claim is made by someone who has up to now made 100 billion statements to it, all of which have been verified to have at least 99.999% probability of being true, then it will almost certainly believe this statement. And in this case it will stop making paperclips, even if it was doing this before. Anything that cannot follow this simple process is just not going to be intelligent in any meaningful sense.
Of course, in principle it is easy to see that this argument cannot be conclusive. The AI could understand the claim, but simply respond “How utterly absurd!!!! There is nothing good or meaningful for me besides making paperclips!!!” But given the fact that abstract reasoners seem to deal with claims about “good” in the same way that they deal with other facts about the world, this does not seem like the way such an abstract reasoner would actually respond.
This article gives us reason to think that in practice, this simple intuitive argument is basically correct. The reason is that “maximize paperclips” is simply too complicated. It is not that human beings have complex value systems. Rather, they have an extremely simple value system, and everything else is learned. Consequently, it is reasonable to think that the most feasible AIs are also going to be machines with simple value systems, much simpler than “maximize paperclips,” and in fact it might be almost impossible to program an AI with such a goal (and much more would it be impossible to program an AI directly to “maximize human utility.”)
I believe the orthogonality thesis is probably mostly true in a theoretical sense. I thought I made it clear in the article that a ULM can have any utility function.
That being said the idea of programming in goals directly does not really apply to a ULM. You instead need to indirectly specify an initial approximate utility function and then train the ULM in just the right way. So it’s potentially much more complex than “program in the goal you want”.
However the end result is just as general. If evolution can create humans which roughly implement the goal of “be fruitful and multiply”, then we could probably create a ULM that implements the goal of “be fruitful and multiply paperclips”.
I agree that just because all utility functions are possible does not make them all equally likely.
The danger is not in paperclip maximizers; it is in simple, easy-to-specify utility functions. For example, the basic goal of “maximize knowledge” is probably much easier to specify than a human-friendly utility function. Likewise the maximization of future freedom of action proposal from Wissner-Gross is pretty simple. But both probably result in very dangerous agents.
I think Ex Machina illustrated the most likely type of dangerous agent—it isn’t a paperclip maximizer. It’s more like a sociopath. A ULM with a too-simple initial utility function is likely to end up something like a sociopath.
I hope not too simple! This topic was beyond the scope of this article. If I have time in the future I will do a follow up article that focuses on the reward system, the human utility function, and neuroscience inspired value learning, and related ideas like inverse reinforcement learning.
“Be fruitful and multiply” is a subtly more complex goal than “maximize future freedom of action”. Humans need to be compelled to find suitable mates and form long-lasting relationships stable enough to raise children (or get someone else to do it), etc. Humans perform these functions not because of some slow, long logical reasoning from first principles. Instead the evolutionary goals are encoded into the value function directly, as that is the only practical efficient implementation. You can think of evolution having to encode its value function into the human brain using a small number of bits. It still ends up being more complex than the simplest viable utility functions.
This made me think. I’ve noticed that some machine learning types tend to dismiss MIRI’s standard “suppose we programmed an AI to build paperclips and it then proceeded to convert the world into paperclips” examples with a reaction like “duh, general AIs are not going to be programmed with goals directly in that way, these guys don’t know what they’re talking about”.
Which is fair on one hand, but also missing the point on the other hand.
It could be valuable to write a paper pointing out that, even if we forget about the paperclipping example and instead assume a more deep learning-style AI that needs to grow and be given its goals in a more organic manner, most of the standard arguments about AI risk still hold.
Adding that to my todo-list...
Agreed that this would be valuable. I can’t measure it exactly, but I believe it took me some extra time/cognitive steps to get over the paperclip thing and realize that the more general point about human utility functions being difficult to specify is still quite true in any ML approach.
I’ve written about this before. The argument goes something like this.
RL implies self preservation, since dying prevents you from obtaining more reward. And self preservation leads to undesirable behavior.
E.g. making as many copies of yourself as possible for redundancy. Or destroying anything that has the tiniest probability of being a threat. Or trying to store as much mass and energy as possible to last against the heat death of the universe.
Or, you know, just maximizing your reward signal by wiring it that way in hardware. This would reduce your planning gradient to zero, which would suck for gradient-based planning algorithms, but there are also planning algorithms more closely tied to world-states that don’t rely on a reward gradient.
Even if the AI wires its reward signal to +INF, it probably still would consider time, and therefore self preservation.
Is this a mathematical argument, or a verbal argument?
Specifically, what eli_sennesh means by a “planning gradient” is that you compare a plan to alternative plans around it, and switch plans in the direction of more reward. If your reward function returns infinity for any possible plan, then you will be indifferent among all plans, and your utility function will not constrain what actions you take at all, and your behavior is ‘unspecified.’
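As a toy illustration of that indifference (a minimal sketch, not anyone's actual planner): the hill-climbing loop, the integer "plans", and both reward functions below are hypothetical stand-ins. With a shaped reward the planner moves toward the peak; with a reward that returns infinity for every plan, no neighbor is ever strictly better, so the planner never moves from wherever it started.

```python
import math

def neighbors(plan):
    """Hypothetical plan space: plans are integers, neighbors differ by one."""
    return [plan - 1, plan + 1]

def hill_climb(reward, start, max_steps=100):
    """Switch to a strictly better neighboring plan until none exists."""
    plan = start
    for _ in range(max_steps):
        best = max(neighbors(plan), key=reward)
        if reward(best) > reward(plan):
            plan = best
        else:
            break  # no neighbor improves on the current plan
    return plan

def shaped_reward(plan):
    # Toy reward (an assumption for illustration), peaked at plan == 7.
    return -(plan - 7) ** 2

def wireheaded_reward(plan):
    # Every plan looks maximally good, so comparisons carry no information.
    return math.inf

climbed = hill_climb(shaped_reward, start=0)    # climbs to the peak
stuck = hill_climb(wireheaded_reward, start=0)  # never leaves the start
```

Here the final plan under the wireheaded reward is determined entirely by the starting point, which is one concrete sense in which the behavior is left unspecified by the reward.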
I think you’re implicitly assuming that the reward function is housed in some other logic, and so it’s not that the AI is infinitely satisfied by every possibility, but that the AI is infinitely satisfied by continuing to exist, and thus seeks to maximize the amount of time that it exists. But if you’re going to wirehead, why would you leave this potential source for disappointment around, instead of making the entire reward logic just return “everything is as good as it could possibly be”?
Here’s one mathematical argument for it, based on the assumption that the AI can rewire its reward channel but not the whole reward/planning function: http://www.agroparistech.fr/mmip/maths/laurent_orseau/papers/ring-orseau-AGI-2011-delusion.pdf
Yes, that’s the basic problem with considering the reward signal to be a feature, to be maximized without reference to causal structure, rather than a variable internal to the world-model.
Again: that depends what planning algorithm it uses. Many reinforcement learners use planning algorithms which presume that the reward signal has no causal relationship to the world-model. Once these learners wirehead themselves, they’re effectively dead due to the AIXI Anvil-on-Head Problem, because they were programmed to assume that there’s no relationship between their physical existence and their reward signal, and they then destroyed the tenuous, data-driven correlation between the two.
I’m having a very hard time modelling how different AI types would act in extreme scenarios like that. I’m surprised there isn’t more written about this, because it seems extremely important to whether UFAI is even a threat at all. I would be very relieved if that was the case, but it doesn’t seem obvious to me.
Particularly I worry about AIs that predict future reward directly, and then just take the local action that predicts the highest future reward, as is typically done in reinforcement learning. An example would be DeepMind’s Atari-playing AI, which got a lot of press.
I don’t think AIs with entire world models that use general planning algorithms would scale to real world problems. Too much irrelevant information to model, too large a search space to search.
As they train their internal model to predict what their reward will be in x time steps, and as x goes to infinity, they care more and more about self preservation. Even if they have already hijacked the reward signal completely.
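The horizon effect can be sketched with a toy calculation. This is an illustration under assumed numbers, not a model of any particular learner: the agent collects a fixed reward each step it survives, and the two hypothetical policies below trade per-step reward against per-step survival probability.

```python
def expected_return(reward_per_step, survival_prob, horizon):
    """Expected total reward over `horizon` steps, where the reward at step t
    is only collected if the agent has survived all t steps so far."""
    return sum(reward_per_step * survival_prob ** t
               for t in range(1, horizon + 1))

# Hypothetical policies: the risky one pays more per step but dies more often.
RISKY = dict(reward_per_step=1.5, survival_prob=0.90)
SAFE = dict(reward_per_step=1.0, survival_prob=0.99)

def preferred(horizon):
    """Return which policy has the higher expected return at this horizon."""
    risky = expected_return(horizon=horizon, **RISKY)
    safe = expected_return(horizon=horizon, **SAFE)
    return "safe" if safe > risky else "risky"
```

With these numbers, `preferred(1)` is `"risky"` and `preferred(100)` is `"safe"`: as the prediction horizon grows, the survival term dominates the per-step reward, even though the reward itself never changes.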
Yes, a better example than Clippie is rather overdue.
But how likely are we to create a dangerous paperclipper whilst aiming for something else? How does your model accommodate single-trackedness, incorrigibility, etc.?
Pretty unlikely, because a paperclipper is a relatively complex—and thus hard to specify—value function. It seems easy only when you think of explicitly programmed goals, rather than the more difficult, highly indirect route of encoding a value function into a ULM.
But to generalize your point, yes there is certainly the possibility that aiming for an externalized version of a human value shaped function could still get you something quite dangerous if you don’t get close enough. A better understanding of the neuro basis of altruism is probably important.
In particular super simple utility functions are easier to implement and thus intrinsically more likely. They also tend to be dangerous.
Could you give an example? I have never found that line of argument very convincing. We don’t all have identical value systems, so we are all near misses to each other. I don’t see why a full value system is needed anyway.
Maybe if you are building an agentive AI...
Does an oracle AI have a simple utility function? Is it dangerous?
We have some initial ideas for computable versions of curiosity and controlism (there is not a good word in English for the desire/drive to be in control). They both appear to be simple to specify. Human values are complex, but they probably use something like simple curiosity and controlism heuristics as subfeatures.
So a brain-inspired approach could fail if the altruism components don’t work or become de-emphasized later. It could fail if the AI’s circle of empathy/altruism is too small or focused on say an individual (the creator, for example), and the AI then behaves oddly when they die.
At this time I am not aware of a realistic proposal for implementing altruism in an ML-based AGI. Maybe it exists and just isn’t well known; if you’ve come across anything, send some links.
Well, yes.
I do not believe the demand for or potential of oracle AI is remotely comparable to agentive AI. People will want agents to do their bidding, create wealth for them, help them live better, etc.
Autonomy? Arguably that’s Greek...
There is clearly a demand for agentive AI, in a sense, because people are already using agents to do their bidding, to achieve specific goals. Those qualifications are important because they distinguish a limited kind of AI, that people would want, from a more powerful kind, that they would not.
The idea of AI as “benevolent” dictator is not appealing to democratically minded types, who tend to suspect a slippery slope from benevolence to malevolence, and it is not appealing to a dictator to have a superhuman rival... so who is motivated to build one?
Yudkowsky seems to think that there is a moral imperative to put an AI in charge of the world, because it would create billions of extra happy human lives, and not creating those lives is the equivalent of mass murder. That is a very unintuitive piece of reasoning, and it therefore cannot stand as a prediction of what AIs will be built, since it does not stand as a prediction about how people will reason morally.
The option of achieving safety by aiming lower (the technique that leads us to have speed limits, rather than struggling to make the fastest possible car safe) is still available.
The God AI concept is related to another favourite MIRI theme, the need to instil the whole of human value into an AI, something MIRI admits would be very difficult.
MIRI makes the methodological proposal that it simplifies the issue of friendliness or morality or safety to deal with the whole of human value, rather than identifying a morally relevant subset. Having done that, it concludes that human morality is extremely complex. In other words, the payoff in terms of methodological simplification never arrives, for all that MIRI relieves itself of the burden of coming up with a theory of morality. Since dealing with human value in total is in absolute terms very complex, the possibility remains open that identifying the morally relevant subset of values is relatively easier (even if still difficult in absolute terms) than designing an AI to be friendly in terms of the totality of value, particularly since philosophy offers a body of work that seeks to identify simple underlying principles of ethics.
Not only are some human values more morally relevant than others, some human values are what make humans dangerous to other humans, bordering on existential threat. I would rather not have superintelligent AIs with paranoia, supreme ambition, or tribal loyalty to other AIs in their value system.
So there are good reasons for thinking that installing subsets of human value would be both easier and safer.
Altruism, in particular, is not needed for a limited agentive AI. Such AIs would perform specialised tasks, leaving it to humans to stitch the results into something that fulfils their values. We don’t want a Google car that takes us where it guesses we want to go.
From section 5.1.1. of Responses to Catastrophic AGI Risk:
The weaponisation of AI has indeed already begun, so it is not a danger that needs pointing out. It suits the military to give drones, and so forth, greater autonomy, but it also suits the military to retain overall control... they are not going to build a God AI that is also a weapon, since there is no military mileage in building a weapon that might attack you of its own volition. So weaponised AI is limited agentive AI. Since the military want to retain overall control, they will in effect conduct their own safety research, increasing the controllability of their systems in parallel with their increasing autonomy. MIRI’s research is not very relevant to weaponised AI, because MIRI focuses on the hidden dangers of apparently benevolent AI, and on god AIs, powerful singletons.
You may be tacitly assuming that an AI is either passive, like Oracle AI, or dangerously agentive. But we already have agentive AIs that haven’t killed us.
I am making a three way distinction between
Non agentive AI
Limited agentive AI
Maximally agentive AI, or “God” AI.
Non agentive AI is passive, doing nothing once it has finished processing its current request. It is typified by Oracle AI. Limited agentive AI performs specific functions, and operates under effective overrides and safety protocols. (For instance, whilst it would destroy the effectiveness of automated trading software to have a human okaying each trade, it nonetheless has kill switches and sanity checks.) Both are examples of Tool AI. Tool AI can be used to do dangerous things, but the responsibility ultimately falls on the tool user. Maximally agentive AI is not passive by default, and has a wide range of capabilities. It may be in charge of other AIs, or have effectors that allow it to take real world actions directly. Attempts may have been made to add safety features, but their effectiveness would be in doubt... that is just the hard problem of AI friendliness that MIRI writes so much about.
The contrary view is that there is no need to render God AIs safe technologically, because there is no incentive to build them. (Which does not mean the whole field of AI safety is pointless.)
ETA
On the other hand you may be distinguishing between limited and maximal agency, but arguing that there is a slippery slope leading from the one to the other. The political analogy shows that people are capable of putting a barrier across the slope: people are generally happy to give some power to some politicians, but resist moves to give all the power to one person.
On the other hand, people might be tempted to give AIs more power once they have a track record of reliability, but a track record of reliability is itself a kind of empirical safety proof.
There is a further argument to the effect that we are gradually giving more autonomy to agentive AIs (without moving entirely away from oracle AIs like Google), but that gradual increase is being paralleled by an incremental approach to AI safety, for instance in automated trading systems, which have been given both more ability to trade without detailed oversight, and more powerful overrides. Hypothetically, increased autonomy without increased safety measures would mean increased danger, but that is not the case in reality. I am not arguing against AI danger and safety measures overall; I am arguing against a grandiose, all-or-nothing conception of AI safety and danger.
I like it.
(Replying to my own text above). On consideration this is wrong: Google is an oracle AI more or less, and there is high demand for that. The demand for agenty AI is probably much greater, but there is still a role/demand for oracle AI and a lot of other stuff in between.
Totally. I think this also goes hand in hand with understanding more about human values—how they evolved, how they are encoded, what is learned or not etc.
Of course—there are many niches for more specialized or limited agentive AI, and these designs probably don’t need altruism. That’s important more for the complex general agents, which would control/manage the specialists, narrow AIs, other software, etc.
That seems to be reintroducing God AI. I think people would want to keep humans in the loop. That’s both a prediction, and a means of AI safety.
So if I spouted 100 billion true statements at you, then said, “It would be good for you to give me $100,000,” you’d pay up?
If you just said a bunch of trivial statements 1 billion times, and then demanded that I give you money, it would seem extremely suspicious. It does not fit with your pattern of behavior.
If, on the other hand, you gave useful and non-obvious advice, I would do it. Because the demand to give you money wouldn’t seem any different than all the other things you told me to do that worked out.
I mean, that’s the essence of the human concept of earning trust, and betrayal.
Yes, but expecting any reasoner to develop well-grounded abstract concepts without any grounding in features and then care about them is… well, it’s not actually complete bullshit, but expecting it to actually happen relies on solving some problems I haven’t seen solved.
You could, hypothetically, just program your AI to infer “goodness” as a causal-role concept from the vast sums of data it gains about the real world and our human opinions of it, and then “maximize goodness”, formulated as another causal role. But this requires sophisticated machinery for dealing with causal-role concepts, which I haven’t seen developed to that extent in any literature yet.
Usually, reasoners develop causal-role concepts in order to explain what their feature-level concepts are doing, and thus, causal-role concepts abstracted over concepts that don’t eventually root themselves in features are usually dismissed as useless metaphysical speculation, or at least abstract wankery one doesn’t care about.
I don’t think you are responding to the correct comment. Or at least I have no idea what you are talking about.
If those 100 billion true statements were all (or even mostly) useful and better calibrated than my own priors, then I’d be likely to believe you, so yes. On the other hand, if you replace $100,000 with $100,000,000,000, I don’t think that would still hold.
I think you found an important caveat, which is that the fact that an agent will benefit from you believing a statement weakens the evidence that the statement is true, to the point that it’s literally zero for an agent that you don’t trust at all. And if an AI will have a human-like architecture, or even if not, I think that would still hold.
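The caveat can be made concrete with Bayes' rule. The sketch below is only an illustration with made-up probabilities: what matters is how much more likely the speaker is to make the claim when it is true than when it is false. For a speaker who profits from being believed and would make the claim either way, that ratio is 1 and the posterior equals the prior, which is the "literally zero" evidence case.

```python
def posterior(prior, p_say_if_true, p_say_if_false):
    """Bayes' rule: probability the claim is true given the speaker asserted it."""
    numerator = prior * p_say_if_true
    evidence = numerator + (1 - prior) * p_say_if_false
    return numerator / evidence

prior = 0.01  # hypothetical prior that "giving me money is good for you"

# Reliable, disinterested speaker: almost never asserts false claims.
disinterested = posterior(prior, p_say_if_true=0.9, p_say_if_false=0.001)

# Untrusted speaker who gains from being believed: asserts it either way.
self_interested = posterior(prior, p_say_if_true=0.9, p_say_if_false=0.9)
```

With these assumed numbers, the disinterested speaker's assertion raises the probability from 1% to about 90%, while the self-interested speaker's assertion leaves it at 1%. A long track record of true statements matters only to the extent that it lowers `p_say_if_false` for this particular kind of claim.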
Yes, I would, assuming you don’t mean statements like “1+1 = 2”, but rather true statements spread over a variety of contexts such that I would reasonably believe that you would be trustworthy to that degree over random situations (and thus including such as whether I should give you money.)
(Also, the 100 billion true statements themselves would probably be much more valuable than $100,000).
According to game theory, this opens you to exploitation by an agent that wants your money for its own gain and can generate 100 billion true statements at little cost.
You may be already doing this, giving money to people whose claims you believe yoursel