This sounds really interesting and important (if true), but I have only a vague understanding of how you arrived at this conclusion. Please consider writing a post about it.
It’s not so much a conclusion as an intuition, and most of the inferences leading up to it are contained in this thread with PaulChristiano and a related discussion with Kaj Sotala.
I’m interested in IRL and I think it’s the most promising current candidate for value learning, but I must admit I haven’t read much of the relevant literature yet. Reading up on IRL and writing a discussion post on it has been on my todo list—your comment just bumped it up a bit. :)
Another related issue is the more general question of how the training data/environment determines/shapes safety issues for learning agents.
My reaction when I first came across IRL is similar to this author’s:
However, the current IRL methods are limited and cannot be used for inferring human values because of their long list of assumptions. For instance, in most IRL methods the environment is usually assumed to be stationary, fully observable, and sometimes known; the policy of the agent is assumed to be stationary and optimal or near-optimal; the reward function is assumed to be stationary as well; and the Markov property is assumed. Such assumptions are reasonable for limited motor control tasks such as grasping and manipulation; however, if our goal is to learn high-level human values, they become unrealistic.
But maybe it’s not a bad approach for solving a hard problem to first solve a very simplified version of it, then gradually relax the simplifying assumptions and try to build up to a solution of the full problem.
My reaction when I first came across IRL is similar to this author’s:
As a side note, that author’s attempt at value learning is likely to suffer from the same problem Christiano brought up in this thread: there is nothing to enforce that the optimization process will actually nicely separate the reward and agent functionality. Doing that requires some more complex priors and/or training tricks.
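One simple instance of the separation problem can be shown concretely. This is a minimal sketch, assuming a one-step choice setting and a Boltzmann-rational ("softmax") model of the demonstrator, a common modelling choice in the IRL literature but not something specified in this thread. Scaling the reward up and the rationality parameter `beta` down by the same factor produces exactly the same behavior, so observed choices alone only identify the product `beta * r`, not which part is "reward" and which part is "agent".

```python
import math

def boltzmann_policy(rewards, beta):
    """Choice probabilities of a Boltzmann-rational agent: P(a) is proportional to exp(beta * r_a)."""
    logits = [beta * r for r in rewards]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def infer_scaled_rewards(probs):
    """Recover beta * r (relative to option 0) from observed choice probabilities.

    log P(a) = beta * r_a - log Z, so only the product beta * r is identifiable.
    """
    logs = [math.log(p) for p in probs]
    return [l - logs[0] for l in logs]

rewards = [1.0, 2.0, 0.5]
p1 = boltzmann_policy(rewards, beta=2.0)
p2 = boltzmann_policy([2 * r for r in rewards], beta=1.0)  # reward doubled, rationality halved
print(p1 == p2)  # True: the two hypotheses predict identical behavior
print(infer_scaled_rewards(p1))  # approximately [0.0, 2.0, -1.0]
```

Any IRL method built on this model needs some extra prior or constraint (for example fixing `beta`) to break the tie.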
The author’s critique about limiting assumptions may or may not be true, but the author cites only a single paper from the IRL field, and it’s from 2000. That paper and its follow-up each have 500+ citations, and some of the newer work with IRL in the title is from 2008 or later. Also, most of the related research doesn’t use IRL in the title, e.g. “Probabilistic reasoning from observed context-aware behavior”.
But maybe it’s not a bad approach for solving a hard problem to first solve a very simplified version of it, then gradually relax the simplifying assumptions and try to build up to a solution of the full problem.
This is actually the mainline successful approach in machine learning: scaling up. MNIST is a small ‘toy’ visual learning problem, but it led to CIFAR10/100 and eventually ImageNet. The systems that do well on ImageNet descend from the techniques that did well on MNIST decades ago.
MIRI/LW seems much more focused on starting with a top-down approach where you solve the full problem in an unrealistic model—given infinite compute—and then scale down by developing some approximation.
Compare MIRI/LW’s fascination with AIXI vs the machine learning community. Searching for “AIXI” on r/machinelearning gets a single hit vs 634 results on LessWrong. With around 150 citations, AIXI is a minor/average paper in ML (more minor than IRL), and doesn’t appear to have led to great new insights in terms of fast approximations to Bayesian inference (a very active field that connects mostly to ANN research).
MIRI is taking the top-down approach since that seems to be the best way to eventually obtain an AI for which you can derive theoretical guarantees. In the absence of such guarantees, we can’t be confident that an AI will behave correctly when it’s able to think of strategies or reach world states that are very far outside of its training and testing data sets. The price for pursuing such guarantees may well be slower progress in making efficient and capable AIs, with impressive and/or profitable applications, which would explain why the mainstream research community isn’t very interested in this approach.
I tend to agree with MIRI that the top-down approach is probably safest, but since it may turn out to be too slow to make any difference, we should be looking at other approaches as well. If you’re thinking about writing a post about recent progress in IRL and related ideas, I’d be very interested to see it.
MIRI is taking the top-down approach since that seems to be the best way to eventually obtain an AI for which you can derive theoretical guarantees.
I for one remain skeptical such theoretical guarantees are possible in principle for the domain of general AI. The utility of formal math towards a domain tends to vary inversely with domain complexity. For example, in some cases it may be practically possible to derive formal guarantees about the full output space of a program, but not when that program is as complex as a modern video game, let alone a human. The equivalent of theoretical guarantees may be possible/useful for something like a bridge, but less so for an airplane or a city.
For complex systems, simulations are the key tool that enables predictions about future behavior.
In the absence of such guarantees, we can’t be confident that an AI will behave correctly when it’s able to think of strategies or reach world states that are very far outside of its training and testing data sets.
This indeed would be a problem if the AI’s training ever stopped, but I find this extremely unlikely. Some AI systems already learn continuously—whether using online learning directly or by just frequently patching the AI with the results of updated training data. Future AI systems will continue this trend—and learn continuously like humans.
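Continuous learning in its simplest form can be sketched as online stochastic gradient descent: the model is updated after every new example rather than frozen after a training phase. This is a minimal illustration, assuming a one-parameter linear model, a fixed learning rate, and a hypothetical data stream whose target drifts partway through; it shows only that an online learner keeps tracking a changing target.

```python
def online_sgd(stream, lr=0.1, w=0.0):
    """Update a single weight w after every (x, y) example, minimizing (w*x - y)^2 online."""
    history = []
    for x, y in stream:
        grad = 2 * (w * x - y) * x  # gradient of the squared error for this one example
        w -= lr * grad
        history.append(w)
    return w, history

# Hypothetical stream: the true slope is 2.0 for 200 examples, then drifts to -1.0.
stream = [(1.0, 2.0)] * 200 + [(1.0, -1.0)] * 200
w_final, history = online_sgd(stream)
print(round(history[199], 3), round(w_final, 3))  # near 2.0 after phase one, near -1.0 at the end
```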
Much depends on one’s particular models for how the future of AI will pan out. I contend that AI does not need to be perfect, just better than humans. AI drivers don’t need to make optimal driving decisions—they just need to drive better than humans. Likewise AI software engineers just need to code better than human coders, and AI AI researchers just need to do their research better than humans. And so on.
The price for pursuing such guarantees may well be slower progress in making efficient and capable AIs, with impressive and/or profitable applications, which would explain why the mainstream research community isn’t very interested in this approach.
For the record, I do believe that MIRI is/should be funded at some level: it’s sort of a moonshot, but one worth taking given the reasonable price. Mainstream opinion on the safety issue is diverse, and there are increasingly complex PR and career issues to consider. For example, corporations are motivated to downplay long-term existential risks, and in the future will be motivated to downplay the similarity between AI and human cognition to avoid regulation.
If you’re thinking about writing a post about recent progress in IRL and related ideas, I’d be very interested to see it.
Future AI systems will continue this trend—and learn continuously like humans.
Sure, but when it comes to learning values, I see a few problems even with continuous learning:
The AI needs to know when to be uncertain about its values, and actively seek out human advice (or defer to human control) in those cases. If the AI is wrong and overconfident (like in http://www.evolvingai.org/fooling but for values instead of image classification) even once, we could be totally screwed.
On the other hand, if the AI can think much faster than a human (almost certainly the case, given how fast hardware neurons are even today), learning from humans in real time will be extremely expensive. There will be high incentive to lower the frequency of querying humans to a minimum. Those willing to take risks, or think that they have a simple utility function that the AI can learn quickly, could have a big advantage in how competitive their AIs are.
I don’t know what my own values are, especially when it comes to exotic world states that are achievable post-Singularity. (You could say that my own training set was too small. :) Ideally I’d like to train an AI to try to figure out my values the same way that I would (i.e., by doing philosophy), but that might require very different methods than for learning well-defined values. I don’t know if incremental progress in value learning could make that leap.
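The first problem above, knowing when to be uncertain and defer, is often made operational as selective prediction: act only when a confidence measure clears a threshold, and otherwise ask a human. A minimal sketch, assuming the AI outputs a probability distribution over candidate value judgments; the entropy threshold and example distributions are hypothetical. It does not solve the overconfidence problem raised above, since the fooling-image results are precisely cases where a model's own confidence is high and wrong; the gate is only as trustworthy as the probabilities feeding it.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def act_or_defer(probs, max_entropy_bits=0.5):
    """Return the index of the chosen option, or None to signal 'ask a human'."""
    if entropy(probs) > max_entropy_bits:
        return None  # too uncertain: defer to human judgment
    return max(range(len(probs)), key=lambda i: probs[i])

print(act_or_defer([0.97, 0.02, 0.01]))  # 0: confident, acts on option 0
print(act_or_defer([0.40, 0.35, 0.25]))  # None: uncertain, defers
```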
For complex systems, simulations are the key tool that enables predictions about future behavior.
[...]
I contend that AI does not need to be perfect, just better than humans.
My point was that an AI could do well on test data, including simulations, but get tripped up at some later date (e.g., it over-confidently thinks that a certain world state would be highly desirable). Another way things could go wrong is that an AI learns wrong values, but does well in simulations because it infers that it’s being tested and tries to please the human controllers in order to be released into the real world.
I generally agree that learning values correctly will be a challenge, but it’s closely related to general AGI challenges.
I’m also reasonably optimistic that we will be able to reverse engineer the brain’s value learning mechanisms to create agents that are safer than humans. Fully explaining the reasons behind that cautious optimism would require a review of recent computational neuroscience (the LW consensus on the brain is informed primarily by a particular narrow viewpoint from ev psych and the H&B literature, and this position is in substantial disagreement with the viewpoint from comp neuroscience.)
The AI needs to know when to be uncertain about its values,
Mostly agreed. However it is not clear that actively deferring to humans is strictly necessary. In particular one route that circumvents most of these problems is testing value learning systems and architectures on a set of human-level AGIs contained to a virtual sandbox where the AGI does not know it is in a sandbox. This allows safe testing of designs to be used outside of the sandbox. The main safety control is knowledge limitation (which is something that MIRI has not considered much at all, perhaps because of their historical anti-machine learning stance).
The fooling CNN stuff does not show a particularly important failure mode for AI. These CNNs are trained only to recognize images in the sense of outputting a 10 bit label code for any input image. If you feed them a weird image, they just output the closest category. The fooling part (getting the CNN to misclassify an image) specifically requires implicitly reverse engineering the CNN and thus relies on the fact that current CNNs are naively deterministic. A CNN with some amount of random sampling based on a secure irreversible noise generator would not have this problem.
[Learning values could take too long, corps could take shortcuts.]
This could be a problem, but even today our main technique to speed up AI learning relies more on parallelization than raw serial speedup. The standard technique involves training 128 to 1024 copies of the AI in parallel, all on different data streams. The same general technique would allow an AI to learn values from a large number of humans in parallel. This also happens to automatically solve some of the issues with value representativeness.
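The data-parallel pattern described here can be sketched in a few lines: each copy computes a gradient on its own data stream, the gradients are averaged, and every copy applies the same update, so all copies stay identical. This is a minimal sketch, assuming a one-parameter least-squares model and hypothetical per-worker data standing in for different people's feedback; real systems distribute this across GPUs using collective operations such as all-reduce.

```python
def local_gradient(w, batch):
    """Gradient of the mean squared error of y = w*x on one worker's batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def data_parallel_step(w, worker_batches, lr):
    """One synchronous data-parallel step: average the workers' gradients, then update once."""
    grads = [local_gradient(w, b) for b in worker_batches]  # computed concurrently in practice
    return w - lr * sum(grads) / len(grads)

# Hypothetical: 4 workers, each seeing a different slice of data generated by y = 3x.
workers = [[(x, 3.0 * x) for x in (i + 1.0, i + 2.0)] for i in range(4)]
w = 0.0
for _ in range(100):
    w = data_parallel_step(w, workers, lr=0.01)
print(round(w, 4))  # converges toward 3.0
```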
I don’t know what my own values are, especially when it comes to exotic world states that are achievable post-Singularity.
The current world is already exotic from the perspective of our recent ancestors. We already have some methods to investigate the interaction of our values with exotic future world states: namely our imagination, as realized in thought experiments and especially science fiction. AI could help us extend these powers.
My point was that an AI could do well on test data, including simulations, but get tripped up at some later date
This is just failure to generalize or overfitting, and how to avoid these problems is much of what machine learning is all about.
Another way things could go wrong is that an AI learns wrong values, but does well in simulations because it infers that it’s being tested and tries to please the human controllers in order to be released into the real world.
This failure requires a specific combination: (1) the AI learns a good model of the world, but (2) learns a poor model of human values, (3) learns that it is in a sim, and (4) wants to get out; and (5) the operators never notice any of 2 through 4.
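The conjunctive structure of this argument can be written with the chain rule of probability, where $A_1, \dots, A_5$ denote the five conditions above. The product form only expresses that all five must hold; how small the overall probability is depends on the conditional terms, which may be correlated with one another.

```latex
P(\text{failure}) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1, A_2)\,P(A_4 \mid A_1, A_2, A_3)\,P(A_5 \mid A_1, \dots, A_4)
```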
Is this type of failure possible? Sure. But the most secure/paranoid type of safety model I envision is largely immune to that class of failures. In the most secure model, potentially unsafe new designs are constrained to human-level intelligence and grow up in a safe VR sim (medieval or earlier knowledge-base). Designs which pass safety tests are then slowly percolated up to sims which are closer to the modern world. Each upward migration step is like reincarnation: a new AI is grown from a similar seed. The final designs (seed architectures rather than individual AIs) that pass this vetting/testing process will have more evidence for safety/benevolence/altruism than humans.
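The staged protocol described here can be summarized as a loop over progressively more modern sandboxes, where each stage grows a fresh AI from the same seed architecture rather than carrying an individual AI forward. This is only a structural sketch: the stage names after "medieval", and the functions `toy_grow` and `toy_tests`, are hypothetical placeholders for what would in reality be enormous engineering and evaluation efforts.

```python
from typing import Callable, List

def vet_seed(
    seed: str,
    stages: List[str],
    grow_from_seed: Callable[[str, str], object],
    passes_safety_tests: Callable[[object, str], bool],
) -> bool:
    """Promote a seed architecture through sandboxes of increasing modernity.

    At each stage a new AI is grown from the seed (the 'reincarnation' step);
    the seed is rejected as soon as any stage's AI fails its safety tests.
    """
    for stage in stages:
        ai = grow_from_seed(seed, stage)
        if not passes_safety_tests(ai, stage):
            return False
    return True

def toy_grow(seed: str, stage: str) -> tuple:
    return (seed, stage)  # hypothetical stand-in for raising a new AI in this sandbox

def toy_tests(ai: tuple, stage: str) -> bool:
    seed, _ = ai
    return not (seed == "seed-B" and stage == "industrial")  # seed-B fails one stage

STAGES = ["medieval", "early-modern", "industrial", "modern"]
print([s for s in ["seed-A", "seed-B"] if vet_seed(s, STAGES, toy_grow, toy_tests)])  # ['seed-A']
```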
Fully explaining the reasons behind that cautious optimism would require a review of recent computational neuroscience (the LW consensus on the brain is informed primarily by a particular narrow viewpoint from ev psych and the H&B literature, and this position is in substantial disagreement with the viewpoint from comp neuroscience.)
Sounds like another post to look forward to.
The current world is already exotic from the perspective of our recent ancestors.
I think we’ll need different methods to deal with future exoticness though. See this post for some of the reasons.
In the most secure model, potentially unsafe new designs are constrained to human-level intelligence and grow up in a safe VR sim (medieval or earlier knowledge-base).
Do you envision biological humans participating in the VR sim, in order to let the AI learn values from them? If so, how to handle speed differences that may be up to a factor of millions (which you previously suggested will be the case)? The only thing I can think of is to slow the AI down to human speed for the training, which might be fine if your AI group has a big lead and you know there aren’t any other AIs out there able to run at a million times human speed. Otherwise, even if you could massively parallelize the value learning and finish it in one day of real time, that could be giving a competitor a million days of subjective time (times however many parallel copies of the AI they can spawn) to make further progress in AI design and other technologies.
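The arithmetic behind the speed-difference concern: if an AI runs k times faster than a human, each unit of wall-clock time is k units of the AI's subjective time. A minimal sketch, taking the factor-of-a-million speedup discussed here as an assumed value and a 365.25-day year:

```python
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

def subjective_seconds(wall_clock_seconds, speedup):
    """Subjective time experienced by an AI running `speedup` times faster than a human."""
    return wall_clock_seconds * speedup

SPEEDUP = 10**6  # assumed: the factor-of-a-million figure discussed in this thread

# One real day corresponds to a million subjective days...
print(subjective_seconds(SECONDS_PER_DAY, SPEEDUP) / SECONDS_PER_DAY)  # 1000000.0
# ...which is roughly 2,700 subjective years.
print(round(subjective_seconds(SECONDS_PER_DAY, SPEEDUP) / SECONDS_PER_YEAR))
# Even a one-minute wait for a human answer costs about 1.9 subjective years.
print(round(subjective_seconds(60, SPEEDUP) / SECONDS_PER_YEAR, 1))
```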
The final designs (seed architectures rather than individual AIs) that pass this vetting/testing process will have more evidence for safety/benevolence/altruism than humans.
Safer than humans seems like a pretty low bar to me, given that I think most humans are terribly unsafe. :) But despite various problems I see with this approach, it may well be the best outcome that we can realistically hope for, if mainstream AI/ML continues to make progress at such a fast pace using designs that are hard to reason about formally.
I think we’ll need different methods to deal with future exoticness though. See this post for some of the reasons.
Perhaps. The question of uploading comes to mind as something like an ‘ontological crisis’. We start with an intuitive model of selfhood built around the concept of a single unique path extending through time, and the various uploading thought experiments upend that model. Humans (at least some) appear to be able to deal with these types of challenges given enough examples to cover the space and enough time to update models.
Do you envision biological humans participating in the VR sim, in order to let the AI learn values from them?
Of course. And eventually we can join the AIs in the VR sim more directly, or at least that’s the hope.
If so, how to handle speed differences that may be up to a factor of millions (which you previously suggested will be the case)?
Given some computing network running a big VR AI sim, in theory the compute power can be used to run N AIs in parallel or one AI N times accelerated or anything in between. In practice latency and bandwidth overhead considerations will place limits on the maximum serial speedup.
But either way the results are similar: the core problem is the ratio of total AI thought volume to human monitor thought volume. It’s essentially the student/teacher ratio problem. One human could perhaps monitor a couple dozen ‘children’ AIs without sophisticated tools, or perhaps hundreds or even thousands with highly sophisticated narrow AI tools (automated thought monitors and visualizers).
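The student/teacher framing reduces to a simple ratio: the number of human monitors needed scales with the number of AI instances times their speedup, divided by how many human-speed AI "children" one person can oversee. A minimal sketch; the inputs are assumptions that only echo the rough figures in this comment, and it treats monitoring capacity as freely divisible across AIs.

```python
import math

def monitors_needed(num_ais, speedup, ais_per_monitor):
    """Humans needed so total AI thought volume stays within monitoring capacity.

    Thought volume is measured in human-speed AI-equivalents: num_ais * speedup.
    """
    return math.ceil(num_ais * speedup / ais_per_monitor)

# The same compute split two ways: 100 AIs at 1x, or 1 AI at 100x.
print(monitors_needed(100, 1, ais_per_monitor=24))    # 5, without tooling
print(monitors_needed(1, 100, ais_per_monitor=24))    # 5, only total throughput matters
print(monitors_needed(100, 1, ais_per_monitor=1000))  # 1, with sophisticated tools
```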
I don’t expect this will be a huge issue in practice due to simple economic considerations. AGI is likely to arrive near the time the hardware cost of an AGI is similar to a human salary/cost. So think of it in terms of the ratio of human teacher cost to AGI hardware cost. AGI is a no-brainer investment when that cost ratio is 1:1, and it just gets better over time.
By the time AGI hardware costs roughly 1/100th of a human teacher (say 20 cents per hour), we are probably already well into the singularity anyway. The current trend is steady exponential progress in driving down hypothetical AGI hardware cost, which I estimate is vaguely around $1,000/hr today (the cost of about 1,000 GPUs). If that cost suddenly went down due to some new breakthrough, that would just accelerate the timeline.
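The gap between the two cost points can be expressed as a number of cost halvings. Using the comment's own rough figures ($1,000/hr today, 20 cents/hr as the 1/100th-of-a-teacher point), the sketch computes how many halvings separate them; the two-year halving period is a hypothetical parameter, not an estimate from the thread.

```python
import math

def halvings_needed(cost_now, cost_target):
    """Number of times cost_now must be halved to reach cost_target."""
    return math.log2(cost_now / cost_target)

n = halvings_needed(1000.0, 0.20)  # rough figures from the comment above
print(round(n, 1))  # about 12.3 halvings, i.e. a factor of 5,000

# Hypothetical: if hardware cost per unit of compute halved every 2 years,
# closing the gap would take roughly n * 2 years.
print(round(n * 2, 1))
```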
Humans (at least some) appear to be able to deal with these types of challenges given enough examples to cover the space and enough time to update models.
Given some computing network running a big VR AI sim, in theory the compute power can be used to run N AIs in parallel or one AI N times accelerated or anything in between. In practice latency and bandwidth overhead considerations will place limits on the maximum serial speedup.
If you have hardware neurons running at 10^6 times biological speed (BTW, are you aware of HICANN, a chip that today implements neurons running at 10^4 times faster than biological? See also this video presentation), would it make sense to implement a time-sharing system where one set of neurons is used to implement multiple AIs running at slower speed? Wouldn’t that create unnecessary communication costs (swapping AI mind states in and out of your chips) and coordination costs among the AIs?
would it make sense to implement a time-sharing system where one set of neurons is used to implement multiple AIs running at slower speed? Wouldn’t that create unnecessary communication costs
In short, if you don’t time-share, then you are storing all synaptic data on the logic chip. Thus you need vastly more logic chips to simulate your model, and thus you have more communication costs.
There are a number of tradeoffs here that differ across GPUs vs neuro-ASICs like HICANN or IBM TrueNorth. The analog memristor approaches, if/when they work out, will have similar tradeoffs to neuro-ASICs. (For more on that and another viewpoint, see this discussion with the Knowm guy.)
GPUs are von Neumann machines that take advantage of the 10x or more cost difference between the per-transistor cost of logic and that of memory. Logic is roughly 10x more expensive, so it makes sense to have roughly 10x more memory bits than logic bits, e.g. a GPU with 5 billion transistors might have 4 gigabytes of off-chip RAM.
So on the GPU (or any von Neumann machine), you are typically always time-swapping: simulating some larger circuit by swapping pieces in and out of memory.
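A minimal sketch of the time-swapping idea: a network too large for on-chip storage is evaluated one layer at a time, with each layer's weights copied from "off-chip" memory into a bounded buffer before use. The layer sizes, the buffer capacity, and the weight-count accounting are hypothetical, and real GPU kernels tile work far more finely; the point is that a small fixed logic array can serve a larger model at the cost of moving data.

```python
from typing import List

def forward_with_swapping(x: List[float], layers: List[List[List[float]]], buffer_capacity: int):
    """Evaluate dense ReLU layers, loading one weight matrix at a time into a bounded buffer.

    Returns the output vector and the number of weights moved from 'off-chip' memory.
    """
    moved = 0
    for weights in layers:  # weights[i][j]: connection from input j to output i
        size = sum(len(row) for row in weights)
        if size > buffer_capacity:
            raise ValueError("layer does not fit in the on-chip buffer")
        buffer = [row[:] for row in weights]  # the 'swap in' step: copy weights on chip
        moved += size
        x = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in buffer]
    return x, moved

# Hypothetical 3-layer net (4 -> 4 -> 4 -> 2): 40 weights in total, but a buffer of only 16.
layers = [
    [[0.5] * 4 for _ in range(4)],
    [[0.25] * 4 for _ in range(4)],
    [[1.0] * 4 for _ in range(2)],
]
out, moved = forward_with_swapping([1.0, 1.0, 1.0, 1.0], layers, buffer_capacity=16)
print(out, moved)  # [8.0, 8.0] 40
```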
The advantage of the neuro-ASIC is energy efficiency: synapses are stored on chip, so you don’t have to pay the price of moving data which is most of the energy cost these days. The disadvantages are threefold: you lose most of your model flexibility, storing all your data on the logic chip is vastly more expensive per synapse, and you typically lose the flexibility to compress synaptic data—even basic weight sharing is no longer possible. Unfortunately these problems combine.
Let’s look at some numbers. The HICANN chip has 128k synapses in 50 mm^2, and their 8-chip reticle is thus equivalent to a mid-to-high-end GPU in die area: that’s 1 million synapses in 400 mm^2. It can update all of those synapses at about 1 MHz, which is about 1 trillion synop-Hz.
A GPU using SOTA ANN simulation code can also hit about 1 trillion synop-Hz, but with much more flexibility in the tradeoff between model size and speed. In particular, 1 million synapses isn’t really enough: most competitive ANNs trained today are in the 1 to 10 billion synapse range, which would cost about 1000 times more for the HICANN, because it can only store 1 million synapses per chip, vs 1 billion or more for the GPU.
IBM’s TrueNorth can fit more synapses on a chip (256 million on a GPU-sized chip with 5 billion transistors), but it runs slower, with a similar total synop-Hz throughput. The GPU solutions are just far better overall, for now.
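The throughput comparison is simple multiplication: synaptic throughput is the number of synapses times the rate at which each is updated. The sketch uses the figures quoted in this comment (128k taken as 128,000 synapses per HICANN chip, 8 chips per reticle, about 1 MHz updates, 256 million synapses for TrueNorth, about 10^12 synop-Hz overall). The TrueNorth per-synapse update rate is not stated in the thread; it is back-calculated here from the "similar total throughput" claim, so treat it as an inference.

```python
HICANN_SYNAPSES_PER_CHIP = 128_000
HICANN_CHIPS_PER_RETICLE = 8
HICANN_UPDATE_HZ = 1e6  # "about 1 MHz"

hicann_synapses = HICANN_SYNAPSES_PER_CHIP * HICANN_CHIPS_PER_RETICLE
hicann_throughput = hicann_synapses * HICANN_UPDATE_HZ  # synop-Hz
print(hicann_synapses, f"{hicann_throughput:.2e}")  # about 1 million synapses, about 1e12 synop-Hz

def reticles_needed(model_synapses):
    """Reticles needed to hold a model entirely on chip (no time-sharing)."""
    return -(-model_synapses // hicann_synapses)  # ceiling division

print(reticles_needed(1_000_000_000))  # roughly 1,000 reticles for a 1-billion-synapse model

# Implied TrueNorth per-synapse update rate if it matches ~1e12 synop-Hz with 256M synapses.
TRUENORTH_SYNAPSES = 256_000_000
print(round(1e12 / TRUENORTH_SYNAPSES))  # roughly 3,900 updates per second per synapse
```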
Apparently HICANN was designed before 2008, and uses a 180nm CMOS process, whereas modern GPUs are using 28nm. It seems to me that if neuromorphic hardware catches up in terms of economy of scale and process technology, it should be far superior in cost per neural event. And if neuromorphic hardware does win, it seems that the first AGIs could have a huge amortized cost per hour of operation, and still have a lower cost per unit of cognitive work than human workers, due to running much faster than biological brains.
It seems like this GPU vs neuromorphic question could have a large impact on how the Singularity turns out, but I haven’t seen any discussion of it until now. Do you have any other thoughts or references on this topic?
Apparently HICANN was designed before 2008, and uses a 180nm CMOS process, whereas modern GPUs are using 28nm.
That’s true, but IBM’s TrueNorth is 28 nm, with about the same transistor count as a GPU. It descends from earlier research chips on old nodes that were then scaled up to new nodes. TrueNorth can fit 256 million low-bit synapses on a chip, vs 1 million for HICANN (normalized for chip area). The 28 nm process has roughly 40x the transistor density. So my default hypothesis is that if HICANN was scaled up to 28 nm it would end up similar to TrueNorth in terms of density (although TrueNorth is weird in that it is intentionally much slower than it could be, to save energy).
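The roughly 40x density figure is consistent with idealized area scaling, in which transistor density grows with the square of the feature-size ratio. A minimal check; this is an idealization, and real process nodes only approximately follow it.

```python
def ideal_density_gain(old_nm, new_nm):
    """Idealized transistor-density gain when feature size shrinks from old_nm to new_nm."""
    return (old_nm / new_nm) ** 2

print(round(ideal_density_gain(180, 28), 1))  # about 41.3, i.e. "roughly 40x"
```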
It seems to me that if neuromorphic hardware catches up in terms of economy of scale and process technology, it should be far superior in cost per neural event.
I expect this in the long term, but it will depend on how the end of Moore’s Law pans out. Also, current GPU code is not yet at the limits of software simulation efficiency for ANNs, and GPU hardware is still improving rapidly. It just so happens that I am working on a new type of ANN sim engine that is 10x or more faster than current SOTA for networks of interest. My approach could eventually be hardware accelerated. There are some companies already pursuing hardware acceleration of the standard algorithms, such as Nervana, targeting a similar speedup but through dedicated neural ASICs.
One thing I can’t stress enough is the advantage of programmable memory for storing weights: sharing and compressing weights helps solve much of the bandwidth problem the GPU would otherwise have.
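To see why weight sharing matters, compare the number of distinct weights a convolutional layer stores (one small kernel reused at every spatial position) with the number of connections it actually uses, and with a fully connected layer over the same input and output. The layer dimensions are hypothetical; the point is the ratio. On hardware with one physical synapse per connection, the shared weights can no longer be stored once, so every used connection needs its own storage.

```python
def conv_weights(k, c_in, c_out):
    """Distinct weights in a k x k convolution (biases ignored), shared across all positions."""
    return k * k * c_in * c_out

def conv_connections(h, w, k, c_in, c_out):
    """Connections a conv layer uses ('same' padding, stride 1, ignoring border effects)."""
    return h * w * c_out * k * k * c_in

def dense_weights(h, w, c_in, c_out):
    """Weights in a fully connected layer mapping an h x w x c_in input to an h x w x c_out output."""
    return (h * w * c_in) * (h * w * c_out)

height = width = 32
stored = conv_weights(3, 64, 64)
used = conv_connections(height, width, 3, 64, 64)
print(stored, used, used // stored)  # 36864 37748736 1024: each stored weight reused 1024 times
print(dense_weights(height, width, 64, 64))  # 4294967296 weights with no sharing at all
```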
It seems like this GPU vs neuromorphic question could have a large impact on how the Singularity turns out, but I haven’t seen any discussion of it until now. Do you have any other thoughts or references on this topic?
I don’t know how much it really affects outcomes: whether one uses clever hardware or clever software, the brain is probably near or on the Pareto surface for statistical inference energy efficiency, and we will probably get close in the near future.
Cool—I’m working up to it.
Sure, but when it comes to learning values, I see a few problems even with continuous learning:
The AI needs to know when to be uncertain about its values, and actively seek out human advice (or defer to human control) in those cases. If the AI is wrong and overconfident (like in http://www.evolvingai.org/fooling but for values instead of image classification) even once, we could be totally screwed.
On the other hand, if the AI can think much faster than a human (almost certainly the case, given how fast hardware neurons are even today), learning from humans in real time will be extremely expensive. There will be high incentive to lower the frequency of querying humans to a minimum. Those willing to take risks, or think that they have a simple utility function that the AI can learn quickly, could have a big advantage in how competitive their AIs are.
I don’t know what my own values are, especially when it comes to exotic world states that are achievable post-Singularity. (You could say that my own training set was too small. :) Ideally I’d like to train an AI to try to figure out my values the same way that I would (i.e., by doing philosophy), but that might require very different methods than for learning well-defined values. I don’t know if incremental progress in value learning could make that leap.
My point was that an AI could do well on test data, including simulations, but get tripped up at some later date (e.g., it over-confidently thinks that a certain world state would be highly desirable). Another way things could go wrong is that an AI learns wrong values, but does well in simulations because it infers that it’s being tested and tries to please the human controllers in order to be released into the real world.
I generally agree that learning values correctly will be a challenge, but it’s closely related to general AGI challenges.
I’m also reasonably optimistic that we will be able to reverse engineer the brain’s value learning mechanisms to create agents that are safer than humans. Fully explaining the reasons behind that cautious optimism would require a review of recent computational neuroscience (the LW consensus on the brain is informed primarily by a particular narrow viewpoint from ev psych and the H&B literature, and this position is in substantial disagreement with the viewpoint from comp neuroscience.)
Mostly agreed. However it is not clear that actively deferring to humans is strictly necessary. In particular one route that circumvents most of these problems is testing value learning systems and architectures on a set of human-level AGIs contained to a virtual sandbox where the AGI does not know it is in a sandbox. This allows safe testing of designs to be used outside of the sandbox. The main safety control is knowledge limitation (which is something that MIRI has not considered much at all, perhaps because of their historical anti-machine learning stance).
The fooling CNN stuff does not show a particularly important failure mode for AI. These CNNs are trained only to recognize images, in the sense of outputting a 10-bit label code for any input image. If you feed them a weird image, they just output the closest category. The fooling part (getting the CNN to misclassify an image) specifically requires implicitly reverse engineering the CNN, and thus relies on the fact that current CNNs are naively deterministic. A CNN with some amount of random sampling based on a secure irreversible noise generator would not have this problem.
This could be a problem, but even today our main technique for speeding up AI learning relies more on parallelization than on raw serial speedup. The standard technique involves training 128 to 1024 copies of the AI in parallel, each on a different data stream. The same general technique would allow an AI to learn values from a large number of humans in parallel. This also happens to automatically solve some of the issues with value representativeness.
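For readers who haven’t seen data-parallel training, here is a minimal sketch in Python of the general idea, assuming plain synchronous gradient averaging on a toy linear model (the target y = 3x + 1, the shard sizes, and the function names are all made up for illustration; real systems train deep nets and may use asynchronous updates). Each "copy" sees its own data stream, and only the averaged gradient updates the shared parameters.

```python
import random

def mse_gradient(w, b, batch):
    """Gradient of mean squared error for the model y ≈ w*x + b on one batch."""
    gw = gb = 0.0
    for x, y in batch:
        err = (w * x + b) - y
        gw += 2 * err * x
        gb += 2 * err
    return gw / len(batch), gb / len(batch)

def data_parallel_step(w, b, shards, lr):
    """Each copy computes a gradient on its own data stream; the gradients are
    averaged and a single shared update is applied to every copy."""
    grads = [mse_gradient(w, b, shard) for shard in shards]
    gw = sum(g[0] for g in grads) / len(grads)
    gb = sum(g[1] for g in grads) / len(grads)
    return w - lr * gw, b - lr * gb

rng = random.Random(0)
NUM_COPIES = 128  # low end of the "128 to 1024 copies" range

def make_shard(n):
    # one copy's private data stream: noisy samples of a hypothetical target y = 3x + 1
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3 * x + 1 + rng.gauss(0, 0.1)) for x in xs]

shards = [make_shard(32) for _ in range(NUM_COPIES)]
w, b = 0.0, 0.0
for _ in range(200):
    w, b = data_parallel_step(w, b, shards, lr=0.1)
print(f"w = {w:.2f}, b = {b:.2f}")
```

If each shard were instead drawn from a different human, the averaged update would blend all of them, which is roughly the value-representativeness point above.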
The current world is already exotic from the perspective of our recent ancestors. We already have some methods to investigate the interaction of our values with exotic future world states: namely our imagination, as realized in thought experiments and especially science fiction. AI could help us extend these powers.
This is just failure to generalize or overfitting, and how to avoid these problems is much of what machine learning is all about.
This failure requires a specific combination of conditions: (1) the AI learns a good model of the world, but (2) learns a poor model of human values, (3) learns that it is in a sim, and (4) wants to get out, and (5) the operators never notice any of (2) through (4).
Is this type of failure possible? Sure. But the most secure/paranoid type of safety model I envision is largely immune to that class of failures. In the most secure model, potentially unsafe new designs are constrained to human-level intelligence and grow up in a safe VR sim (with a medieval or earlier knowledge base). Designs that pass safety tests are then slowly percolated up to sims that are closer to the modern world. Each upward migration step is like reincarnation: a new AI is grown from a similar seed. The final designs (seed architectures rather than individual AIs) that pass this vetting/testing process will have more evidence for safety/benevolence/altruism than humans.
Sounds like another post to look forward to.
I think we’ll need different methods to deal with future exoticness though. See this post for some of the reasons.
Do you envision biological humans participating in the VR sim, in order to let the AI learn values from them? If so, how do you handle speed differences that may be up to a factor of millions (which you previously suggested will be the case)? The only thing I can think of is to slow the AI down to human speed for the training, which might be fine if your AI group has a big lead and you know there aren’t any other AIs out there able to run at a million times human speed. Otherwise, even if you could massively parallelize the value learning and finish it in one day of real time, that could be giving a competitor a million days of subjective time (times however many parallel copies of the AI they can spawn) to make further progress in AI design and other technologies.
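To put "a million days of subjective time" in more familiar units, a quick Python calculation, assuming the million-fold speedup mentioned above and a 365.25-day year (the function name is just for illustration):

```python
DAYS_PER_YEAR = 365.25

def subjective_years(real_days, speedup, copies=1):
    # subjective time available to a competitor running `copies` AIs,
    # each `speedup` times faster than a human
    return real_days * speedup * copies / DAYS_PER_YEAR

one_copy = subjective_years(real_days=1, speedup=10**6)
print(f"{one_copy:.0f} subjective years per copy")
```

That is a bit under 2,740 subjective years per copy for one real day, before multiplying by the number of parallel copies.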
Safer than humans seems like a pretty low bar to me, given that I think most humans are terribly unsafe. :) But despite various problems I see with this approach, it may well be the best outcome that we can realistically hope for, if mainstream AI/ML continues to make progress at such a fast pace using designs that are hard to reason about formally.
Perhaps. The question of uploading comes to mind as something like an ‘ontological crisis’. We start with an intuitive model of selfhood built around the concept of a single unique path extending through time, and the various uploading thought experiments upend that model. Humans (at least some) appear to be able to deal with these types of challenges given enough examples to cover the space and enough time to update models.
Of course. And eventually we can join the AIs in the VR sim more directly, or at least that’s the hope.
Given some computing network running a big VR AI sim, in theory the compute power can be used to run N AIs in parallel or one AI N times accelerated or anything in between. In practice latency and bandwidth overhead considerations will place limits on the maximum serial speedup.
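A small Python sketch of that tradeoff, assuming compute is measured in a made-up unit of "one AI running at real-time speed" and that speedups are tried in powers of 10 up to some latency/bandwidth cap (both the unit and the cap value are illustrative assumptions):

```python
def configurations(budget, max_serial_speedup):
    """List (num_parallel_ais, serial_speedup) splits of a fixed compute budget.

    budget is measured in "one AI running at real-time speed" units (an assumed
    unit); speedups are tried in powers of 10 up to the latency/bandwidth cap.
    """
    configs = []
    speedup = 1
    while speedup <= min(budget, max_serial_speedup):
        configs.append((budget // speedup, speedup))
        speedup *= 10
    return configs

for n, s in configurations(budget=10**6, max_serial_speedup=10**3):
    print(f"{n:>9,} AIs at {s:>5,}x -> total thought volume {n * s:,}")
```

Every split yields the same total thought volume; the cap only removes the most serial options.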
But either way the results are similar: the core problem is the ratio of total AI thought volume to human monitor thought volume. It’s essentially the student/teacher ratio problem. One human could perhaps monitor a couple dozen ‘children’ AIs without sophisticated tools, or perhaps hundreds or even thousands with highly sophisticated narrow AI tools (automated thought monitors and visualizers).
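As a back-of-the-envelope illustration of the student/teacher ratio, here is a Python sketch assuming a hypothetical total thought volume of 10,000 real-time-AI equivalents, with 24 ("a couple dozen") and 1,000 AIs per monitor as the two ends of the range above:

```python
import math

def monitors_needed(total_ai_thought_volume, ais_per_monitor):
    # total_ai_thought_volume: parallel copies x serial speedup, in "one real-time AI" units
    return math.ceil(total_ai_thought_volume / ais_per_monitor)

volume = 10_000  # hypothetical: e.g. 1,000 AIs at 10x, or 10,000 AIs at 1x
print(monitors_needed(volume, 24))    # "a couple dozen" per monitor, no tools
print(monitors_needed(volume, 1000))  # with sophisticated narrow-AI monitoring tools
```

The same volume needs several hundred unaided monitors but only about ten well-equipped ones, which is why the tooling matters so much.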
I don’t expect this will be a huge issue in practice, due to simple economic considerations. AGI is likely to arrive near the time when the hardware cost of an AGI is similar to a human’s salary/cost. So think of it in terms of the ratio of human teacher cost to AGI hardware cost. AGI is a no-brainer investment when that cost ratio is 1:1, and it just gets better over time.
By the time AGI hardware costs, say, 1/100th as much as a human teacher (around 20 cents per hour), we are probably already well into the Singularity anyway. The current trend is steady exponential progress in driving down hypothetical AGI hardware cost (which I estimate is vaguely around $1,000/hr today, the cost of about 1000 GPUs). If that cost suddenly dropped due to some new breakthrough, that would just accelerate the timeline.
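The cost argument can be checked with a few lines of Python. The human teacher cost of $20/hr is implied by "1/100th ... around 20 cents per hour"; the $1,000/hr figure is the estimate above; the 2-year halving time is a purely hypothetical rate, not something claimed in this thread.

```python
import math

HUMAN_TEACHER_COST = 20.0   # $/hr, implied by "1/100th ... around 20 cents per hour"
AGI_HW_COST_TODAY = 1000.0  # $/hr, the rough estimate above (~1000 GPUs)

def halvings_needed(cost_now, target_cost):
    # number of cost halvings to go from cost_now down to target_cost
    return math.log2(cost_now / target_cost)

to_parity = halvings_needed(AGI_HW_COST_TODAY, HUMAN_TEACHER_COST)          # 1:1
to_1_percent = halvings_needed(AGI_HW_COST_TODAY, HUMAN_TEACHER_COST / 100)  # 1:100
gap = to_1_percent - to_parity

HALVING_TIME_YEARS = 2.0  # hypothetical rate, not from the thread
print(f"parity after ~{to_parity:.1f} halvings, 1/100th after ~{to_1_percent:.1f}")
print(f"parity -> 1/100th takes ~{gap * HALVING_TIME_YEARS:.0f} years at the assumed rate")
```

Whatever the actual rate turns out to be, going from 1:1 to 1:100 always takes log2(100), or about 6.6 halvings, past the point where AGI is already economical.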
I don’t know how to deal with this myself, and I doubt whether people who claim to be able to deal with these scenarios are doing so correctly. I wrote about this in http://lesswrong.com/lw/g0w/beware_selective_nihilism/
If you have hardware neurons running at 10^6 times biological speed (BTW, are you aware of HICANN, a chip that today implements neurons running 10^4 times faster than biological ones? See also this video presentation), would it make sense to implement a time-sharing system where one set of neurons is used to implement multiple AIs running at slower speed? Wouldn’t that create unnecessary communication costs (swapping AI mind states in and out of your chips) and coordination costs among the AIs?
In short, if you don’t time-share, then you are storing all synaptic data on the logic chip. Thus you need vastly more logic chips to simulate your model, and so you have more communication costs.
There are a number of tradeoffs here that differ across GPUs vs. neuro-ASICs like HICANN or IBM TrueNorth. The analog memristor approaches, if/when they work out, will have similar tradeoffs to neuro-ASICs. (For more on that and another viewpoint, see this discussion with the Knowm guy.)
GPUs are von Neumann machines that take advantage of the 10x or greater difference between the per-transistor cost of logic and that of memory. Logic is roughly 10x more expensive, so it makes sense to have roughly 10x more memory bits than logic bits, i.e., a GPU with 5 billion transistors might have 4 gigabytes of off-chip RAM.
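Checking the example numbers in Python, under the simplifying assumption that one DRAM bit is counted against one logic transistor and reading 4 gigabytes as 4 GiB:

```python
TRANSISTORS_ON_GPU = 5e9        # the example die above
OFFCHIP_RAM_BYTES = 4 * 2**30   # 4 GiB of off-chip RAM (binary gigabytes assumed)
memory_bits = OFFCHIP_RAM_BYTES * 8
ratio = memory_bits / TRANSISTORS_ON_GPU
print(f"{memory_bits:.3g} memory bits, ~{ratio:.1f}x the logic transistor count")
```

That comes out to about 7x, the same order of magnitude as the "roughly 10x" rule of thumb.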
So on the GPU (or any von Neumann machine), you are typically always doing time-swapping: simulating some larger circuit by swapping pieces in and out of memory.
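To make time-swapping concrete, here is a toy Python sketch, assuming a plain matrix-vector product stands in for one layer of the simulated circuit. The full weight matrix lives in "off-chip" memory (a Python list here) and only a small tile of rows is held in a working buffer at a time. The names `matvec_time_swapped` and `buffer_rows` are made up for illustration; real GPU kernels tile over both dimensions and overlap transfers with compute.

```python
def matvec_time_swapped(weights, x, buffer_rows):
    """Compute weights @ x while holding at most `buffer_rows` rows 'on chip' at once.

    `weights` stands in for off-chip memory; each tile is copied into a small
    buffer (the on-chip working set), used, and then replaced by the next tile.
    """
    out = []
    for start in range(0, len(weights), buffer_rows):
        buffer = [row[:] for row in weights[start:start + buffer_rows]]  # swap tile in
        for row in buffer:
            out.append(sum(w * xi for w, xi in zip(row, x)))
        # on the next iteration `buffer` is rebound to the next tile (tile swapped out)
    return out

W = [[(r * 7 + c) % 5 - 2 for c in range(6)] for r in range(10)]
x = [1, -1, 2, 0, 3, 1]
full = [sum(w * xi for w, xi in zip(row, x)) for row in W]
assert matvec_time_swapped(W, x, buffer_rows=3) == full
```

The result is identical to the all-at-once product; only the size of the working set changes, at the price of moving every weight through the buffer on every pass.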
The advantage of the neuro-ASIC is energy efficiency: synapses are stored on chip, so you don’t have to pay the price of moving data which is most of the energy cost these days. The disadvantages are threefold: you lose most of your model flexibility, storing all your data on the logic chip is vastly more expensive per synapse, and you typically lose the flexibility to compress synaptic data—even basic weight sharing is no longer possible. Unfortunately these problems combine.
Let’s look at some numbers. The HICANN chip has 128k synapses in 50 mm^2, so their 8-chip reticle is equivalent to a mid-to-high-end GPU in die area. That’s 1 million synapses in 400 mm^2. It can update all of those synapses at about 1 MHz, which is about 1 trillion synop-Hz.
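The same arithmetic in Python, assuming "128k" means 128 × 1024 synapses (with 128,000 the totals barely change):

```python
SYNAPSES_PER_CHIP = 128 * 1024   # "128k synapses in 50 mm^2" (binary k assumed)
CHIP_AREA_MM2 = 50
CHIPS_PER_RETICLE = 8
UPDATE_RATE_HZ = 1e6             # "about 1 MHz"

synapses = SYNAPSES_PER_CHIP * CHIPS_PER_RETICLE
area = CHIP_AREA_MM2 * CHIPS_PER_RETICLE
synop_hz = synapses * UPDATE_RATE_HZ
print(synapses, area, f"{synop_hz:.2e}")
```

That gives roughly 1.05 million synapses in 400 mm^2 and about 1.05e12 synaptic updates per second.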
A GPU using SOTA ANN simulation code can also hit about 1 trillion synop-Hz, but with much more flexibility in the tradeoff between model size and speed. In particular, 1 million synapses isn’t really enough: most competitive ANNs trained today are in the 1 to 10 billion synapse range, which would cost about 1000 times more on HICANN, because it can only store about 1 million synapses per GPU-sized reticle, vs. 1 billion or more for the GPU.
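A quick Python check of the "about 1000 times more" figure, assuming about 1 million synapses per GPU-sized HICANN reticle and 1 billion synapses per GPU (the low end of "1 billion or more"):

```python
import math

HICANN_SYNAPSES_PER_RETICLE = 1_000_000   # ~1M synapses per GPU-sized reticle
GPU_SYNAPSES = 1_000_000_000              # "1 billion or more" held in GPU memory

def units_needed(model_synapses, synapses_per_unit):
    # how many GPU-sized devices are needed just to hold the model's synapses
    return math.ceil(model_synapses / synapses_per_unit)

for model in (1_000_000_000, 10_000_000_000):
    print(model, units_needed(model, HICANN_SYNAPSES_PER_RETICLE), units_needed(model, GPU_SYNAPSES))
```

Since both devices deliver a similar synop-Hz, the GPU covers the larger model by time-sharing its logic, while the neuro-ASIC has to buy one unit of silicon per million synapses.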
IBM’s TrueNorth can fit more synapses on a chip (256 million on a GPU-sized chip of 5 billion transistors), but it runs slower, with a similar total synop-Hz throughput. The GPU solutions are just far better overall, for now.
Apparently HICANN was designed before 2008, and uses a 180 nm CMOS process, whereas modern GPUs use 28 nm. It seems to me that if neuromorphic hardware catches up in terms of economies of scale and process technology, it should be far superior in cost per neural event. And if neuromorphic hardware does win, it seems that the first AGIs could have a huge amortized cost per hour of operation, and still have a lower cost per unit of cognitive work than human workers, due to running much faster than biological brains.
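The amortized-cost point is just a division. Here is a Python sketch, assuming (as an idealization) that cognitive work scales linearly with speed; the $10,000/hr and 10^4x figures are hypothetical, not numbers from this thread:

```python
def cost_per_human_equivalent_hour(hardware_cost_per_hour, speedup):
    # one wall-clock hour of a `speedup`x AI is treated as `speedup` human-hours
    # of thinking (an idealization: no serial bottlenecks or coordination overhead)
    return hardware_cost_per_hour / speedup

# hypothetical numbers, not from the thread
print(cost_per_human_equivalent_hour(10_000.0, 10**4))
```

Even a very expensive machine comes out around $1 per human-equivalent hour under these assumptions.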
It seems like this GPU vs neuromorphic question could have a large impact on how the Singularity turns out, but I haven’t seen any discussion of it until now. Do you have any other thoughts or references on this topic?
That’s true, but IBM’s TrueNorth is 28 nm, with about the same transistor count as a GPU. It descends from earlier research chips on old nodes that were then scaled up to new nodes. TrueNorth can fit 256 million low-bit synapses on a chip, vs. 1 million for HICANN (normalized for chip area). The 28 nm process has roughly 40x the transistor density. So my default hypothesis is that if HICANN were scaled up to 28 nm, it would end up similar to TrueNorth in terms of density (although TrueNorth is weird in that it is intentionally much slower than it could be, to save energy).
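The "roughly 40x" figure follows from ideal area scaling, where density grows with the square of the feature-size ratio. A Python sketch, assuming perfect scaling (real designs rarely achieve it):

```python
OLD_NODE_NM, NEW_NODE_NM = 180, 28
density_gain = (OLD_NODE_NM / NEW_NODE_NM) ** 2   # ideal area scaling with feature size
hicann_scaled = 1_000_000 * density_gain           # synapses per GPU-sized area
TRUENORTH = 256_000_000
print(f"density gain ~{density_gain:.0f}x, scaled HICANN ~{hicann_scaled / 1e6:.0f}M vs TrueNorth {TRUENORTH // 10**6}M")
```

This puts a naively scaled HICANN at around 41 million synapses per GPU-sized area. The remaining factor of about 6 relative to TrueNorth would have to come from design choices such as TrueNorth’s low-bit synapses; that attribution is an assumption, not something measured here.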
I expect this in the long term, but it will depend on how the end of Moore’s Law pans out. Also, current GPU code is not yet at the limits of software simulation efficiency for ANNs, and GPU hardware is still improving rapidly. It just so happens that I am working on a new type of ANN sim engine that is 10x or more faster than current SOTA for networks of interest. My approach could eventually be hardware accelerated. Some companies are already pursuing hardware acceleration of the standard algorithms, such as Nervana, which is targeting a similar speedup through dedicated neural ASICs.
One thing I can’t stress enough is the advantage of programmable memory for storing weights: sharing and compressing weights helps solve much of the bandwidth problem the GPU would otherwise have.
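To show how much weight sharing saves, here is a Python sketch counting stored weights for one 32x32x64 to 32x32x64 layer three ways (the layer sizes are arbitrary illustrative choices; biases are ignored): fully connected, locally connected with 3x3 receptive fields but no sharing, and convolutional with one shared 3x3 kernel.

```python
def dense_params(h, w, c_in, c_out):
    # fully connected map from an h x w x c_in input to an h x w x c_out output
    return (h * w * c_in) * (h * w * c_out)

def locally_connected_params(h, w, c_in, c_out, k):
    # k x k receptive fields, but a separate kernel at every output position (no sharing)
    return h * w * (k * k * c_in * c_out)

def conv_params(c_in, c_out, k):
    # one k x k kernel shared across all output positions (weight sharing)
    return k * k * c_in * c_out

h = w = 32
c_in = c_out = 64
k = 3
print(dense_params(h, w, c_in, c_out))
print(locally_connected_params(h, w, c_in, c_out, k))
print(conv_params(c_in, c_out, k))
```

The locally connected and convolutional layers perform the same number of synaptic operations per pass, but sharing cuts the stored weights by a factor of h × w = 1024. Hardware that stores each synapse physically has to pay for the unshared count.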
I don’t know how much it really affects outcomes. Whether one uses clever hardware or clever software, the brain is probably near or on the Pareto surface for statistical inference energy efficiency, and we will probably get close to it in the near future.