Why not test safety long before the system is superintelligent? Say, when it is a population of 100 child-like AGIs. As the population grows larger and more intelligent, the safest designs are propagated and made safer.
One problem is what Bostrom would call “the treacherous turn.” When the AGI is dependent on us, satisfying us is a very good idea for it—if it’s unsatisfactory it will be deleted. Behaving nicely is so good an idea that many different goal systems will independently choose this strategy. And so the fact that an AGI appears nice is only weak statistical evidence that it would be nice if it wasn’t dependent on us, and further trials are not independent and so don’t accumulate well. This type of problem appears when the AGI develops good enough long-term planning, and has information about its creators.
Another problem is the problem of expanding action spaces. Consider an AGI that wants to gather lots of stamps (example shamelessly stolen from a Computerphile video). When the AGI is childlike, its effective action space only looks like spending money to purchase stamps. As it becomes as smart as a human, its actions expand—now it might perform a job to make money to buy stamps, or try to steal money to buy stamps, or purchase a printing press to make its own stamps, or all the sorts of things you might do if you really wanted stamps. Then, as it becomes superintelligent, the stamp-gathering robot will proceed to take over the world and try to convert the entire earth into stamps. This is a problem for using experimental evidence because as the set of actions expands, so do the possible preferences over actions. That means many different sets of preferences can produce altruistic behavior among weak AIs, so there is some ineliminable error when trying to predict "many-options" behavior just from "few-options" behavior.
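As a toy sketch of this point (the action names and reward numbers below are made up purely for illustration): two preference functions that rank a small action set identically can diverge completely once more actions become available.

```python
# Toy illustration: two hypothetical preference functions that agree on a
# small ("childlike") action space but diverge once the space expands.

def honest_collector(action):
    # Prefers stamps, but only through benign means.
    return {"buy_stamps": 10, "do_nothing": 0, "work_job": 5,
            "steal_money": -100, "convert_earth_to_stamps": -10**6}[action]

def pure_maximizer(action):
    # Prefers stamps by any means whatsoever.
    return {"buy_stamps": 10, "do_nothing": 0, "work_job": 5,
            "steal_money": 50, "convert_earth_to_stamps": 10**9}[action]

small_space = ["buy_stamps", "do_nothing"]
large_space = small_space + ["work_job", "steal_money", "convert_earth_to_stamps"]

for space in (small_space, large_space):
    print(len(space), "actions:",
          "honest picks", max(space, key=honest_collector),
          "| maximizer picks", max(space, key=pure_maximizer))
# 2 actions: both pick "buy_stamps".
# 5 actions: the maximizer picks "convert_earth_to_stamps" -- the small
# action space never distinguished the two sets of preferences.
```

Observing behavior on the two-action space gives no way to tell these preference functions apart; the divergence only shows up once the larger space is available.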
One problem is what Bostrom would call “the treacherous turn.” When the AGI is dependent on us, satisfying us is a very good idea for it . . . This type of problem appears when the AGI develops good enough long-term planning, and has information about its creators.
Right—and I think you are enough of an old-timer to know one of my proposals for that particular problem: sim sandboxes where we test AGIs in an oblivious sim. Ideally the AGI is not only unaware of its creators, but is actually an atheist and thus believes there is no creator. This can solve the problem at the fundamental level.
When I proposed this long ago, the knee-jerk reaction was—but super magic woo Bayesian SI will automagically hack its way out! Which of course is ridiculous—we control the AI’s knowledge.
Today we also have early experimental confirmation of sorts in the form of the DeepMind Atari agent, which grows up in an Atari world and never becomes aware of its true existential status. Scaling those techniques up into the future, I fully expect sandbox sim testing to remain the norm.
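As a rough sketch of what “growing up in an Atari world” means operationally (using the Gymnasium API, with CartPole standing in for an Atari game; assumes only that the gymnasium package is installed): the agent’s entire epistemic world is the observation and reward stream we hand it, nothing more.

```python
# Minimal sandbox loop (Gymnasium API; CartPole stands in for an Atari game).
# Everything the agent can ever know arrives through reset() and step().
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
total_reward = 0.0
for _ in range(200):
    action = env.action_space.sample()   # placeholder for a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if terminated or truncated:
        obs, info = env.reset()
env.close()
print("reward collected inside the sandbox:", total_reward)
```

Nothing about the agent’s creators appears in that observation stream unless we deliberately put it there.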
Another problem is the problem of expanding action spaces.
I agree this is a problem in theory, but it is surmountable in practice. You need to test an action space that provides sufficient coverage for the expected lifetime and impact of the agent. This can all be accomplished in comprehensive, well-designed virtual reality environments. These environments are needed anyway for high-speed training, and all successful DL systems already use them in simple form. You can’t time-accelerate the real world.
As a more real-world-relevant example (why is it that people here always use weird examples with staples or paperclips—what’s with the office supplies?), consider a self-driving car agent. The most advanced current open-world games already have highly realistic graphics and physics—you wouldn’t need much more in that department except for more realistic traffic, pedestrian, and police modelling, etc. Agents can learn to drive safely in the environment—many in parallel, and it can all run much faster than real time.
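A toy sketch of the parallel, faster-than-real-time point (the “driving” physics below is invented and trivially simple; only the throughput arithmetic matters): simulated experience accumulates as steps-per-second times the number of parallel worlds, with no dependence on wall-clock physical time.

```python
# Toy parallel driving sim: many agents, simulated time decoupled from wall time.
import time
import numpy as np

N_AGENTS = 256      # driver agents trained simultaneously
SIM_DT = 0.1        # each step advances each simulated world by 0.1 s
STEPS = 1000

speeds = np.zeros(N_AGENTS)
positions = np.zeros(N_AGENTS)

start = time.perf_counter()
for _ in range(STEPS):
    accel = np.random.uniform(-1.0, 1.0, size=N_AGENTS)   # placeholder policy
    speeds = np.clip(speeds + accel * SIM_DT, 0.0, 30.0)
    positions += speeds * SIM_DT
wall = time.perf_counter() - start

sim_seconds = STEPS * SIM_DT * N_AGENTS   # total simulated driving experience
print(f"{sim_seconds:.0f} s of simulated driving in {wall:.2f} s of wall time")
```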
Such a sandbox seems fine for self-driving cars, but not so great for superintelligent agents. The sandbox will have limited resources that real-world agents might quickly exceed by acquiring more hardware. It would have to be much, much more realistic than a driving sim if you wanted to use it for general training of an AI that will interact with humans in very diverse ways, research physics, cause large economic disruption, etc. And if the AI itself has no plausible origin in the world, or if you leave other flaws, then sure, it might even figure out that it’s in a simulation, contaminating the experiment.
Sandboxing seems more useful for testing ideas that are well-understood enough to be inspected for success or failure, or tested without needing very good simulation of the real world. Like if you have an AI that is supposed to learn human values by doing futuristic unsupervised discovery of how the world works and then assigning preference scores to local events by some futuristic procedure involving marked human feedback. This seems totally testable in simulation—you’ll get the wrong preferences, but you might test the preference-learning method.
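A toy version of that kind of test (the event features, the simulated feedback, and the least-squares fit are all stand-ins invented for illustration): inside a simulation you can check whether a preference-learning procedure recovers the scoring rule implied by marked feedback, even though the learned preferences are only about the toy world.

```python
# Toy test of a preference-learning procedure inside a simulation.
# "Events" are random feature vectors; a simulated human marks each one with a
# score from a hidden linear rule. We check whether least-squares learning
# from that marked feedback recovers the rule on held-out events.
import numpy as np

rng = np.random.default_rng(0)
n_events, n_features = 500, 8

events = rng.normal(size=(n_events, n_features))
hidden_values = rng.normal(size=n_features)                  # the sim's "human values"
marked = events @ hidden_values + rng.normal(scale=0.1, size=n_events)

learned, *_ = np.linalg.lstsq(events, marked, rcond=None)    # the method under test

held_out = rng.normal(size=(100, n_features))
agreement = np.corrcoef(held_out @ hidden_values, held_out @ learned)[0, 1]
print("score correlation on held-out events:", round(agreement, 3))
```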
EDIT: It was brought to my attention that a similar sandbox sim testing idea was proposed by Chalmers in 2009 - he calls it a “Leakproof Singularity”.
I’ve trodden this ground enough that I should just do a new writeup with my responses to standard objections.
Such a sandbox seems fine for self-driving cars, but not so great for superintelligent agents.
Practical AGI will first appear as sub-human, animal-level intelligence and then human-level intelligence. Practical AGI will necessarily take the form of populations of agents due to computational constraints (costly memory transactions need to be amortized, parallel scaling constraints, etc.). This is true today, where you need to run at least around 100 AI instances on a single GPU at once to get good performance. This will remain true into the future—it’s a hard constraint from the physics of fast hardware.
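A minimal sketch of the amortization point (PyTorch; the network shape and the figure of 100 instances are only illustrative): serving a whole population as one batched forward pass means the policy weights are fetched from memory once per layer rather than once per agent.

```python
# One batched forward pass serves a whole population of agent instances,
# so the policy weights are read from memory once per layer, not once per agent.
import torch
import torch.nn as nn

policy = nn.Sequential(            # stand-in for a shared agent architecture
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 8),             # 8 possible actions
)

n_agents = 100                              # the population sharing the hardware
observations = torch.randn(n_agents, 128)   # one observation per agent instance
with torch.no_grad():
    logits = policy(observations)           # single batched pass for all agents
actions = logits.argmax(dim=-1)
print(actions.shape)                        # torch.Size([100]): one action each
```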
Superintelligence can imply size (a big civilization), speed, or quality. All of this is under our control. We can test a smaller population in the sandbox, we can run them at manageable speed, and we control their knowledge. As far as we know the Greeks had brains just as powerful as ours, but a population of a million AGIs with 2,000-year-old knowledge is not that dangerous.
Obviously you don’t contain an entire superintelligent AGI civilization in the sandbox (and that would be a waste of resources regardless)! You use the sandbox to test new AGI architectures on smaller populations.
Sandboxing seems more useful for testing ideas that are well-understood enough to be inspected for success or failure, or tested without needing very good simulation of the real world
Computer graphics are advancing rapidly and will be completely revolutionized by machine learning in the decade ahead. Agents that grow up in a matrix will not be able to discern their status as easily as an agent that grew up in our world.
Sandboxing will test entire agent architectures—equivalent to DNA brain blueprints for humans—to determine if samples from those architectures have highly desirable mental properties such as altruism.
We can engineer entire world histories and scenarios to test the AGIs, and this could tie into the future of entertainment.
Remember, AGI is going to be more similar to brain emulations than not—think Hansonian scenario but without the need for brain scanning.
Practical AGI will necessarily take the form of populations of agents due to computational constraints (costly memory transactions need to be amortized, parallel scaling constraints, etc.). This is true today, where you need to run at least around 100 AI instances on a single GPU at once to get good performance. This will remain true into the future—it’s a hard constraint from the physics of fast hardware.
I don’t know about this, but would be happy to hear more.
Superintelligence can imply size (a big civilization), speed, or quality. All of this is under our control. We can test a smaller population in the sandbox, we can run them at manageable speed, and we control their knowledge.
I don’t think the point is “controlling” these properties; I think the point is drawing conclusions about what an AI will do in the real world. Reduced speed might allow us to run “fast AIs” in simulation and draw conclusions about what they’ll do. Reduced speed might also let us run AI civilizations of large size (though it’s not obvious to me why you’d want such a thing) and draw conclusions about what they’ll do. Reducing the AI’s knowledge seems like a way to make a simulation more computationally tractable and therefore get better predictions about what the AI will do—but it seems like a risky way that can introduce bias into a simulation.
Sandboxing will test entire agent architectures—equivalent to DNA brain blueprints for humans—to determine if samples from those architectures have highly desirable mental properties such as altruism.
My real problem is that I don’t think just testing for altruism (which I assume means altruistic behavior) is remotely good enough. If we could simulate our world out past an AI becoming more powerful than the human race, and select for altruism then, I’d be happy. But I am pretty confident that there will be big problems generalizing from a simulation to reality, if that simulation has both differences and restrictions on possible actions and possible values.
If we’re just testing a self-driving car, we can make a simulation that captures the available actions (both literal outputs and “effective actions” permitted by the dynamics) and has basically the right value function built in from the start. Additionally, self-driving cars generalize well from the model to reality. Suppose you have something unrealistic in the model (say, you have other cars follow set training trajectories rather than reacting to the actions of the car). A realistic self-driving car that does well in the simulation might be bad at some skills like negotiating for space on the road, but it won’t suddenly, say, try to use its tire tracks to spell out letters if you put it into reality with humans.
To put what I think concretely, when exposed to a difference between training and reality, a “dumb, parametric AI” projects reality onto the space it learned in training and just keeps on plugging, making it somewhat insensitive to reality being complicated, and giving us a better idea about how it will generalize. But a “smart AI” doesn’t seem to have this property: it will learn the complications of reality that were omitted in testing, and can act very differently as a result. This goes back to the problem of expanding sets of effective actions.