Such a sandbox seems fine for self-driving cars, but not so great for superintelligent agents. The sandbox will have limited resources that real-world agents might quickly exceed by acquiring more hardware. It would have to be much, much more realistic than a driving sim if you wanted to use it for general training of an AI that will interact with humans in very diverse ways, research physics, cause large economic disruption, etc. And if the AI itself has no plausible origin in the simulated world, or if you leave other flaws in the simulation, then sure, it might even figure out that it’s in a simulation, contaminating the experiment.
Sandboxing seems more useful for testing ideas that are well-understood enough to be inspected for success or failure, or tested without needing very good simulation of the real world. Like if you have an AI that is supposed to learn human values by doing futuristic unsupervised discovery of how the world works, and then assigns preference scores to local events by some futuristic procedure involving marked human feedback. This seems totally testable in simulation—you’ll get the wrong preferences, but might test the preference-learning method.
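To make that last point concrete, here is a minimal sketch of what “testing the preference-learning method in simulation” could look like, with everything hypothetical: the planted preference function, the noisy feedback channel, and the linear learner are stand-ins I made up, not anyone’s proposed method. The idea is just that you can check whether the method recovers preferences you planted in the sandbox, even though those aren’t the real world’s preferences.

```python
# Minimal sketch (all names hypothetical) of testing a preference-learning
# method inside a sandbox: plant a known preference function in the simulated
# world, expose the learner only to noisy feedback, and check whether the
# *method* recovers the planted preferences.
import random

def simulated_world_events(n=1000):
    """Toy 'world model': each event is a small feature vector."""
    return [[random.gauss(0, 1) for _ in range(4)] for _ in range(n)]

def planted_preference(event):
    """Ground-truth preference we built into the sandbox."""
    return 2.0 * event[0] - 1.0 * event[2]

def human_feedback(event):
    """Noisy feedback channel standing in for the human-feedback procedure."""
    return planted_preference(event) + random.gauss(0, 0.1)

def learn_preferences(events, feedback, lr=0.01, steps=2000):
    """Fit a linear preference model from feedback (stand-in for the method under test)."""
    w = [0.0] * len(events[0])
    for _ in range(steps):
        i = random.randrange(len(events))
        pred = sum(wi * xi for wi, xi in zip(w, events[i]))
        err = pred - feedback[i]
        w = [wi - lr * err * xi for wi, xi in zip(w, events[i])]
    return w

events = simulated_world_events()
feedback = [human_feedback(e) for e in events]
w = learn_preferences(events, feedback)
print("recovered weights:", [round(x, 2) for x in w])  # should roughly approximate [2, 0, -1, 0]
```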
EDIT: It was brought to my attention that a similar sandbox sim testing idea was proposed by Chalmers in 2009 - he calls it a “Leakproof Singularity”.
I’ve trodden this ground enough that I should just do a new writeup with my responses to standard objections.
Such a sandbox seems fine for self-driving cars, but not so great for superintelligent agents.
Practical AGI will first appear as sub-human, animal-level intelligence and then human-level intelligence. Practical AGI will necessarily take the form of populations of agents due to computational constraints (costly memory transactions need to be amortized, parallel scaling constraints, etc.). This is true today, where you need to run at least around 100 AI instances on a single GPU at once to get good performance. This will remain true into the future—it’s a hard constraint from the physics of fast hardware.
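As a rough illustration of the amortization point (a PyTorch sketch; the policy network and the sizes are arbitrary stand-ins, not a claim about any particular AGI architecture): running many agent instances as one batched forward pass reuses the weights already fetched into fast memory, whereas running them one at a time pays that memory cost per agent.

```python
# Sketch of why many agent instances are run together: batched inference
# amortizes the cost of moving the network weights through memory.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 32))

observations = torch.randn(100, 512)   # one observation per agent instance

# Batched: weights are fetched once and applied to all 100 agents.
actions_batched = policy(observations)

# Unbatched: the same arithmetic, but the weights are re-read for every agent,
# so the memory traffic (the expensive part on a GPU) is roughly 100x higher.
actions_serial = torch.stack([policy(obs) for obs in observations])

assert torch.allclose(actions_batched, actions_serial, atol=1e-5)
```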
Superintelligence can imply size (a big civilization), speed, or quality. All of this is under our control. We can test a smaller population in the sandbox, we can run them at manageable speed, and we control their knowledge. As far as we know, the Greeks had brains just as powerful as ours, but a population of a million AGIs with 2,000-year-old knowledge is not that dangerous.
Obviously you don’t contain an entire superintelligent AGI civilization in the sandbox (and that would be a waste of resources regardless)! You use the sandbox to test new AGI architectures on smaller populations.
Sandboxing seems more useful for testing ideas that are well-understood enough to be inspected for success or failure, or tested without needing very good simulation of the real world
Computer graphics are advancing rapidly and will be completely revolutionized by machine learning in the decade ahead. Agents that grow up in a matrix will not be able to discern their status as easily as agents that grew up in our world.
Sandboxing will test entire agent architectures—equivalent to DNA brain blueprints for humans—to determine if samples from those architectures have highly desirable mental properties such as altruism.
We can engineer entire world histories and scenarios to test the AGIs, and this could tie into the future of entertainment.
Remember, AGI is going to be more similar to brain emulations than not—think Hansonian scenario but without the need for brain scanning.
Practical AGI will necessarily take the form of populations of agents due to computational constraints (costly memory transactions need to be amortized, parallel scaling constraints, etc.). This is true today, where you need to run at least around 100 AI instances on a single GPU at once to get good performance. This will remain true into the future—it’s a hard constraint from the physics of fast hardware.
I don’t know about this, but would be happy to hear more.
Superintelligence can imply size (a big civilization), speed, or quality. All of this is under our control. We can test a smaller population in the sandbox, we can run them at manageable speed, and we control their knowledge.
I don’t think the point is “controlling” these properties; I think the point is drawing conclusions about what an AI will do in the real world. Reduced speed might allow us to run “fast AIs” in simulation and draw conclusions about what they’ll do. Reduced speed might also let us run AI civilizations of large size (though it’s not obvious to me why you’d want such a thing) and draw conclusions about what they’ll do. Reducing the AI’s knowledge seems like a way to make a simulation more computationally tractable and therefore get better predictions about what the AI will do—but it seems like a risky way that can introduce bias into a simulation.
Sandboxing will test entire agent architectures—equivalent to DNA brain blueprints for humans—to determine if samples from those architectures have highly desirable mental properties such as altruism.
My real problem is that I don’t think just testing for altruism (which I assume means altruistic behavior) is remotely good enough. If we could simulate our world out past an AI becoming more powerful than the human race, and select for altruism then, I’d be happy. But I am pretty confident that there will be big problems generalizing from a simulation to reality, if that simulation has both differences and restrictions on possible actions and possible values.
If we’re just testing a self-driving car, we can make a simulation that captures the available actions (both literal outputs and “effective actions” permitted by the dynamics) and has basically the right value function built in from the start. Additionally, self-driving cars generalize well from the model to reality. Suppose you have something unrealistic in the model (say, you have other cars follow set training trajectories rather than reacting to the actions of the car). A realistic self-driving car that does well in the simulation might be bad at some skills like negotiating for space on the road, but it won’t suddenly, say, try to use its tire tracks to spell out letters if you put it into reality with humans.
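A toy sketch of the scripted-versus-reactive distinction (a hypothetical one-dimensional traffic model I made up just to make the point): the scripted car replays a fixed trajectory no matter what the learner does, so nothing in the sim ever rewards or punishes negotiating for space.

```python
# Toy 1-D traffic model: positions along a lane over discrete time steps.

def scripted_other_car(t, ego_position):
    """Replays a pre-recorded trajectory; ignores the ego car entirely."""
    return 5.0 + 10.0 * t

def reactive_other_car(t, ego_position):
    """Yields space when the ego car gets close, so negotiation matters."""
    planned = 5.0 + 10.0 * t
    return planned + 2.0 if abs(planned - ego_position) < 3.0 else planned

ego = 0.0
for t in range(10):
    ego += 1.2  # the ego car creeps forward each step
    print(t, scripted_other_car(t, ego), reactive_other_car(t, ego))
```

A policy trained only against the scripted version never sees the dynamics the reactive version exposes, which is exactly the kind of gap being described.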
To put what I think concretely: when exposed to a difference between training and reality, a “dumb, parametric AI” projects reality onto the space it learned in training and just keeps on plugging, making it somewhat insensitive to reality being complicated and giving us a better idea about how it will generalize. But a “smart AI” doesn’t seem to have this property; it will learn the complications of reality that were omitted in testing, and can act very differently as a result. This goes back to the problem of expanding sets of effective actions.