We are careful to only provide the training process with inputs that would be just as likely in, say, an alternate universe where AI was built by octopus minds made of organosilicon where atoms obey the Bohr model.
In practice this isn’t going to be nearly as useful as an AI which does have access to the wealth of human knowledge. So whilst this might be useful for some sort of pivotal act, it’s not going to be a practical long-term option for AI security.
Can you even think of a pivotal act an AI can help with that satisfies this criterion?
STEM AI is one such plan that could be done with a secure sandbox, as long as we don’t give it data on humans or human models, or at least give it only the minimum data necessary, and we can prevent it from escalating out of the sandbox. Thus we control the data sources.
From Evhub’s post:
STEM AI is a very simple proposal in a similar vein to microscope AI. Whereas the goal of microscope AI was to avoid the potential problems inherent in building agents, the goal of STEM AI is to avoid the potential problems inherent in modeling humans. Specifically, the idea of STEM AI is to train a model purely on abstract science, engineering, and/or mathematics problems while using transparency tools to ensure that the model isn’t thinking about anything outside its sandbox.
This approach has the potential to produce a powerful AI system—in terms of its ability to solve STEM problems—without relying on any human modeling. Not modeling humans could then have major benefits such as ensuring that the resulting model doesn’t have the ability to trick us to nearly the same extent as if it possessed complex models of human behavior. For a more thorough treatment of why avoiding human modeling could be quite valuable, see Ramana Kumar and Scott Garrabrant’s “Thoughts on Human Models.”
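To make the “control the data sources” idea concrete, here is a minimal sketch (my own illustration, not anything from Evhub’s post) of a training corpus whose only source is a synthetic problem generator, so no human-written text enters it at all. The function names and the toy problem format are assumptions for the example, not part of any actual proposal.

```python
# Minimal sketch of a "controlled data source" for a STEM AI: every training
# example comes from a synthetic generator, never from a human-written corpus.
# The names (make_problem, build_corpus) and the toy format are illustrative.
import random


def make_problem(rng: random.Random) -> dict:
    """Generate one abstract arithmetic problem containing no human-derived text."""
    a, b = rng.randint(1, 10**6), rng.randint(1, 10**6)
    return {"prompt": f"{a} * {b} =", "answer": str(a * b)}


def build_corpus(n_examples: int, seed: int = 0) -> list[dict]:
    """Build a corpus whose only data source is the generator above."""
    rng = random.Random(seed)
    return [make_problem(rng) for _ in range(n_examples)]


if __name__ == "__main__":
    for example in build_corpus(3):
        print(example)
```

Even this toy corpus is not literally free of anthropological information: base-ten notation and the choice of which problems get asked are themselves human conventions.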
Note that giving an AI abstract STEM problems is very unlikely to give it zero anthropological information. The very format and range of the problems are likely to reveal information about both human technology and human psychology.
Now I still agree that’s much more secure than giving it all the information it needs, but the claim of zero bits is pushing it.
IMO neither Evan nor Scott nor anyone else has offered a plausible plan for using a STEM AI (that knows nothing about the existence of humans) to solve the big problem that someone else is going to build an unboxed non-STEM AI the next year.
Evhub’s post is here:
https://www.lesswrong.com/posts/fRsjBseRuvRhMPPE5/an-overview-of-11-proposals-for-building-safe-advanced-ai
And the Thoughts on Human Models post is here:
https://www.lesswrong.com/posts/BKjJJH2cRpJcAnP7T/thoughts-on-human-models
The key is that my sandbox (really Davidad’s sandbox) imposes little or no performance loss, so even selfish actors would sandbox their AIs.
Until you want to use the AGI to e.g. improve medicine...