STEM AI is one such plan that could be run inside a secure sandbox, as long as we don't give it data on humans or human models, or at least give it only the minimum data necessary, and we can prevent it from escalating out of its sandbox. Thus we control the data sources.
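To make "we control the data sources" slightly more concrete, here is a minimal, purely illustrative sketch of my own (not from Evhub's or anyone else's proposal): the training corpus is restricted to procedurally generated STEM problems, and a conservative filter rejects anything that plainly references the human world. The generator and keyword list are hypothetical placeholders, not a real curation pipeline.

```python
import re

# Hypothetical, overly simple illustration of "we control the data sources":
# only procedurally generated STEM problems enter the corpus, and a
# conservative keyword filter rejects anything that plainly references humans.
# A real curation pipeline would need far stronger guarantees than a blacklist.
HUMAN_TERMS = re.compile(
    r"\b(human|person|people|society|history|language|money|war)\b",
    re.IGNORECASE,
)

def generate_algebra_problem(a: int, b: int, c: int) -> str:
    """Toy synthetic problem that never mentions the human world."""
    return f"Solve for x: {a}*x + {b} = {c}"

def admit(example: str) -> bool:
    """Admit an example into the training corpus only if the filter passes."""
    return HUMAN_TERMS.search(example) is None

corpus = [
    generate_algebra_problem(a, b, c)
    for a in range(1, 5)
    for b in range(0, 5)
    for c in range(0, 5)
]
corpus = [ex for ex in corpus if admit(ex)]
print(f"{len(corpus)} examples admitted")
```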
From Evhub’s post:
STEM AI is a very simple proposal in a similar vein to microscope AI. Whereas the goal of microscope AI was to avoid the potential problems inherent in building agents, the goal of STEM AI is to avoid the potential problems inherent in modeling humans. Specifically, the idea of STEM AI is to train a model purely on abstract science, engineering, and/or mathematics problems while using transparency tools to ensure that the model isn’t thinking about anything outside its sandbox.
This approach has the potential to produce a powerful AI system—in terms of its ability to solve STEM problems—without relying on any human modeling. Not modeling humans could then have major benefits such as ensuring that the resulting model doesn’t have the ability to trick us to nearly the same extent as if it possessed complex models of human behavior. For a more thorough treatment of why avoiding human modeling could be quite valuable, see Ramana Kumar and Scott Garrabrant’s “Thoughts on Human Models.”
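To gesture at what the "transparency tools" mentioned in the quote might look like in practice, here is a hedged, self-contained sketch of one standard technique, a linear probe on activations. Everything below (the synthetic activations, the dimensions, the threshold) is made up for illustration and is not part of the STEM AI proposal itself.

```python
import numpy as np

# Illustrative sketch of one possible "transparency tool": a linear probe
# trained to flag activations associated with out-of-sandbox (human-related)
# concepts. The activations below are synthetic stand-ins; a real probe would
# be fit on labeled activations taken from the model being monitored.
rng = np.random.default_rng(0)
DIM = 64

# Synthetic labeled activations: label 1 = a "human-related" direction present.
human_direction = rng.normal(size=DIM)
human_direction /= np.linalg.norm(human_direction)
n = 500
labels = rng.integers(0, 2, size=n)
acts = rng.normal(size=(n, DIM)) + np.outer(labels, 3.0 * human_direction)

# Fit a logistic-regression probe with plain gradient descent.
w, b = np.zeros(DIM), 0.0
for _ in range(2000):
    logits = np.clip(acts @ w + b, -30.0, 30.0)
    probs = 1.0 / (1.0 + np.exp(-logits))
    w -= 0.1 * (acts.T @ (probs - labels) / n)
    b -= 0.1 * float(np.mean(probs - labels))

def flags_out_of_sandbox(activation: np.ndarray, threshold: float = 0.5) -> bool:
    """Return True if the probe thinks this activation encodes a flagged concept."""
    score = 1.0 / (1.0 + np.exp(-(activation @ w + b)))
    return bool(score > threshold)

# An activation containing the flagged direction should trip the probe.
print(flags_out_of_sandbox(3.0 * human_direction))
```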
Note that giving an AI abstract STEM problems is very unlikely to convey zero anthropological information. The very format and range of the problems are likely to reveal information about both human technology and human psychology.
Now, I still agree that this is much more secure than giving the AI all the information it needs, but the claim of zero bits is pushing it.
IMO neither Evan nor Scott nor anyone else has offered a plausible plan for using a STEM AI (one that knows nothing about the existence of humans) to solve the big problem: that someone else will build an unboxed non-STEM AI the following year.
Evhub’s post is here:
https://www.lesswrong.com/posts/fRsjBseRuvRhMPPE5/an-overview-of-11-proposals-for-building-safe-advanced-ai
And the Thoughts on Human Models post is here:
https://www.lesswrong.com/posts/BKjJJH2cRpJcAnP7T/thoughts-on-human-models
The key is that my sandbox (really Davidad's sandbox) imposes very little or no performance loss, so even selfish actors would sandbox their AIs.
Until you want to use the AGI to e.g. improve medicine...