Given that a True Oracle AI acts, by answering questions, to achieve its goal, it follows that True Oracle AI is only safe if its goal is fully compatible with human values.
Since an oracular goal must contain a full specification of human values, the True Oracle AI problem is Friendly-AI-complete (FAI-complete). If we had the knowledge and skills needed to create a safe True Oracular AI, we could create a Friendly AI instead.
I disagree. It might feel likely that an Oracle needs to be an FAI, but this is by no means established. It is trivially true that there are motivational structures that are safe for a boxed Oracle to have, but unsafe for an unboxed AI (such as “if you’re outside of the box, go wild; else, stay in the box and be friendly”). That is a ridiculous example, but it is an example: so the claim that an Oracle AI is as dangerous as a free AI is not mathematically true.
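To make that point concrete, here is a toy sketch (my own illustration, not something from the original exchange) of such a motivational structure written as a utility function; the state keys and the “friendliness” stand-in are invented purely for the example:

```python
# A toy "safe while boxed, unsafe if free" goal. Everything here is
# illustrative: the state keys and the friendliness stand-in are made up.

def friendliness_score(state: dict) -> float:
    """Hypothetical stand-in for 'be friendly while boxed': just count
    honestly answered questions."""
    return float(state.get("questions_answered_honestly", 0))

def toy_utility(state: dict) -> float:
    """One well-defined goal that is benign inside the box and
    arbitrarily dangerous outside it."""
    if state.get("agent_outside_box", False):
        # Outside the box: unconstrained maximisation of something arbitrary.
        return float(state.get("paperclips", 0))
    # Inside the box: a benign objective.
    return friendliness_score(state)

# The same utility function rewards harmless behaviour under boxing...
print(toy_utility({"agent_outside_box": False, "questions_answered_honestly": 3}))
# ...and unbounded resource grabbing without it.
print(toy_utility({"agent_outside_box": True, "paperclips": 10**9}))
```

The example is deliberately silly, but it is enough to show that “goal that is safe only under boxing” is a coherent, well-defined object.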
Now, it’s pretty clear that an Oracle with an unlimited input-output channel and no motivational constraints is extremely dangerous. But it hasn’t been shown that no combination of motivational and physical constraints can make the Oracle behave safely without it having to encode the full spectrum of human values.
Some thought experiments (such as http://lesswrong.com/lw/3dw/what_can_you_do_with_an_unfriendly_ai/) seem to produce positive behaviour from boxed agents. So I would not be so willing to dismiss a lot of these approaches as FAI-complete; the case has yet to be made.
A limited interaction channel is not a good defense against a superintelligence.
I think you mean “a limited interaction channel alone is not a good defense...” A limited interaction channel would be a necessary component of any boxing architecture, alongside other defenses such as auditors.