This is what would happen if you were magically given an extraordinarily powerful AI and then failed to align it,
Magically given a very powerful, unaligned AI. (This ‘the utility function is in code, in one place, and can be changed’ assumption needs re-examination. Even if we assert such a function exists in there*, it might be hard to change in, say, a NN.)
* Maybe this is overgeneralizing from people, but what reason do we have to think an ‘AI’ will be really good at figuring out its own utility function (so that it can make changes to itself without changing that function, if it so desires)? The postulate ‘it will be able to improve itself, so eventually it’ll be able to figure everything out (including how to do that)’ seems to ignore things like ‘improvements might make it more complex, and so harder to introspect on, even as it improves.’ Where and how do you distinguish between ‘this is my utility function’ and ‘this is a bias I have’? (How have you improved this, and your introspective abilities? How would a NN do either of those?)
One important factor seems to be that Eliezer often imagines scenarios in which AI systems avoid making major technical contributions, or revealing the extent of their capabilities, because they are lying in wait to cause trouble later. But if we are constantly training AI systems to do things that look impressive, then SGD will be aggressively selecting against any AI systems who don’t do impressive-looking stuff. So by the time we have AI systems who can develop molecular nanotech, we will definitely have had systems that did something slightly-less-impressive-looking.
Now there’s an idea: due to competition, AIs do impressive things (which aren’t necessarily safe). An AI creates the last advance that, when implemented, causes a FOOM + bad stuff.
Eliezer appears to expect AI systems performing extremely fast recursive self-improvement before those systems are able to make superhuman contributions to other domains (including alignment research),
This doesn’t necessarily require the above to be right or wrong: human-level contributions (which aren’t safe) could, in the worst-case scenario...etc.
[6.] Many of the “pivotal acts”
(Added the 6 back in after it disappeared while copying and pasting here.)
There’s a joke about a philosopher king somewhere in there. (Ah, if only we had an AI powerful enough to save us from AI, but still controlled by...)
I think Eliezer is probably wrong about how useful AI systems will become, including for tasks like AI alignment, before it is catastrophically dangerous.
I think others (or maybe the OP previously?) have pointed out that AI can affect the world in big ways well before ‘taking it over’. Whether it is domain-limited, or sub-human, on par with, or super-human in performance, doesn’t necessarily matter (though more power → more effect is the expectation). Some domains are big.