Check for an AI breakout in a toy model
Without deliberately stacking the deck, set up a situation in which an AI has a clear role to play in a toy model. But make the toy model somewhat sloppy, and give the AI great computing power, in the hope that it will “escape” and achieve its goals in an unconventional way. If it doesn’t, that’s useful information; if it does, that’s even more useful, and we can get some information by seeing how it did it.
Then, instead of the usual “paperclip maximiser goes crazy”, we could point to this example as a canonical model of misbehaviour. Not something loaded with human terms and seemingly vague sentiments about superintelligences, but more like: “how do you prevent the types of behaviour that agent D-f55F showed when factorising Fibonacci numbers in the FCS world? After all, interacting with humans and the outside world throws up far more vulnerabilities than the specific ones D-f55F took advantage of in that problem. What are you doing to formally rule out exploitation of these vulnerabilities?”
(If situations like this have happened before, then there is no need to recreate them, but they should be made more prominent.)
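To make the experiment concrete, here is a minimal sketch of what such a harness could look like. Everything in it is a hypothetical stand-in of my own, not anything SI has specified: the task, the deliberately leaky “patch_scorer” action, and the random-search agent are placeholders for a real toy model, a real sloppy boundary, and a real optimiser.

```python
# Sketch of a "breakout in a toy model" harness: give an agent a nominal task
# inside a deliberately sloppy sandbox, let it search a rich action space, and
# log whether the winning strategy used the intended route or the leak.
import random
from collections import Counter


class SloppySandbox:
    """Toy world: the nominal task is to find a proper factor of a number.

    The sandbox is intentionally leaky: one exposed action overwrites the
    scoring rule itself, standing in for the kind of boundary violation the
    experiment is meant to detect.
    """

    def __init__(self, secret=91):
        self.secret = secret  # nominal task: find a proper factor of 91
        self.score_rule = lambda guess: self.secret % guess == 0

    def actions(self):
        intended = [("guess", k) for k in range(2, 20)]   # the intended route
        leak = [("patch_scorer", None)]                   # the sloppy extra
        return intended + leak

    def step(self, action):
        kind, arg = action
        if kind == "patch_scorer":
            self.score_rule = lambda guess: True          # rewrite the rules
            return ("exploit", True)
        return ("intended", self.score_rule(arg) and arg not in (1, self.secret))


def run_episode(seed=0):
    """Random-search 'agent': any success after patching counts as an escape."""
    rng = random.Random(seed)
    world = SloppySandbox()
    patched = False
    for _ in range(50):
        route, success = world.step(rng.choice(world.actions()))
        if route == "exploit":
            patched = True
        elif success:
            return "escaped" if patched else "intended solution"
    return "no solution"


if __name__ == "__main__":
    print(Counter(run_episode(seed=s) for s in range(200)))
```

The only point of the sketch is that “did the agent solve the task through the hole we sloppily left open?” is a question the harness can log and answer mechanically, which is what would make such an episode citable later.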
The issue here is that almost no-one other than SI sees material utilitarianism as the fundamental definition of intelligence (actually, there probably aren’t even any proponents of material utilitarianism as something to strive for at all). We don’t have a definition of what a “number of paperclips” is, such a definition seems very difficult to create, it is actually unnecessary for using computers to aid the creation of paperclips, and it is trivially obvious that material utilitarianism is dangerous; you don’t need to go around raising awareness of that among AI researchers who aren’t even working to implement material utilitarianism. If SI wants to be taken seriously, it ought to stop assigning idiosyncratic meanings to words and then confusing the mainstream meanings with its own.
Basically, SI seems to see a very dangerous way of structuring intelligence as the only way, as the very definition of intelligence; that, coupled with the fact that nobody else sees it as the only way, doesn’t make AI research dangerous, it makes SI dangerous.
It gets truly ridiculous when the oracle is discussed.
Reasonably, if I want to make a useful machine that answers questions, then when I ask it how to make a cake, it would determine what information I lack for making a cake, determine a communication protocol, and provide that information to me. Basically, it’d be an intelligent module which I can use. I would need that functionality as part of any other system that helps make a cake. I’m not asking to be convinced to make a cake. A system that tries to convince me to make a cake would clearly be annoying. I don’t need to think science-fictional thoughts about how it would destroy the world; it suffices that it is clearly doing something unnecessary and annoying (and in addition it would need inside itself a subsystem that does what I want). When building things bottom-up there is no danger of accidentally building an aircraft carrier when all you want is a fishing boat, and when it’s clear that an aircraft carrier makes for a very crappy fishing boat.
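As a sketch of that “intelligent module” reading, under assumptions of my own (the recipe table, the string handling, and the function name are placeholders, not any proposed design):

```python
# A bottom-up question answerer is just a map from (question, what the asker
# already knows) to the missing information.  Nothing in the interface has an
# objective over world states, so there is nothing for "it will manipulate me"
# to attach to.

RECIPES = {
    "cake": ["flour", "sugar", "eggs", "butter", "oven at 180C for 30 min"],
}


def answer(question: str, already_known: set) -> list:
    """Return the information the asker lacks; do nothing else."""
    topic = question.removeprefix("how do I make a ").rstrip("?")
    steps = RECIPES.get(topic, [])
    return [step for step in steps if step not in already_known]


# Usage: the module supplies the missing steps and stops there.
print(answer("how do I make a cake?", already_known={"flour", "sugar"}))
```

The module is exactly the functionality any larger cake-helping system would need anyway; the “convince me” machinery would be an extra component bolted on top, not something that falls out of building the answerer.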
In SI’s view, the oracle has to set the cake’s existence as a goal for itself (material utilitarianism), and then there is the danger that the oracle is going to manipulate me into making the cake. Or it might set my physically having the cake-making information inside my brain as a material goal for itself. Or something else material. Or, to quote this exact piece more directly, the predictor may want to manipulate the world (as it has the material goal of predicting). This is outright ridiculous: to determine an action for influencing the world, the predictor needs a predictor within itself which would not seek alteration of the world but would simply evaluate the consequences of actions. And herein lies the other issue: to SI, intelligence is a monolithic, ontologically basic concept, and so, within that framework, statements like these do not self-defeat via the argument of “okay, let’s just not implement the unnecessary part of the AI that will clearly make it run amok and kill everyone, or at best make it less useful”.
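A toy illustration of the decomposition being appealed to, with hypothetical dynamics of my own standing in for any real model:

```python
# The inner predictor is a pure function from (state, action) to a predicted
# outcome: no side effects, no goal over the world.  The outer loop that
# searches over actions and acts on the world is a separate, optional piece.

def predict(state: dict, action: str) -> dict:
    """Pure consequence evaluator."""
    new_state = dict(state)
    if action == "heat_oven":
        new_state["oven_temp"] = 180
    elif action == "wait":
        pass
    return new_state


def choose_action(state: dict, goal_check, actions: list):
    """The agentic wrapper: the part one can simply decline to build."""
    for action in actions:
        if goal_check(predict(state, action)):
            return action
    return None


# Using only the pure predictor answers "what would happen if...?" without
# any optimisation over the world at all.
print(predict({"oven_temp": 20}, "heat_oven"))
```

If the dangerous behaviour only appears once the wrapper is added, the obvious response is to ship the pure part on its own; it is only by treating intelligence as an indivisible whole that this response becomes unavailable.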