For example, if the top-level decision function of an AI is:

def DecideWhatToDo(self, environment):
    # Hard-coded, trivially analysable branch.
    if environment.IsUnderWater():
        return actions.SELF_DESTRUCT
    else:
        # Opaque branch: hard to predict without actually running it.
        return self.EmergentComplexStochasticDecisionFunction(environment)
… and the AI doesn’t self-modify, then it can predict that it will decide to self-destruct if it falls in the water just by analysing the code, without running it (assuming, of course, that it is good enough at code analysis).
Of course, you can imagine AIs that can’t predict any of their decisions, and as wedrifid says, in most non-trivial cases they probably wouldn’t be able to.
(This may be important, because having provable decisions in certain situations could be key to cooperation in prisoner’s-dilemma-type situations.)
Of course that is predictable, but that code wouldn’t exist in any intelligent program, or at least it isn’t an intelligent action; predicting it is like predicting that I’ll die if my brain is crushed.
Unknowns, we’ve been over this issue before. You don’t need perfect prediction in order to predict usefully. Moreover, even if you can’t predict everything, you can still examine and improve specific modules. For example, if an AI has a module for factoring integers using a naive, brute-force algorithm, it could examine that module and decide to replace it with a quicker, more efficient one (using the number field sieve, for example). It can do that even though it can’t predict the precise behavior of the module without running it.
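For concreteness, here is a minimal sketch of the sort of naive, brute-force factoring module being described (the name and details are my own illustration; it’s plain trial division):

def naive_factor(n):
    # Brute-force factorisation by trial division: roughly sqrt(n) steps,
    # which an AI could see from the code alone, without running it.
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

print(naive_factor(91))  # [7, 13]

The point is just that the structure (and its inefficiency) is visible in the source, even though the AI can’t say what naive_factor will return for a particular huge input without running it.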
I certainly agree that an AI can predict some aspects of its behavior.
That’s also because this is a simplified example, merely intended to provide a counter-example to your original assertion.
As I’ve stated before, no AI can predict its own decisions in that sense (i.e., in detail, before it has made them). Knowing its source code doesn’t help; it has to run the code in order to know what result it gets.
Agreed, it isn’t an intelligent action, but if you start saying intelligent agents can only take intelligent decisions, then you’re playing No True Scotsman.
I can imagine plenty of situations where someone might want to design an agent that takes certain unintelligent decisions in certain circumstances, or an agent that self-modifies in that way. If an agent can not only make promises, but also formally prove, by showing its own source code, that those promises are binding and that it can’t change them, then it may be at an advantage in negotiations and cooperation over an agent that can’t do that.
So “stupid” decisions that can be predicted by reading one’s own source code aren’t a feature I consider unlikely in the design space of AIs.
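As a rough sketch of that kind of arrangement (entirely made up by me; the string check stands in for a formal proof, which is the genuinely hard part):

import inspect

COOPERATE, DEFECT = "cooperate", "defect"

def decide(environment):
    # Hard-coded commitment branch, visible to anyone reading the source.
    if environment.get("opponent_published_matching_source"):
        return COOPERATE
    # Everything else goes through the opaque, hard-to-predict part.
    return complex_opaque_policy(environment)

def complex_opaque_policy(environment):
    # Placeholder for the complicated rest of the agent.
    return DEFECT

# A counterparty that is shown the source can check the commitment is there.
published_source = inspect.getsource(decide)
print("opponent_published_matching_source" in published_source)  # True

The commitment branch sits above the opaque part, so the counterparty can predict that branch of behavior without having to predict anything else about the agent.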
I would agree with that. But I would just say that the AI would experience doing those things (for example, keeping such promises) as we experience reflex actions, not as decisions.
Why not?
In what way is it like that, and how is that relevant to the question?
It’s like that precisely because it is easily predictable; as I said in another reply, an AI will experience its decisions as indeterminate, so anything it knows in advance in such a determinate way will not be understood as a decision, just as I don’t decide to die if my brain is crushed, but I know that it will happen. In the same way, the AI will merely know that it will self-destruct if it is placed under water.
From this, it seems like your argument for why this will not appear in its decision algorithm is simply that you have a specific definition of “decision” that requires the AI to “understand it as a decision”. I don’t know why the AI has to experience its decisions as indeterminate (indeed, that seems like a flawed design if its decisions are actually determined!).
Rather, any code that leads from inputs to a decision should be called part of the AI’s ‘decision algorithm’, regardless of how it ‘feels’. I don’t have a problem with an AI ‘merely knowing’ that it will make a certain decision. (And be careful: ‘merely’ is an imprecise weasel word.)
It isn’t a flawed design because when you start running the program, it has to analyze the results of different possible actions. Yes, it is determined objectively, but it has to consider several options as possible actions nonetheless.
I’m wondering why this got downvoted—it’s true!
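To make the point about a determined program still weighing several options concrete, here is a minimal sketch of a fully deterministic chooser (the options and scoring rule are invented for the example):

def choose_action(environment, options, score):
    # Fully deterministic: the same inputs always yield the same choice.
    # Yet the procedure still has to consider each option in turn.
    best_option, best_score = None, float("-inf")
    for option in options:
        option_score = score(environment, option)
        if option_score > best_score:
            best_option, best_score = option, option_score
    return best_option

# Invented example: with a low battery, "explore" scores worst.
battery_level = 0.2
print(choose_action(battery_level,
                    ["wait", "recharge", "explore"],
                    lambda env, o: env if o == "explore" else 1 - env))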