You’d have to actively stop it from doing so. An AI is just code: if the AI has the ability to write code, it has the ability to self-modify.
If the AI has the ability to write code and the ability to replace parts of itself with that code, then it has the ability to self-modify. This second ability is what I’m proposing to get rid of. See my other comment.
If an AI can’t modify its own code, it can just write a new AI that can.
Unpack the word “itself.”
(This is basically the same response as drethelin’s, except it highlights the difficulty in drawing clear delineations between the different kinds of impacts the AI can have on the world. Even if version A doesn’t alter itself, it still alters the world, and it may do so in a way that brings about version B (either indirectly or directly), and so it would help if it knew how to design B.)
Well, I’m imagining the AI as being composed of a few distinct parts—a decision subroutine (give it a set of options and it picks one), a thinking subroutine (give it a question and it tries to determine the answer), and a belief database. So when I say “the AI can’t modify itself”, what I mean more specifically is “none of the options given to the decision subroutine will be something that involves changing the AI’s code, or changing beliefs in unapproved ways”.
So perhaps “the AI could write some code” (meaning that the thinking algorithm creates a piece of code inside the belief database), but “the AI can’t replace parts of itself with that code” (meaning that the decision algorithm can’t make a decision to alter any of the AI’s subroutines or beliefs).
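To make that division concrete, here’s a rough Python sketch of the kind of separation I’m describing. Everything in it (the BeliefDatabase / Thinker / Decider names, the toy approved-action list) is an illustrative assumption rather than an implementation of the actual proposal; the point is only that the decision subroutine chooses from a pre-approved option list that never includes self-modification, while anything the thinking subroutine writes stays in the belief database as inert data.

```python
# A minimal sketch of the separation described above, assuming a toy
# architecture; every name here (BeliefDatabase, Thinker, Decider) is
# purely illustrative and not taken from any existing system.

class BeliefDatabase:
    """Stores beliefs; any code the thinker writes sits here as inert data."""
    def __init__(self):
        self.beliefs = {}

    def store(self, key, value):
        self.beliefs[key] = value


class Thinker:
    """Tries to answer questions. It may write code, but only as an entry
    in the belief database; it never installs or runs that code."""
    def __init__(self, beliefs):
        self.beliefs = beliefs

    def answer(self, question):
        result = f"best guess at: {question}"  # placeholder reasoning step
        self.beliefs.store(question, result)
        return result


class Decider:
    """Picks one option from an externally supplied, pre-approved list.
    Because nothing like 'rewrite my own code' is ever on that list,
    it can never be chosen, no matter what the thinker has written."""
    APPROVED = {"report_answer", "wait"}

    def choose(self, options):
        safe = [o for o in options if o in self.APPROVED]
        return safe[0] if safe else "wait"


# Example: even if a self-modification option somehow shows up among the
# candidates, the decider filters it out before choosing.
beliefs = BeliefDatabase()
thinker = Thinker(beliefs)
thinker.answer("how would a better successor AI be designed?")
decider = Decider()
print(decider.choose(["rewrite_own_code", "report_answer"]))  # -> "report_answer"
```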
Now, certainly an out-of-the-box AI would, in theory, be able to, say, find a computer and upload some new code onto it, and that would amount to self-modification. I’m assuming we’re going to first make safe AI and then let it out of the box, rather than the other way around.