Here’s a distinction you could make: an AI is self-modifying if it is effectively capable of making any change to its source code at any time, and non-self-modifying if it is not. (The phrase “capable of” is vague, of course.)
I can imagine non-self-modifying AI having an advantage over self-modifying AI, because it might be possible for an NSM AI to be protected from its own stupidity, so to speak. If the AI were to believe that overwriting all of its beliefs with the digits of pi is a good idea, nothing bad would happen, because it would be unable to do that. Of course, these same restrictions that make the AI incapable of breaking itself might also make it incapable of being really smart.
I believe I’ve heard someone say that any AI capable of being really smart must be effectively self-modifying, because being really smart involves the ability to make arbitrary calculations, and if you can make arbitrary calculations, then you’re not restricted. My objection is that there’s a big difference between making arbitrary calculations and running arbitrary code; namely, the ability to run arbitrary code allows you to alter other calculations running on the same machine.
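To make that distinction concrete, here is a minimal sketch (my own toy example in Python, not any particular AI design): a tiny interpreter for a Turing-complete guest language. The guest can perform arbitrary calculations, but its only mutable state is its own tape and output buffer, so it has no way to alter anything else running in the host process.

```python
# Minimal sketch: arbitrary calculation without arbitrary code execution.
# The guest language (Brainfuck) is Turing-complete, yet a guest program
# can only touch its own tape and output buffer, never the host's state.

def run_bf(program: str, steps: int = 100_000) -> bytes:
    tape = bytearray(30_000)   # the guest's entire mutable world
    ptr = 0                    # data pointer into the guest's tape
    pc = 0                     # program counter into the guest's code
    out = bytearray()

    # Precompute matching brackets for loops.
    jumps, stack = {}, []
    for i, c in enumerate(program):
        if c == '[':
            stack.append(i)
        elif c == ']':
            j = stack.pop()
            jumps[i], jumps[j] = j, i

    while pc < len(program) and steps > 0:
        c = program[pc]
        if c == '>':
            ptr = (ptr + 1) % len(tape)
        elif c == '<':
            ptr = (ptr - 1) % len(tape)
        elif c == '+':
            tape[ptr] = (tape[ptr] + 1) % 256
        elif c == '-':
            tape[ptr] = (tape[ptr] - 1) % 256
        elif c == '.':
            out.append(tape[ptr])
        elif c == '[' and tape[ptr] == 0:
            pc = jumps[pc]
        elif c == ']' and tape[ptr] != 0:
            pc = jumps[pc]
        pc += 1
        steps -= 1
    return bytes(out)

# The guest computes whatever it likes, but only `tape` and `out` change.
print(run_bf('++++++++[>++++++++<-]>+.'))   # prints b'A'
```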
Lemme expand on my thoughts a little bit. I imagine a non-self-modifying AI to be made of three parts: a thinking algorithm, a decision algorithm, and a belief database. The thinking and decision algorithms are immutable, and the belief database is (obviously) mutable. The supergoal is coded into the decision algorithm, so it can’t be changed. (Problem: the supergoal only makes sense in the context of certain beliefs, and beliefs are mutable.) The contents of the belief database influence the thinking algorithm’s behavior, but they don’t determine its behavior.
The ideal possibility is that we can make the following happen:
The belief database is flexible enough that it can accommodate all types of beliefs from the very beginning. (If the thinking algorithm is immutable, it can’t be updated to handle new types of beliefs.)
The thinking algorithm is sufficiently flexible that the beliefs in the belief database can lead the algorithm in the right directions, producing super-duper intelligence.
The thinking algorithm is sufficiently inflexible that the beliefs in the belief database cannot cause the algorithm to do something really bad, producing insanity.
The supergoal remains meaningful in the context of the belief database regardless of how the thinking algorithm ends up behaving.
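To make the three-part split above concrete, here is a minimal sketch in Python; all the names (BeliefDatabase, thinking_algorithm, decision_algorithm, SUPERGOAL) are hypothetical placeholders of my own, not a real design. The point is that only the belief database is ever mutated by the agent loop.

```python
# Hypothetical sketch of the three-part split: immutable thinking and
# decision algorithms, a mutable belief database, and a hard-coded supergoal.

from dataclasses import dataclass, field

@dataclass
class BeliefDatabase:
    # The only mutable component: a generic key -> proposition store,
    # meant to be flexible enough to hold any type of belief from the start.
    beliefs: dict = field(default_factory=dict)

    def update(self, key, value):
        self.beliefs[key] = value

def thinking_algorithm(db: BeliefDatabase, observation) -> BeliefDatabase:
    """Immutable code: beliefs influence what gets inferred, but the
    inference rules themselves are fixed and cannot be rewritten."""
    db.update(("observed", observation), True)
    return db

SUPERGOAL = "maximize X"   # hard-coded placeholder; cannot be changed at run time

def decision_algorithm(db: BeliefDatabase) -> str:
    """Immutable code with the supergoal baked in. Note the problem from the
    text: whether SUPERGOAL still *means* anything depends on the (mutable)
    contents of the belief database."""
    if db.beliefs.get(("observed", "threat"), False):
        return "act cautiously in service of " + SUPERGOAL
    return "gather information in service of " + SUPERGOAL

# The agent loop only ever mutates the belief database.
db = BeliefDatabase()
db = thinking_algorithm(db, "threat")
print(decision_algorithm(db))
```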
(My ideas haven’t been taken seriously in the past, and I have no special knowledge in this area, so it’s likely that my ideas are worthless. They feel valuable to me, however.)
This point seems like an argument in favor of the relevance of the problem laid out in this post. I have other complaints with this framing of the problem, which I expect you would share.
The key distinction between this and contemporary AI is not self-modification, but wanting to have the kind of agent which can look at itself and say, “I know that as new evidence comes in I will change my beliefs. Fortunately, it looks like I’m going to make better decisions as a result” or perhaps even more optimistically “But it looks like I’m not changing them in quite the right way, and I should make this slight change.”
The usual route is to build agents which don’t reason about their own evolution over time. But for sufficiently sophisticated agents, I would expect them to have some understanding of how they will behave in the future, and to e.g. pursue more information based on the explicit belief that by acquiring that information they will enable themselves to make better decisions. This seems like it is a more robust approach to getting the “right” behavior than having an agent which e.g. takes “Information is good” as a brute fact or has a rule for action that bakes in an ad hoc approach to estimating VOI. I think we can all agree that it would not be good to build an AI which calculated the right thing to do, and then did that with probability 99% and took a random action with probability 1%.
That said, even if you are a very sophisticated reasoner, having in hand some heuristics about VOI is likely to be helpful, and if you think that those heuristics are effective you may continue to use them. I just hope that you are using them because you believe they work (e.g. because of empirical observations of them working, the belief that you were intelligently designed to make good decisions, or whatever), not because they are built into your nature.
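Here is a toy value-of-information calculation (my own example, with made-up numbers) that illustrates the contrast: the agent compares the expected utility of acting on its current beliefs with the expected utility of acting after observing, and the difference is what the observation is worth to it.

```python
# Toy VOI calculation: two world states with a prior, two actions with
# state-dependent utility. Numbers are made up for illustration.

prior = {"rain": 0.3, "dry": 0.7}
utility = {
    ("umbrella", "rain"): 1.0, ("umbrella", "dry"): 0.2,
    ("no_umbrella", "rain"): -1.0, ("no_umbrella", "dry"): 1.0,
}

def best_eu(belief):
    """Expected utility of the best action under a given belief."""
    return max(
        sum(p * utility[(a, s)] for s, p in belief.items())
        for a in ("umbrella", "no_umbrella")
    )

# Acting now, under the prior:
eu_now = best_eu(prior)

# Acting after learning the true state (a perfect forecast):
eu_after_info = sum(p * best_eu({s: 1.0}) for s, p in prior.items())

voi = eu_after_info - eu_now
print(f"EU now: {eu_now:.2f}, EU with forecast: {eu_after_info:.2f}, VOI: {voi:.2f}")
# An agent that reasons this way pays for the forecast only when the VOI
# exceeds its cost, rather than following a brute "seek information" rule.
```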
For a somewhat contrived and practically less relevant notion of self-modifying, you could regard a calculator as self-modifying; not very relevantly, though.
It would be useful to understand why we think a calculator doesn’t “count” as self-modification. In particular, we don’t think calculators run into the Löb obstacle, so what is the difference between calculators and AIs?
As always in such matters, think of Turing Machines. If the transition function isn’t modified, the state of the Turing Machine may still change. However, it’ll always be in an internal state prespecified in its transition function; it won’t get unknown or unknowable new entries in its action table.
Universal Turing Machines are designed to change, to take their transition function from the input tape as input, a prime example of self-modification. But they as well, having read their new transition function from their input tape, will go about their business as usual without further changes to their transition function. (You can of course program them to later continue changing their action table, but the point is that such changes to its own action table, to its own behavior, are clearly delineated from mere contents in its memory / work tape.)
A calculator or a non-self-modifying AI will undergo changes in its memory, but it’ll never endeavor to define new internal states, with new rules, on its own. It’ll remember that you’ve entered “0.7734” in its display, but it’ll only perform its usual actions on that number. A game of Tetris will change which blocks it displays on your screen, but that won’t modify its rules.
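Here is a minimal sketch of that distinction, using a toy three-state machine of my own invention: the tape and the control state are mutated on every step, but the transition table, the machine’s “rules”, is only ever read.

```python
# A fixed-transition-table Turing Machine: memory changes, rules don't.
# This toy machine just writes three 1s and halts.

from collections import defaultdict

# (state, symbol) -> (new_symbol, move, new_state); fixed before the run.
TRANSITIONS = {
    ("A", 0): (1, +1, "B"),
    ("B", 0): (1, +1, "C"),
    ("C", 0): (1, +1, "HALT"),
}

def run_tm(transitions, max_steps=1000):
    tape = defaultdict(int)   # mutable memory
    head, state = 0, "A"      # mutable "registers"
    for _ in range(max_steps):
        if state == "HALT":
            break
        symbol, move, state = transitions[(state, tape[head])]
        tape[head] = symbol
        head += move
        # Note: `transitions` is only ever read, never written. The set of
        # internal states the machine can be in was fixed in advance.
    return dict(tape)

print(run_tm(TRANSITIONS))   # {0: 1, 1: 1, 2: 1}
```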
There may be accidental modifications (bugs etc.) leading to unknown states and behavior, but I wouldn’t usefully call that an active act of self-modification. (It’s not a special case to guard against, other than by the usual redundancy / checksums. And that’s not so much FAI research as the same constraints you face when working with e.g. real-time or mission-critical applications.)
I don’t think this is quite there. A UTM is itself a TM, and its transition function is fixed. But it emulates a TM, and it could instead emulate a TM-with-variable-transition-function, and that thing would be self-modifying in a deeper sense than an emulation of a standard TM.
But it’s still not obvious to me how to formalize this, because (among other problems) you can replace an emulated TMWVTF with an emulated UTM which in turn emulates a TMWVTF...
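For whatever it’s worth, here is a toy sketch (my own, not a proposed formalization) of that deeper sense: the fixed Python function plays the role of the fixed interpreter, the transitions dict is the emulated machine’s own action table, and one built-in “modify” step lets the run install a rule the machine did not start with.

```python
# Emulated machine whose transition table is part of its own mutable state.
# Toy illustration only; the "MODIFY" step models an instruction that
# installs a new rule into the emulated machine's action table at run time.

def run_tmwvtf(max_steps=100):
    transitions = {               # now part of the machine's own state
        ("A", 0): (1, +1, "B"),
        ("B", 0): (0, +1, "MODIFY"),
    }
    tape, head, state = {}, 0, "A"
    for _ in range(max_steps):
        if state == "HALT":
            break
        if state == "MODIFY":
            # The machine acquires a rule it was not born with.
            transitions[("C", 0)] = (1, -1, "HALT")
            state = "C"
            continue
        symbol, move, state = transitions[(state, tape.get(head, 0))]
        tape[head] = symbol
        head += move
    return tape, transitions

tape, table = run_tmwvtf()
print(tape)                 # {0: 1, 1: 0, 2: 1}
print(("C", 0) in table)    # True: the action table grew during the run
```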
Any agent that takes in information about the world is implicitly self-modifying all the time.