I believe that is a misconception. Perhaps I’m not being reasonable, but I would expect the level at which you could describe such a creature in terms of “desires” to be conceptually distinct from the level at which it can operate on its own code.
This is the same old question of “free will” again. Desires don’t exist as a mechanism. They exist as an approximate model of describing the emergent behavior of intelligent agents.
You are saying that a GAI being able to alter its own “code” on the actual code-level does not imply that it is able to alter in a deliberate and conscious fashion its “code” in the human sense you describe above?
Generally GAIs are ascribed extreme powers around here—if one has low-level access to its code, then it will be able to determine how its “desires” derive from this code, and will be able to produce whatever changes it wants. Similarly, it will be able to hack human brains with equal finesse.
I am saying pretty much exactly that. To clarify further, the words “deliberate”, “conscious” and “wants” again belong to the level of emergent behavior: they can be used to describe the agent, not to explain it (what could not be explained by “the agent did X because it wanted to”?).
Let’s instead make an attempt to explain. A complete control of an agent’s own code, in the strict sense, is in contradiction of Gödel’s incompleteness theorem. Furthermore, information-theoretic considerations significantly limit the degree to which an agent can control its own code (I’m wondering if anyone has ever done the math. I expect not. I intend to look further into this). In information-theoretic terminology, the agent will be limited to typical manipulations of its own code, which will be a strict (and presumably very small) subset of all possible manipulations.
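To give a rough quantitative feel for what “typical” means here, a minimal numerical sketch (the code size, the per-bit flip probability, and the independent-bits assumption are all invented for illustration; this is not the missing math):

```python
import math

# Toy model: each of n code bits is rewritten according to a biased distribution
# with per-bit entropy H < 1. Then roughly 2**(n*H) "typical" next states carry
# almost all the probability, out of 2**n possible states.
n = 1_000_000  # hypothetical code size in bits
p = 0.01       # hypothetical per-bit probability of deviating from the current design
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # binary entropy, about 0.08 bits

log2_typical = n * H  # log2 of the number of typical code states
log2_all = n          # log2 of all possible code states
print(f"typical states are a ~2^-{log2_all - log2_typical:.0f} fraction of all states")
```

The point is only that under any strongly non-uniform distribution, the code states reachable in practice are an astronomically small slice of the whole space.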
Can an agent be made more effective than humans in manipulating its own code? I have very little doubt that it can. Can it lead to agents qualitatively more intelligent than humans? Again, I believe so. But I don’t see a reason to believe that the code-rewriting ability itself can be qualitatively different than a human’s, only quantitatively so (although of course the engineering details can be much different; I’m referring to the algorithmic level here).
Generally GAIs are ascribed extreme powers around here
As you’ve probably figured out, I’m new here. I encountered this post while reading the sequences. Although I’m somewhat learned on the subject, I haven’t yet reached the part (which I trust exists) where GAI is discussed here.
On my path there, I’m actively trying to avoid a certain degree of groupthink which I detect in some of the comments here. Please take no offense, but it’s phrases like the above quote which worry me: is there really a consensus around here about such profound questions? Hopefully it’s only the terminology which is agreed upon, in which case I will learn it in time. But please, let’s make our terminology “pay rent”.
You are saying that a GAI being able to alter its own “code” on the actual code-level does not imply that it is able to alter in a deliberate and conscious fashion its “code” in the human sense you describe above?
I am saying pretty much exactly that. To clarify further, the words “deliberate”, “conscious” and “wants” again belong to the level of emergent behavior: they can be used to describe the agent, not to explain it (what could not be explained by “the agent did X because it wanted to”?).
Sure, but we could imagine an AI deciding something like “I do not want to enjoy frozen yogurt”, and then altering its code in such a way that it is no longer appropriate to describe it as enjoying frozen yogurt, yeah?
Let’s instead make an attempt to explain. A complete control of an agent’s own code, in the strict sense, is in contradiction of Gödel’s incompleteness theorem. Furthermore, information-theoretic considerations significantly limit the degree to which an agent can control its own code (I’m wondering if anyone has ever done the math. I expect not. I intend to look further into this). In information-theoretic terminology, the agent will be limited to typical manipulations of its own code, which will be a strict (and presumably very small) subset of all possible manipulations.
This seems trivially false—if an AI is instantiated as a bunch of zeros and ones in some substrate, how could Godel or similar concerns stop it from altering any subset of those bits?
Can an agent be made more effective than humans in manipulating its own code? I have very little doubt that it can. Can it lead to agents qualitatively more intelligent than humans? Again, I believe so. But I don’t see a reason to believe that the code-rewriting ability itself can be qualitatively different than a human’s, only quantitatively so (although of course the engineering details can be much different; I’m referring to the algorithmic level here).
You see reasons to believe that any artificial intelligence is limited to altering its motivations and desires in a way that is qualitatively similar to humans? This seems like a pretty extreme claim—what are the salient features of human self-rewriting that you think must be preserved?
Generally GAIs are ascribed extreme powers around here
As you’ve probably figured out, I’m new here. I encountered this post while reading the sequences. Although I’m somewhat learned on the subject, I haven’t yet reached the part (which I trust exists) where GAI is discussed here.
On my path there, I’m actively trying to avoid a certain degree of groupthink which I detect in some of the comments here. Please take no offense, but it’s phrases like the above quote which worry me: is there really a consensus around here about such profound questions? Hopefully it’s only the terminology which is agreed upon, in which case I will learn it in time. But please, let’s make our terminology “pay rent”.
I don’t think it’s a “consensus” so much as an assumed consensus for the sake of argument. Some do believe that any hypothetical AI’s influence is practically unlimited; others agree to assume that because it’s not ruled out and is a worst-case scenario, or simply an interesting case (see wedrifid’s comment on the grandparent). (Aside: not sure how unusual or nonobvious this is, but we often use familial relationships to describe the relative positions of comments, e.g. the comment I am responding to is the “parent” of this comment, and the one you were responding to when you wrote it is the “grandparent”. I think that’s about as far as most users take the metaphor.)
Thanks for challenging my position. This discussion is very stimulating for me!
Sure, but we could imagine an AI deciding something like “I do not want to enjoy frozen yogurt”, and then altering its code in such a way that it is no longer appropriate to describe it as enjoying frozen yogurt, yeah?
I’m actually having trouble imagining this without anthropomorphizing (or at least zoomorphizing) the agent. When is it appropriate to describe an artificial agent as enjoying something? Surely not when it secretes serotonin into its bloodstream and synapses?
This seems trivially false—if an AI is instantiated as a bunch of zeros and ones in some substrate, how could Godel or similar concerns stop it from altering any subset of those bits?
It’s not a question of stopping it. Gödel is not giving it a stern look, saying: “you can’t alter your own code until you’ve done your homework”. It’s more that these considerations prevent the agent from being in a state where it will, in fact, alter its own code in certain ways. This claim can and should be proved mathematically, but I don’t have the resources to do that at the moment. In the meanwhile, I’d agree if you wanted to disagree.
You see reasons to believe that any artificial intelligence is limited to altering its motivations and desires in a way that is qualitatively similar to humans? This seems like a pretty extreme claim—what are the salient features of human self-rewriting that you think must be preserved?
I believe that this is likely, yes. The “salient feature” is being subject to the laws of nature, which in turn seem to be consistent with particular theories of logic and probability. The problem with such a claim is that these theories are still not fully understood.
Thanks for challenging my position. This discussion is very stimulating for me!
It’s a pleasure!
Sure, but we could imagine an AI deciding something like “I do not want to enjoy frozen yogurt”, and then altering its code in such a way that it is no longer appropriate to describe it as enjoying frozen yogurt, yeah?
I’m actually having trouble imagining this without anthropomorphizing (or at least zoomorphizing) the agent. When is it appropriate to describe an artificial agent as enjoying something? Surely not when it secretes serotonin into its bloodstream and synapses?
Yeah, that was sloppy of me. Leaving aside the question of when something is enjoying something, let’s take a more straightforward example: Suppose an AI were to design and implement more efficient algorithms for processing sensory stimuli? Or add a “face recognition” module when it determines that this would be useful for interacting with humans?
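To make “add a module” concrete, here is a minimal sketch of runtime self-extension (the class, the names, and the stand-in “recognizer” are all invented for illustration; nothing here does real face recognition):

```python
from typing import Callable, Dict

class Agent:
    """Toy agent that can install new capability modules while running."""
    def __init__(self) -> None:
        self.modules: Dict[str, Callable] = {}

    def add_module(self, name: str, fn: Callable) -> None:
        # Self-extension step: the agent changes what it can do at runtime.
        self.modules[name] = fn

    def perceive(self, name: str, stimulus):
        return self.modules[name](stimulus) if name in self.modules else None

def toy_face_recognizer(image) -> str:
    # Stand-in heuristic, not a real recognizer.
    return "face" if sum(image) > 10 else "no face"

agent = Agent()
agent.add_module("face_recognition", toy_face_recognizer)  # added once judged useful
print(agent.perceive("face_recognition", [3, 4, 5]))        # -> "face"
```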
This seems trivially false—if an AI is instantiated as a bunch of zeros and ones in some substrate, how could Godel or similar concerns stop it from altering any subset of those bits?
It’s not a question of stopping it. Gödel is not giving it a stern look, saying: “you can’t alter your own code until you’ve done your homework”. It’s more that these considerations prevent the agent from being in a state where it will, in fact, alter its own code in certain ways. This claim can and should be proved mathematically, but I don’t have the resources to do that at the moment. In the meanwhile, I’d agree if you wanted to disagree.
Hm. It seems that you should be able to write a simple program that overwrites its own code with an arbitrary value. Wouldn’t that be a counterexample?
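For concreteness, a minimal sketch of the sort of program meant here (Python, with the “arbitrary value” chosen at random; the point is only that nothing stops the write):

```python
import os
import sys

# A program that replaces its own source with arbitrary bytes. After this runs,
# the original logic is gone; whatever was written is what exists from then on.
arbitrary = os.urandom(256)  # one "arbitrary value"; it could be any bytes at all

with open(sys.argv[0], "wb") as f:
    f.write(arbitrary)
```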
You see reasons to believe that any artificial intelligence is limited to altering its motivations and desires in a way that is qualitatively similar to humans? This seems like a pretty extreme claim—what are the salient features of human self-rewriting that you think must be preserved?
I believe that this is likely, yes. The “salient feature” is being subject to the laws of nature, which in turn seem to be consistent with particular theories of logic and probability. The problem with such a claim is that these theories are still not fully understood.
This sounds unjustifiably broad. Certainly, human behavior is subject to these restrictions, but it is also subject to much more stringent ones—we are not able to do everything that is logically possible. Do we agree, then, that humans and artificial agents are both subject to laws forbidding logical contradictions and the like, but that artificial agents are not in principle necessarily bound by the same additional restrictions as humans?
Suppose an AI were to design and implement more efficient algorithms for processing sensory stimuli? Or add a “face recognition” module when it determines that this would be useful for interacting with humans?
The ancient Greeks developed methods for improving memorization. It has been shown that human-trained dogs and chimps are better at recognizing human faces than others of their kind. None of them were artificial (discounting selective breeding in dogs and Greeks).
It seems that you should be able to write a simple program that overwrites its own code with an arbitrary value. Wouldn’t that be a counterexample?
Would you consider such a machine an artificial intelligent agent? Isn’t it just a glorified printing press?
I’m not saying that some configurations of memory are physically impossible. I’m saying that intelligent agency entails typicality, and therefore, for any intelligent agent, there are some things it is extremely unlikely to do, to the point of practical impossibility.
Do we agree, then, that humans and artificial agents are both subject to laws forbidding logical contradictions and the like, but that artificial agents are not in principle necessarily bound by the same additional restrictions as humans?
I would actually argue the opposite.
Are you familiar with the claim that people are getting less intelligent since modern technology allows less intelligent people and their children to survive? (I never saw this claim discussed seriously, so I don’t know how factual it is; but the logic of it is what I’m getting at.) The idea is that people today are less constrained in their required intelligence, and therefore the typical human is becoming less intelligent.
Other claims are that activities such as browsing the internet and video gaming are changing the set of mental skills which humans are good at. We improve in tasks which we need to be good at, and give up skills which are less useful. You gave yet another example in your comment regarding face recognition.
The elasticity of biological agents is (quantitatively) limited, and improvement by evolution takes time. This is where artificial agents step in. They can be better than humans, but the typical agent will only actually be better if it has to be. Generally, more intelligent agents are those which are forced to comply with tighter constraints, not looser ones.
The idea is that people today are less constrained in their required intelligence, and therefore the typical human is becoming less intelligent.
That’s an empirical inquiry, which I’m sure has been answered within some acceptable error range (it’s interesting and easy-ish to test). If you’re going to use it as evidence for your conclusion, or part of your worldview, you should really be sure that it’s true, because using “logic” that leads to empirically falsifiable claims is essentially never fruitful.
If you’re going to use it as evidence for your conclusion, or part of your worldview, you should really be sure that it’s true
(I never saw this claim discussed seriously, so I don’t know how factual it is; but the logic of it is what I’m getting at.)
Was my disclaimer insufficient? I was using the unchecked claim to convey a piece of reasoning. The claim itself is unimportant in this context; what matters is only the reasoning by which its conclusion is supposed to follow from its premise. Checking the truth of the conclusion may not be difficult, but the premise itself could be false (I suspect that it is), and the premise is much harder to verify.
And even the reasoning, which is essentially mathematically provable, I have repeatedly urged the skeptical reader to doubt until they see a proof.
using “logic” that leads to empirically falsifiable claims is essentially never fruitful.
Did you mean false claims? I sure do hope that my logic (without quotes) implies empirically falsifiable (but unfalsified) claims.
Any set of rules for determining validity is useless if even sound arguments have empirically false conclusions every now and again. So my point was that if an argument is sound but has a false conclusion, you should forget about the reasoning altogether.
And yes, I did mean “empirically falsified.” My mistake.
(edit):
Actually, it’s not the kind of argument that is sound or unsound, valid or invalid. The argument points out some pressures that should make us expect that people are getting dumber, and ignores the presence of pressures which should make us expect that we’re getting smarter. Either way, if from your “premises” you can derive too much belief in certain false claims, then either you are too confident in your premises or your rules for deriving belief are crappy, i.e., far from approximating Bayesian updating.
Above you said that you weren’t sure if the conclusion of some argument you were using was true; don’t do that. That is all the advice I wanted to give.
I’ll try to remember that, if only for the reason that some people don’t seem to understand contexts in which the truth value of a statement is unimportant.
Not at all. If you insist, let’s take it from the top:
I wanted to convey my reasoning, let’s call it R.
I quoted a claim of the form “because P is true, Q is true”, where R is essentially “if P then Q”. This was a rhetorical device, to help me convey what R is.
I indicated clearly that I don’t know whether P or Q are true. Later I said that I suspect P is false.
Note that my reasoning is, in principle, falsifiable: if P is true and Q is false, then R must be false.
While Q may be relatively easy to check, I think P is not.
I expect to have other means of proving R.
I feel that I’m allowed to focus on conveying R first, and on attempting to prove or falsify it at a later date. The need to clarify my ideas helped me understand them better, in preparation for a future proof.
I stated clearly and repeatedly that I’m just conveying an idea here, not providing evidence for it, and that I agree with readers who choose to doubt it until shown evidence.
Do you still think I’m at fault here?
EDIT: Your main objection to my presentation was that Q could be false. Would you like to revise that objection?
I don’t want to revise my objection, because it’s not really a material implication that you’re using. You’re using probabilistic reasoning in your argument, i.e., pointing out certain pressures that exist, which rule out certain ways that people could be getting smarter and therefore increase our probability that people are not getting smarter. But if people are in fact getting smarter, this reasoning is either too confident in the pressures, or is using something far from Bayesian updating.
Either way, I feel like we took up too much space already. If you would like to continue, I would love to do so in a private message.
Suppose an AI were to design and implement more efficient algorithms for processing sensory stimuli? Or add a “face recognition” module when it determines that this would be useful for interacting with humans?
The ancient Greeks developed methods for improving memorization. It has been shown that human-trained dogs and chimps are better at recognizing human faces than others of their kind. None of them were artificial (discounting selective breeding in dogs and Greeks).
It seems that you should be able to write a simple program that overwrites its own code with an arbitrary value. Wouldn’t that be a counterexample?
Would you consider such a machine an artificial intelligent agent? Isn’t it just a glorified printing press?
I’m not saying that some configurations of memory are physically impossible. I’m saying that intelligent agency entails typicality, and therefore, for any intelligent agent, there are some things it is extremely unlikely to do, to the point of practical impossibility.
Certainly that doesn’t count as an intelligent agent—but a GAI with that as its only goal, for example, why would that be impossible? An AI doesn’t need to value survival.
I’d be interested in the conclusions derived about “typical” intelligences and the “forbidden actions”, but I don’t see how you have derived them.
Do we agree, then, that humans and artificial agents are both subject to laws forbidding logical contradictions and the like, but that artificial agents are not in principle necessarily bound by the same additional restrictions as humans?
I would actually argue the opposite.
Are you familiar with the claim that people are getting less intelligent since modern technology allows less intelligent people and their children to survive? (I never saw this claim discussed seriously, so I don’t know how factual it is; but the logic of it is what I’m getting at.) The idea is that people today are less constrained in their required intelligence, and therefore the typical human is becoming less intelligent.
Other claims are that activities such as browsing the internet and video gaming are changing the set of mental skills which humans are good at. We improve in tasks which we need to be good at, and give up skills which are less useful. You gave yet another example in your comment regarding face recognition.
The elasticity of biological agents is (quantitatively) limited, and improvement by evolution takes time. This is where artificial agents step in. They can be better than humans, but the typical agent will only actually be better if it has to be. Generally, more intelligent agents are those which are forced to comply with tighter constraints, not looser ones.
I think we have our quantifiers mixed up? I’m saying an AI is not in principle bound by these restrictions—that is, it’s not true that all AIs must necessarily have the same restrictions on their behavior as a human. This seems fairly uncontroversial to me. I suppose the disconnect, then, is that you expect a GAI will be of a type bound by these same restrictions. But then I thought the restrictions you were talking about were “laws forbidding logical contradictions and the like”? I’m a little confused—could you clarify your position, please?
a GAI with [overwriting its own code with an arbitrary value] as its only goal, for example, why would that be impossible? An AI doesn’t need to value survival.
A GAI with the utility of burning itself? I don’t think that’s viable, no.
I’d be interested in the conclusions derived about “typical” intelligences and the “forbidden actions”, but I don’t see how you have derived them.
At the moment it’s little more than professional intuition. We also lack some necessary shared terminology. Let’s leave it at that until and unless someone formalizes and proves it, and then hopefully blogs about it.
could you clarify your position, please?
I think I’m starting to see the disconnect, and we probably don’t really disagree.
You said:
This sounds unjustifiably broad
My thinking is very broad but, from my perspective, not unjustifiably so. In my research I’m looking for mathematical formulations of intelligence in any form—biological or mechanical.
Taking a narrower viewpoint, humans “in their current form” are subject to different laws of nature than those we expect machines to be subject to. The former use organic chemistry, the latter probably electronics. The former multiply by synthesizing enormous quantities of DNA molecules, the latter could multiply by configuring solid state devices.
Do you count the more restrictive technology by which humans operate as a constraint which artificial agents may be free of?
a GAI with [overwriting its own code with an arbitrary value] as its only goal, for example, why would that be impossible? An AI doesn’t need to value survival.
A GAI with the utility of burning itself? I don’t think that’s viable, no.
What do you mean by “viable”? You think it is impossible due to Godelian concerns for there to be an intelligence that wishes to die?
As a curiosity, this sort of intelligence came up in a discussion I was having on LW recently. Someone said “why would an AI try to maximize its original utility function, instead of switching to a different / easier function?”, to which I responded “why is that the precise level at which the AI would operate, rather than either actually maximizing its utility function or deciding to hell with the whole utility thing and valuing suicide rather than maximizing functions (because it’s easy)”.
But anyway it can’t be that Godelian reasons prevent intelligences from wanting to burn themselves, because people have burned themselves.
I’d be interested in the conclusions derived about “typical” intelligences and the “forbidden actions”, but I don’t see how you have derived them.
At the moment it’s little more than professional intuition. We also lack some necessary shared terminology. Let’s leave it at that until and unless someone formalizes and proves it, and then hopefully blogs about it.
Fair enough, though for what it’s worth I have a fair background in mathematics, theoretical CS, and the like.
could you clarify your position, please?
I think I’m starting to see the disconnect, and we probably don’t really disagree.
You said:
This sounds unjustifiably broad
My thinking is very broad but, from my perspective, not unjustifiably so. In my research I’m looking for mathematical formulations of intelligence in any form—biological or mechanical.
I meant that this was a broad definition of the qualitative restrictions to human self-modification, to the extent that it would be basically impossible for something to have qualitatively different restrictions.
Taking a narrower viewpoint, humans “in their current form” are subject to different laws of nature than those we expect machines to be subject to. The former use organic chemistry, the latter probably electronics. The former multiply by synthesizing enormous quantities of DNA molecules, the latter could multiply by configuring solid state devices.
Do you count the more restrictive technology by which humans operate as a constraint which artificial agents may be free of?
Why not? Though of course it may turn out that AI is best programmed on something unlike our current computer technology.
A GAI with the utility of burning itself? I don’t think that’s viable, no.
What do you mean by “viable”?
Intelligence is expensive. More intelligence costs more to obtain and maintain. But the sentiment around here (and this time I agree) seems to be that intelligence “scales”, i.e. that it doesn’t suffer from diminishing returns in the “middle world” like most other things; hence the singularity.
For that to be true, more intelligence also has to be more rewarding. But not just in the sense of asymptotically approaching optimality. As intelligence increases, it has to constantly find new “revenue streams” for its utility. It must not saturate its utility function; in fact, its utility must be insatiable in the “middle world”. A good example is curiosity, which is probably why many biological agents are curious even when it serves no other purpose.
Suicide is not such a utility function. We can increase the degree of intelligence an agent needs to have to successfully kill itself (for example, by keeping the gun away). But in the end, it’s “all or nothing”.
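A toy way to see the “all or nothing” point (the functions and numbers are entirely made up): an insatiable, curiosity-like utility keeps rewarding extra capability, while a suicide-like utility is a step function that saturates the moment the agent is capable enough to succeed once.

```python
# Toy comparison of how marginal utility responds to added capability.
def curiosity_utility(capability: float) -> float:
    return capability  # keeps finding new "revenue streams"; never saturates

def suicide_utility(capability: float) -> float:
    threshold = 5.0  # made-up capability needed to succeed once (e.g. reach the gun)
    return 1.0 if capability >= threshold else 0.0  # all or nothing

for c in [1.0, 4.0, 5.0, 50.0]:
    print(c, curiosity_utility(c), suicide_utility(c))
```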
But anyway it can’t be that Godelian reasons prevent intelligences from wanting to burn themselves, because people have burned themselves.
Gödel’s theorem doesn’t prevent any specific thing. In this case I was referring to information-theoretic reasons. And indeed, suicide is not a typical human behavior, even without considering that some contributing factors are irrelevant for our discussion.
Do you count the more restrictive technology by which humans operate as a constraint which artificial agents may be free of?
Why not? Though of course it may turn out that AI is best programmed on something unlike our current computer technology.
In that sense, I completely agree with you. I usually don’t like making the technology distinction, because I believe there’s more important stuff going on in higher levels of abstraction. But if that’s where you’re coming from then I guess we have resolved our differences :)
It’s not a question of stopping it. Gödel is not giving it a stern look, saying: “you can’t alter your own code until you’ve done your homework”. It’s more that these considerations prevent the agent from being in a state where it will, in fact, alter its own code in certain ways. This claim can and should be proved mathematically, but I don’t have the resources to do that at the moment. In the meanwhile, I’d agree if you wanted to disagree.
I’d like to understand what you’re saying here better. An agent instantiated as a binary program can do any of the following:
Rewrite its own source code with a random binary string.
Do things until it encounters a different agent, obtain its source code, and replace its own source code with that.
It seems to me that either of these would be enough to provide “complete control” over the agent’s source code in the sense that any possible program can be obtained as a result. So you must mean something different. What is it?
Rewrite its own source code with a random binary string
This is in a sense the electronic equivalent of setting oneself on fire—replacing oneself with maximum entropy. An artificial agent is extremely unlikely to “survive” this operation.
any possible program can be obtained as a result
Any possible program could be obtained, and the huge number of possible programs should hint that most are extremely unlikely to be obtained.
I assumed we were talking about an agent that is active and kicking, and with some non-negligible chance to keep surviving. Such an agent must have a strongly non-uniform distribution over its next internal state (code included). This means that only a tiny fraction of possible programs will have any significant probability of being obtained. I believe one can give a formula for (at least an upper bound on) the expected size of this fraction (actually, the expected log size), but I also believe nobody has ever done that, so you may doubt this particular point until I prove it.
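A minimal sketch of the kind of bound I have in mind (my own back-of-the-envelope framing, offered as an assumption rather than an established result): write $X$ for the agent’s next code state, an $n$-bit string with distribution $p$. At most $2^{k}$ programs can each have probability at least $2^{-k}$, so the programs with any “significant” probability form a fraction at most $2^{k-n}$ of all $2^{n}$ possibilities; and the expected log-size of the reachable region is controlled by the entropy, roughly

$$\mathbb{E}\bigl[-\log_2 p(X)\bigr] = H(X) \ll n,$$

whenever the agent’s next state is strongly constrained by its current design.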
I don’t think “surviving” is a well-defined term here. Every time you self-modify, you replace yourself with a different agent, so in that sense any agent that keeps surviving is one that does not self-modify.
Obviously, we really think that sufficiently similar agents are basically the same agent. But “sufficiently similar” is vague. Can I write a program that begins by computing the cluster of all agents similar to it, and switches to the next one (lexicographically) every 24 hours? If so, then it would eventually take on all states that are still “the same agent”.
The natural objection is that there is one part of the agent’s state that is inviolate in this example: the 24-hour rotation period (if it ever self-modified to get rid of the rotation, then it would get stuck in that state forever, without “dying” in an information theoretic sense). But I’m skeptical that this limitation can be encoded mathematically.
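As a toy of the proposal (with the “cluster of all agents similar to it” replaced by a small hard-coded list, which sidesteps how that cluster would actually be computed and only shows the rotation structure):

```python
import time

# Toy rotation: "agents" here are just parameter dicts; the real proposal would
# have to compute the similarity cluster from the running agent itself.
SIMILAR_AGENTS = [  # hypothetical, hard-coded stand-in for the cluster
    {"politeness": 0.9, "curiosity": 0.5},
    {"politeness": 0.8, "curiosity": 0.6},
    {"politeness": 0.7, "curiosity": 0.7},
]
ROTATION_SECONDS = 24 * 60 * 60  # the 24-hour period, which itself never changes

def current_agent(start_time: float, now: float) -> dict:
    """Return whichever member of the cluster is active at time `now`."""
    index = int((now - start_time) // ROTATION_SECONDS) % len(SIMILAR_AGENTS)
    return SIMILAR_AGENTS[index]

start = time.time()
print(current_agent(start, start + 3 * ROTATION_SECONDS))  # back to the first member
```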
In addition to the rotation period, the “list of sufficiently similar agents” would become effectively non-modifiable in that case. If it ever recalculated the list, starting from a different baseline or with a different standard of ‘sufficiently similar,’ it would not be rotating, but rather on a random walk through a much larger cluster of potential agent-types.
I don’t think “surviving” is a well-defined term here. Every time you self-modify, you replace yourself with a different agent, so in that sense any agent that keeps surviving is one that does not self-modify.
I placed “survive” in quotation marks to signal that I was aware of that, and that I meant “the other thing”. I didn’t realize that this was far from clear enough, sorry.
For lack of better shared terminology, what I meant by “surviving” is continuing to be executable. Self-modification is not suicide; you and I are doing it all the time.
Can I write a program that begins by computing the cluster of all agents similar to it, and switches to the next one (lexicographically) every 24 hours?
No, you cannot. This function is non-computable in the Turing sense.
A computable limited version of it (whatever it is) could be possible. But this particular agent cannot modify itself “in any way it wants”, so it’s consistent with my proposition.
The natural objection is that there is one part of the agent’s state that is inviolate in this example: the 24-hour rotation period
This is a very weak limitation of the space of possible modifications. I meant a much stronger one.
But I’m skeptical that this limitation can be encoded mathematically.
This weak limitation is easy to formalize.
The stronger limitation I’m thinking of is challenging to formalize, but I’m pretty confident that it can be done.
You didn’t say; rather, you said (well, implied) that it wasn’t appropriate to describe an artificial agent as enjoying something in that case. But, OK, you’ve said now. Thanks for clarifying.
As I said, when it secretes serotonin into its bloodstream and synapses.
That strikes me as a terrible definition of enjoyment—particularly because serotonin release isn’t nearly as indicative of enjoyment as popular culture would suggest. Even using dopamine would be better (but still not particularly good).
I wasn’t basing it on popular culture, but that doesn’t mean I’m not wrong.
Do you have a better suggestion?
If not, I’d ask CuSithBell to please clarify her (or his) ideas without using controversially defined terminology (which was also my sentiment before).
Generally GAIs are ascribed extreme powers around here
(Yes, and this is partly just because AIs that don’t meet a certain standard are implicitly excluded from the definition of the class being described. AIs below that critical threshold are considered boring and irrelevant for most purposes.)
Having asserted that your claim is, in fact, new information
I wouldn’t assert that. I thought I was stating the obvious.
Yes, I think I misspoke earlier, sorry. It was only “new information” in the sense that it wasn’t in that particular sentence of Eliezer’s—to anyone familiar with discussions of GAI, your assertion certainly should be obvious.
Having asserted that your claim is, in fact, new information: can you please clarify and explain why you believe that?
An advanced AI could reasonably be expected to be able to explicitly edit any part of its code however it desires. Humans are unable to do this.
I believe that is a misconception. Perhaps I’m not being reasonable, but I would expect the level at which you could describe such a creature in terms of “desires” to be conceptually distinct from the level at which it can operate on its own code.
This is the same old question of “free will” again. Desires don’t exist as a mechanism. They exist as an approximate model of describing the emergent behavior of intelligent agents.
You are saying that a GAI being able to alter its own “code” on the actual code-level does not imply that it is able to alter in a deliberate and conscious fashion its “code” in the human sense you describe above?
Generally GAIs are ascribed extreme powers around here—if one has low-level access to its code, then it will be able to determine how its “desires” derive from this code, and will be able to produce whatever changes it wants. Similarly, it will be able to hack human brains with equal finesse.
I am saying pretty much exactly that. To clarify further, the words “deliberate”, “conscious” and “wants” again belong to the level of emergent behavior: they can be used to describe the agent, not to explain it (what could not be explained by “the agent did X because it wanted to”?).
Let’s instead make an attempt to explain. A complete control of an agent’s own code, in the strict sense, is in contradiction of Gödel’s incompleteness theorem. Furthermore, information-theoretic considerations significantly limit the degree to which an agent can control its own code (I’m wondering if anyone has ever done the math. I expect not. I intend to look further into this). In information-theoretic terminology, the agent will be limited to typical manipulations of its own code, which will be a strict (and presumably very small) subset of all possible manipulations.
Can an agent be made more effective than humans in manipulating its own code? I have very little doubt that it can. Can it lead to agents qualitatively more intelligent than humans? Again, I believe so. But I don’t see a reason to believe that the code-rewriting ability itself can be qualitatively different than a human’s, only quantitatively so (although of course the engineering details can be much different; I’m referring to the algorithmic level here).
As you’ve probably figured out, I’m new here. I encountered this post while reading the sequences. Although I’m somewhat learned on the subject, I haven’t yet reached the part (which I trust exists) where GAI is discussed here.
On my path there, I’m actively trying to avoid a certain degree of groupthink which I detect in some of the comments here. Please take no offense, but it’s phrases like the above quote which worry me: is there really a consensus around here about such profound questions? Hopefully it’s only the terminology which is agreed upon, in which case I will learn it in time. But please, let’s make our terminology “pay rent”.
Sure, but we could imagine an AI deciding something like “I do not want to enjoy frozen yogurt”, and then altering its code in such a way that it is no longer appropriate to describe it as enjoying frozen yogurt, yeah?
This seems trivially false—if an AI is instantiated as a bunch of zeros and ones in some substrate, how could Godel or similar concerns stop it from altering any subset of those bits?
You see reasons to believe that any artificial intelligence is limited to altering its motivations and desires in a way that is qualitatively similar to humans? This seems like a pretty extreme claim—what are the salient features of human self-rewriting that you think must be preserved?
I don’t think it’s a “consensus” so much as an assumed consensus for the sake of argument. Some do believe that any hypothetical AI’s influence is practically unlimited; others agree to assume that because it’s not ruled out and is a worst-case scenario, or simply an interesting case (see wedrifid’s comment on the grandparent). (Aside: not sure how unusual or nonobvious this is, but we often use familial relationships to describe the relative positions of comments, e.g. the comment I am responding to is the “parent” of this comment, and the one you were responding to when you wrote it is the “grandparent”. I think that’s about as far as most users take the metaphor.)
Thanks for challenging my position. This discussion is very stimulating for me!
I’m actually having trouble imagining this without anthropomorphizing (or at least zoomorphizing) the agent. When is it appropriate to describe an artificial agent as enjoying something? Surely not when it secretes serotonin into its bloodstream and synapses?
It’s not a question of stopping it. Gödel is not giving it a stern look, saying: “you can’t alter your own code until you’ve done your homework”. It’s more that these considerations prevent the agent from being in a state where it will, in fact, alter its own code in certain ways. This claim can and should be proved mathematically, but I don’t have the resources to do that at the moment. In the meanwhile, I’d agree if you wanted to disagree.
I believe that this is likely, yes. The “salient feature” is being subject to the laws of nature, which in turn seem to be consistent with particular theories of logic and probability. The problem with such a claim is that these theories are still not fully understood.
It’s a pleasure!
Yeah, that was sloppy of me. Leaving aside the question of when something is enjoying something, let’s take a more straightforward example: Suppose an AI were to design and implement more efficient algorithms for processing sensory stimuli? Or add a “face recognition” module when it determines that this would be useful for interacting with humans?
Hm. It seems that you should be able to write a simple program that overwrites its own code with an arbitrary value. Wouldn’t that be a counterexample?
This sounds unjustifiably broad. Certainly, human behavior is subject to these restrictions, but it is also subject to much more stringent ones—we are not able to do everything that is logically possible. Do we agree, then, that humans and artificial agents are both subject to laws forbidding logical contradictions and the like, but that artificial agents are not in principle necessarily bound by the same additional restrictions as humans?
The ancient Greeks developed methods for improving memorization. It has been shown that human-trained dogs and chimps are better at recognizing human faces than others of their kind. None of them were artificial (discounting selective breeding in dogs and Greeks).
Would you consider such a machine an artificial intelligent agent? Isn’t it just a glorified printing press?
I’m not saying that some configurations of memory are physically impossible. I’m saying that intelligent agency entails typicality, and therefore, for any intelligent agent, there are some things it is extremely unlikely to do, to the point of practical impossibility.
I would actually argue the opposite.
Are you familiar with the claim that people are getting less intelligent since modern technology allows less intelligent people and their children to survive? (I never saw this claim discussed seriously, so I don’t know how factual it is; but the logic of it is what I’m getting at.) The idea is that people today are less constrained in their required intelligence, and therefore the typical human is becoming less intelligent.
Other claims are that activities such as browsing the internet and video gaming are changing the set of mental skills which humans are good at. We improve in tasks which we need to be good at, and give up skills which are less useful. You gave yet another example in your comment regarding face recognition.
The elasticity of biological agents is (quantitatively) limited, and improvement by evolution takes time. This is where artificial agents step in. They can be better than humans, but the typical agent will only actually be better if it has to be. Generally, more intelligent agents are those which are forced to comply with tighter constraints, not looser ones.
That’s an empirical inquiry, which I’m sure has been answered within some acceptable error range (it’s interesting and easy-ish to test). If you’re going to use it as evidence for your conclusion, or part of your worldview, you should really be sure that it’s true, because using “logic” that leads to empirically falsifiable claims is essentially never fruitful.
Check out Stephen Pinker for a start.
Was my disclaimer insufficient? I was using the unchecked claim to convey a piece of reasoning. The claim itself is unimportant in this context; what matters is only the reasoning by which its conclusion is supposed to follow from its premise. Checking the truth of the conclusion may not be difficult, but the premise itself could be false (I suspect that it is), and the premise is much harder to verify.
And even the reasoning, which is essentially mathematically provable, I have repeatedly urged the skeptical reader to doubt until they see a proof.
Did you mean false claims? I sure do hope that my logic (without quotes) implies empirically falsifiable (but unfalsified) claims.
Any set of rules for determining validity is useless if even sound arguments have empirically false conclusions every now and again. So my point was that if an argument is sound but has a false conclusion, you should forget about the reasoning altogether.
And yes, I did mean “empirically falsified.” My mistake.
(edit):
Actually, it’s not the kind of argument that is sound or unsound, valid or invalid. The argument points out some pressures that should make us expect that people are getting dumber, and ignores the presence of pressures which should make us expect that we’re getting smarter. Either way, if from your “premises” you can derive too much belief in certain false claims, then either you are too confident in your premises or your rules for deriving belief are crappy, i.e., far from approximating Bayesian updating.
That’s both obvious and irrelevant.
Are you even trying to have a discussion here? Or are you just stating obvious and irrelevant facts about rationality?
Above you said that you weren’t sure if the conclusion of some argument you were using was true; don’t do that. That is all the advice I wanted to give.
I’ll try to remember that, if only for the reason that some people don’t seem to understand contexts in which the truth value of a statement is unimportant.
and
You see no problem here?
Not at all. If you insist, let’s take it from the top:
I wanted to convey my reasoning, let’s call it R.
I quoted a claim of the form “because P is true, Q is true”, where R is essentially “if P then Q”. This was a rhetorical device, to help me convey what R is.
I indicated clearly that I don’t know whether P or Q are true. Later I said that I suspect P is false.
Note that my reasoning is, in principle, falsifiable: if P is true and Q is false, then R must be false.
While Q may be relatively easy to check, I think P is not.
I expect to have other means of proving R.
I feel that I’m allowed to focus on conveying R first, and on attempting to prove or falsify it at a later date. The need to clarify my ideas helped me understand them better, in preparation for a future proof.
I stated clearly and repeatedly that I’m just conveying an idea here, not providing evidence for it, and that I agree with readers who choose to doubt it until shown evidence.
Do you still think I’m at fault here?
EDIT: Your main objection to my presentation was that Q could be false. Would you like to revise that objection?
I don’t want to revise my objection, because it’s not really a material implication that you’re using. You’re using probabilistic reasoning in your argument, i.e., pointing out certain pressures that exist, which rule out certain ways that people could be getting smarter and therefore increase our probability that people are not getting smarter. But if people are in fact getting smarter, this reasoning is either too confident in the pressures, or is using something far from Bayesian updating.
Either way, I feel like we took up too much space already. If you would like to continue, I would love to do so in a private message.
Certainly that doesn’t count as an intelligent agent—but a GAI with that as its only goal, for example, why would that be impossible? An AI doesn’t need to value survival.
I’d be interested in the conclusions derived about “typical” intelligences and the “forbidden actions”, but I don’t see how you have derived them.
I think we have our quantifiers mixed up? I’m saying an AI is not in principle bound by these restrictions—that is, it’s not true that all AIs must necessarily have the same restrictions on their behavior as a human. This seems fairly uncontroversial to me. I suppose the disconnect, then, is that you expect a GAI will be of a type bound by these same restrictions. But then I thought the restrictions you were talking about were “laws forbidding logical contradictions and the like”? I’m a little confused—could you clarify your position, please?
A GAI with the utility of burning itself? I don’t think that’s viable, no.
At the moment it’s little more than professional intuition. We also lack some necessary shared terminology. Let’s leave it at that until and unless someone formalizes and proves it, and then hopefully blogs about it.
I think I’m starting to see the disconnect, and we probably don’t really disagree.
You said:
My thinking is very broad but, from my perspective, not unjustifiably so. In my research I’m looking for mathematical formulations of intelligence in any form—biological or mechanical.
Taking a narrower viewpoint, humans “in their current form” are subject to different laws of nature than those we expect machines to be subject to. The former use organic chemistry, the latter probably electronics. The former multiply by synthesizing enormous quantities of DNA molecules, the latter could multiply by configuring solid state devices.
Do you count the more restrictive technology by which humans operate as a constraint which artificial agents may be free of?
What do you mean by “viable”? You think it is impossible due to Godelian concerns for there to be an intelligence that wishes to die?
As a curiosity, this sort of intelligence came up in a discussion I was having on LW recently. Someone said “why would an AI try to maximize its original utility function, instead of switching to a different / easier function?”, to which I responded “why is that the precise level at which the AI would operate, rather than either actually maximizing its utility function or deciding to hell with the whole utility thing and valuing suicide rather than maximizing functions (because it’s easy)”.
But anyway it can’t be that Godelian reasons prevent intelligences from wanting to burn themselves, because people have burned themselves.
Fair enough, though for what it’s worth I have a fair background in mathematics, theoretical CS, and the like.
I meant that this was a broad definition of the qualitative restrictions to human self-modification, to the extent that it would be basically impossible for something to have qualitatively different restrictions.
Why not? Though of course it may turn out that AI is best programmed on something unlike our current computer technology.
Intelligence is expensive. More intelligence costs more to obtain and maintain. But the sentiment around here (and this time I agree) seems to be that intelligence “scales”, i.e. that it doesn’t suffer from diminishing returns in the “middle world” like most other things; hence the singularity.
For that to be true, more intelligence also has to be more rewarding. But not just in the sense of asymptotically approaching optimality. As intelligence increases, it has to constantly find new “revenue streams” for its utility. It must not saturate its utility function; in fact, its utility must be insatiable in the “middle world”. A good example is curiosity, which is probably why many biological agents are curious even when it serves no other purpose.
Suicide is not such a utility function. We can increase the degree of intelligence an agent needs to have to successfully kill itself (for example, by keeping the gun away). But in the end, it’s “all or nothing”.
Gödel’s theorem doesn’t prevent any specific thing. In this case I was referring to information-theoretic reasons. And indeed, suicide is not a typical human behavior, even without considering that some contributing factors are irrelevant for our discussion.
In that sense, I completely agree with you. I usually don’t like making the technology distinction, because I believe there’s more important stuff going on in higher levels of abstraction. But if that’s where you’re coming from then I guess we have resolved our differences :)
I’d like to understand what you’re saying here better. An agent instantiated as a binary program can do any of the following:
Rewrite its own source code with a random binary string.
Do things until it encounters a different agent, obtain its source code, and replace its own source code with that.
It seems to me that either of these would be enough to provide “complete control” over the agent’s source code in the sense that any possible program can be obtained as a result. So you must mean something different. What is it?
This is in a sense the electronic equivalent of setting oneself on fire—replacing oneself with maximum entropy. An artificial agent is extremely unlikely to “survive” this operation.
Any possible program could be obtained, and the huge number of possible programs should hint that most are extremely unlikely to be obtained.
I assumed we were talking about an agent that is active and kicking, and with some non-negligible chance to keep surviving. Such an agent must have a strongly non-uniform distribution over its next internal state (code included). This means that only a tiny fraction of possible programs will have any significant probability of being obtained. I believe one can give a formula for (at least an upper bound on) the expected size of this fraction (actually, the expected log size), but I also believe nobody has ever done that, so you may doubt this particular point until I prove it.
I don’t think “surviving” is a well-defined term here. Every time you self-modify, you replace yourself with a different agent, so in that sense any agent that keeps surviving is one that does not self-modify.
Obviously, we really think that sufficiently similar agents are basically the same agent. But “sufficiently similar” is vague. Can I write a program that begins by computing the cluster of all agents similar to it, and switches to the next one (lexicographically) every 24 hours? If so, then it would eventually take on all states that are still “the same agent”.
The natural objection is that there is one part of the agent’s state that is inviolate in this example: the 24-hour rotation period (if it ever self-modified to get rid of the rotation, then it would get stuck in that state forever, without “dying” in an information theoretic sense). But I’m skeptical that this limitation can be encoded mathematically.
In addition to the rotation period, the “list of sufficiently similar agents” would become effectively non-modifiable in that case. If it ever recalculated the list, starting from a different baseline or with a different standard of ‘sufficiently similar,’ it would not be rotating, but rather on a random walk through a much larger cluster of potential agent-types.
I placed “survive” in quotation marks to signal that I was aware of that, and that I meant “the other thing”. I didn’t realize that this was far from clear enough, sorry.
For lack of better shared terminology, what I meant by “surviving” is continuing to be executable. Self-modification is not suicide; you and I are doing it all the time.
No, you cannot. This function is non-computable in the Turing sense.
A computable limited version of it (whatever it is) could be possible. But this particular agent cannot modify itself “in any way it wants”, so it’s consistent with my proposition.
This is a very weak limitation of the space of possible modifications. I meant a much stronger one.
This weak limitation is easy to formalize.
The stronger limitation I’m thinking of is challenging to formalize, but I’m pretty confident that it can be done.
Aha! I think this is the important bit. I’ll have to think about this, but it’s probably what the problem is.
When is it appropriate to describe a natural agent as enjoying something?
As I said, when it secretes serotonin into its bloodstream and synapses.
You didn’t say; rather, you said (well, implied) that it wasn’t appropriate to describe an artificial agent as enjoying something in that case. But, OK, you’ve said now. Thanks for clarifying.
That strikes me as a terrible definition of enjoyment—particularly because serotonin release isn’t nearly as indicative of enjoyment as popular culture would suggest. Even using dopamine would be better (but still not particularly good).
I wasn’t basing it on popular culture, but that doesn’t mean I’m not wrong.
Do you have a better suggestion?
If not, I’d ask CuSithBell to please clarify her (or his) ideas without using controversially defined terminology (which was also my sentiment before).
My impression was ‘her’, not ‘his’.
That’s a big “ouch” on my part. Sorry. Lesson learned.
(Yes, and this is partly just because AIs that don’t meet a certain standard are implicitly excluded from the definition of the class being described. AIs below that critical threshold are considered boring and irrelevant for most purposes.)
Indeed, the same typically goes for NIs. Though some speakers make exceptions for some speakers.
I wouldn’t assert that. I thought I was stating the obvious.
See CuSithBell’s reply.
Yes, I think I misspoke earlier, sorry. It was only “new information” in the sense that it wasn’t in that particular sentence of Eliezer’s—to anyone familiar with discussions of GAI, your assertion certainly should be obvious.
Ahh. That’s where the “new information” thing came into it. I didn’t think I’d said anything about “new”, so I’d wondered.