So most any value-core will go evil if allowed to unfold to its logical conclusions. This sounds correct to me, and also it sounds just like the motivation for FAI. Now your argument that humans solve this problem by balanced deterrence among value-cores (as opposed to weighing them together in one utility function) sounds to me like a novel intuition applicable to FAI. We have some researchers on the topic here, maybe they could speak up?
When you make every part of a balanced system more powerful without an overseeing process maintaining balance you don’t get a more powerful balanced system, you get an algae bloom.
Why without? We can put an overseeing process in. It probably doesn’t have to be very smart—after all, the overseeing process for humans is pretty stupid compared to a human.
An interesting observation! An objection to it is that this approach would require your AI to have inconsistent beliefs.
Personally, I believe that fast AI systems with inconsistencies, heuristics, and habits will beat verifiably-correct logic systems in most applications; and will achieve general AI long before any pure-logic systems. (This is one reason why I’m skeptical that coming up with the right decision logic is a workable approach to FAI. I wish that Eliezer had been at Ben Goertzel’s last AGI conference, just to see what he would have said to Selmer Bringsjord’s presentation claiming that the only safe AI would be a logic system using a consistent logic, so that we could verify that certain undesirable statements were false in that system. The AI practitioners present found the idea not just laughable, but insulting. I said that he was telling us to turn the clock back to 1960 and try again the things that we spent decades failing at. Richard Loosemore gave a long, rude, and devastating reply to Bringsjord, who remained blissfully ignorant of the drubbing he’d just received.)
That fellow Bringsjord seems to me an obvious kook, e.g. he claims to have proven that P=NP.
He claims to have an argument that P=NP. He’s a philosopher, so “argument” != proof. Although approaching P=NP as a philosophical argument does strike me as kooky.
Better proof of kookhood is that he was at AGI mainly to present his work on hypercomputing, which he claimed was a computational system with more power than a Turing machine. One element of his argument was that proofs using hyperset logic (which he said is an entire field of logic nowadays; I wouldn’t know) use a notation that cannot even theoretically be represented by a Turing machine. These proofs were published in two-dimensional journal articles, in black-and-white print. I did not notice any fractal fonts in the proofs.
If it’s this argument, it’s wrong. It is based on the claim that soap films solve the Steiner problem, which they don’t. I tried this myself for four pins; here is a report of six-pin soap-film configurations. The soap film, obviously, only finds a local minimum, not a global one. But finding a local minimum is computationally easy.
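For concreteness, a minimal sketch of the “computationally easy” part (everything below is invented for illustration, not taken from the linked report): once you fix a topology, sliding the Steiner points downhill on total edge length is plain local optimization. The NP-hardness of the Steiner problem lives in choosing among the combinatorially many topologies, which is exactly the choice a soap film doesn’t make globally.

```python
import math

# Four pins at the corners of a unit square, and one fixed topology with two
# Steiner points: s1 joins the two left pins, s2 joins the two right pins,
# and s1 connects to s2. (Illustrative setup only.)
PINS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def total_length(s1, s2):
    """Total edge length of this particular (fixed) topology."""
    return (dist(s1, PINS[0]) + dist(s1, PINS[1]) +
            dist(s2, PINS[2]) + dist(s2, PINS[3]) +
            dist(s1, s2))

def local_minimum(steps=20000, lr=1e-3, eps=1e-6):
    """Crude coordinate-wise gradient descent: the 'easy' local problem."""
    s1, s2 = [0.4, 0.5], [0.6, 0.5]
    for _ in range(steps):
        for pt in (s1, s2):
            for i in (0, 1):
                pt[i] += eps
                up = total_length(s1, s2)
                pt[i] -= 2 * eps
                down = total_length(s1, s2)
                pt[i] += eps                       # restore the coordinate
                pt[i] -= lr * (up - down) / (2 * eps)
    return total_length(s1, s2)

print(local_minimum())   # ~2.732, i.e. 1 + sqrt(3), for this particular topology
```

Deciding which pins each Steiner point should serve is the combinatorial part; the number of candidate topologies grows very rapidly with the number of pins, and that is where the hardness (and the soap film’s fallibility) lives.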
Elsewhere, in a paper that detracts from the credibility of the journal it appears in, he argues that people can perform hypercomputation, on the grounds that we can imagine people performing hypercomputation. (Yes, I read all 24 pages, and that’s what it comes down to.)
Judging by Google, the only wide use of the word “hyperset” in mathematics is in non-well-founded set theory. If that is what he was talking about, it’s equiconsistent with the usual sort of set theory and has no more significance for AI than the choice of programming language (which, in my view, has no significance for AI).
What is it with AI? Does it attract the insane, or does it drive them insane? ETA: Or attract the people that it can drive insane?
Oh… This is sad work (Bringsjord). His argument for hypercomputation by people seems remarkably similar to Alvin Plantinga’s Modal Ontological Argument for God.
I am also suspicious of much of what Penrose has to say about Computationalism, although I am not yet sufficiently knowledgeable to confront his work directly in any meaningful way. (I am working to rectify that problem. I seem to have a knack for formal logic, and I am hoping that when I get to upper-division logic classes I will be able to confront arguments like Penrose’s and Bringsjord’s more directly.)
I came across a Wikipedia article on hypercomputing a while back (http://en.wikipedia.org/wiki/Hypercomputation); the whole theory doesn’t seem at all well supported to me.
It is a field with an imaginary object of study.
It would be nice, though, if outsiders could show some respect by demonstrating, as is probably demonstrable but difficult, that its object of study is incoherent, not just imaginary.
I’m not really sure it makes sense to talk about mathematical objects as being imaginary but not incoherent.
I’d be very surprised if this Universe was super-Turing, but you think it’s actually incoherent? I can definitely conceive of a hypercomputational cellular automaton; what is it about the idea of our Universe being hypercomputational that seems incoherent to you?
I think that it is very common for things that we casually think we can definitely conceive of to actually be incoherent. I also think that almost everyone else underestimates how common it is.
I think I’m correcting for that. Do you agree that the halting oracle function itself is well-defined? If so, what seems inconceivable about a cellular automaton whose rules depend on the output of that oracle? OK, you have to stretch the definition of a cellular automaton to allow it, perhaps by allowing cells to have unbounded state, but the result is a wholly defined and therefore surely in-principle-conceivable Universe which is super-Turing. No?
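To make that structure concrete, here is a toy sketch (everything in it is invented for illustration): a one-dimensional automaton whose update rule is allowed to query an oracle about its cells’ states. The real oracle is of course uncomputable, so a bounded, decidable stand-in (“does this process halt within K steps?”) sits in its slot purely to keep the sketch runnable; the conceivability question is about the version where that stub is replaced by the genuine halting oracle.

```python
def oracle(n, K=1000):
    """Stand-in for a halting oracle. Here: does the Collatz process started at
    n reach 1 within K steps? A genuine oracle would answer the *unbounded*
    halting question for arbitrary program encodings, which no Turing machine can."""
    for _ in range(K):
        if n <= 1:
            return True
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    return False

def step(cells):
    """One synchronous update. Cells hold unbounded integers (the 'stretch'
    mentioned above), and the rule consults the oracle's verdict on each cell."""
    out = []
    for i in range(len(cells)):
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % len(cells)]
        out.append(left + right + (1 if oracle(centre) else 2 * centre + 1))
    return out

cells = [3, 7, 27, 97, 6171]
for _ in range(3):
    cells = step(cells)
print(cells)
```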
Respectful outsiders?
Is that a reference to the inner sanctum of the Hypercomputation sect? ;-)
It’s not incoherent. There could be such a thing as Hypercomputation.
However, nobody has found any evidence that it exists so far—and maybe they never will.
Hypercomputation enthusiasts claim that its existence doesn’t matter too much—and that it’s a valuable concept regardless of whether it exists or not. Maybe.
I don’t disagree (i.e., I don’t see any positive reason to doubt the coherence of hypercomputation – though Michael sounds like he has one), but remember not to confuse subjective conceivability and actual coherence.
And now I see why I am skeptical of hypercomputation. It all seems to necessitate some form of computation over an infinite number of steps. This would require some severe bending of the rules or constraints of physics, wouldn’t it?
timtyler’s comment below mine seems to be appropriate:
Doesn’t Newtonian gravity require computation over an infinite number of steps?
Hah! I just came across your comment, Phil :-) I was “Rude”?
Hey, you were sitting next to me, and egging me on by saying “No it isn’t” quietly to yourself every time Bringsjord tried to assert his (nonsensical) claim.
But anyway. I’d claim that I was not rude, really. Bringsjord kept interrupting my attempts to ask my question with loud, almost shouted comments like “If you really think that, I feel sorry for you: you really need to go back and try to get a grasp of elementary logic before you ask me questions like this!!”
So I got a little … testy. :-) :-)
I really wish someone had recorded that exchange.
An AI doesn’t have to have a purely logical structure (let alone a stupid one, e.g. structureless predicates for tables and chairs) in order to be able to logically prove important things about it. It seems to me that criticism of formally proving FAI by analogy to failed logical AI equivocates between these things.
Does “will beat” mean “will be developed first” or “will be more capable than”?
Selmer doesn’t understand LOTS of things that Eliezer understood at age 12; he’s superficially similar, but it’s a very superficial similarity.
the only safe AI would be a logic system using a consistent logic, so that we could verify that certain undesirable statements were false in that system

Could be correct or wildly incorrect, depending on exactly what he meant by it. Of course you have to delete “the only”, but I’d be pretty doubtful of any humans trying to do recursive self-modification in a way that didn’t involve logical proof of correctness to start with.
One of the big problems is that he was trying to talk about the logical correctness of human-level symbolic statements about the world. Even if the logic is correct, there is no correct, consistent mapping from the analog world to symbolic descriptions and back. A mapping that’s close enough to work 99.99% of the time isn’t good enough when you’re talking about proof.
Companies are the self-improving systems of today—e.g. see Google.
They don’t hack the human brain much—but they don’t need to. Brains are not perfect—but they can have their inputs preprocessed, their outputs post-processed, and they can be replaced entirely by computers—via the well-known process of automation.
Do the folk at Google proceed without logical proofs? Of course they do! Only the slowest and most tentative programmer tries to prove the correctness of their programs before they deploy them. Instead most programmers extensively employ testing methodologies. Testing is the mantra of modern programmers. Test, test, test! That way they get their products to the market before the sun explodes.
As Eliezer has already shown, “test, test, test”-ing AIs that aren’t provably Friendly (i.e., whose recursive self-modification hasn’t been proven to lead to Friendly results) can have disastrous consequences.
I’d rather wait until the sun explodes than deploy an unFriendly AI by accident.
The consequences of failing to adopt rapid development technologies when it comes to the development of intelligent machines should be pretty obvious—the effect is to pass the baton to another team with a different development philosophy.
Waiting until the sun explodes is not one of the realistic options.
The box experiments seem irrelevant to the case of testing machine intelligence. When testing prototypes in a harness, you would use powerful restraints—not human gatekeepers.
What powerful restraints would you suggest that would not require human judgment or human-designed decision algorithms to remove?
Turn it off, encase it in nanofabricated diamond, and bury it in a deep pit. Destroy the experimental records, retaining only enough information to help future, wiser generations to one day take up again the challenge of building a Friendly AI. Scatter the knowledge in fragments, hidden in durable artifacts, scatter even the knowledge of how to find the knowledge likewise, and arrange a secret brotherhood to pass down through the centuries the ultimate keys to the Book That Does Not Permit Itself To Be Read.
Tens of thousands of years later, when civilisation has (alas) fallen and risen several times over, a collect-all-the-plot-coupons fantasy novel takes place.
Want to restrain a man?
Use a facility designed by the government with multiple guards and built with vastly more resources than the imprisoned man can muster.
Want to restrain a machine?
You use the same strategy. Or you could use drugs, or build in a test harness. Whatever—but however you look at it, it doesn’t seem like a problem.
We can restrain individuals pretty securely today—and there is no indication that future developments are going to change that.
What’s with the question about removing restraints? That isn’t a problem either. You are suggesting that the imprisoned agent contacts and manipulates humans “on the outside”—and they attempt a jail-break? That is a strategy available to other prisoners as well. It has a low success rate. Those few that do escape are typically hunted down and then imprisoned again.
If you are particularly paranoid about escaped prisoners, then build a higher security prison. Typically, you can have whatever security level you are prepared to pay for.
The hypothetical AI is assumed to be able to talk normal humans assigned to guard it into taking its side.
In other words, the safest way to restrain it is to simply not turn it on.
And not just by persuading the guards—the kind of AIs we are talking about, transhuman-level AIs, could potentially do all kinds of mind-hacking things of which we haven’t even yet conceived. Hell, they could do things that we will never be able to conceive unaided.
If we ever set up a system that relies on humans restraining a self-modifying AI, we had better be sure beforehand that the AI is Friendly. The only restraints that I can think of that would provably work involve limiting the AI’s access to resources so that it never achieves a level of intelligence equal to or higher than human—but then, we haven’t quite made an AI, have we? Not much benefit to a glorified expert system.
If you haven’t read the AI Box experiment reports I linked above, I recommend them—apparently, it doesn’t quite take a transhuman-level AI to get out of a “test harness.”
You don’t use a few humans to restrain an advanced machine intelligence. That would be really stupid.
Safest, but maybe not the only safe way?
Why not make a recursively improving AI, in some strongly typed language, that provably can only interact with the world through printing names of stocks to buy?
How about one that can only make blueprints for star ships?
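A minimal sketch of the interface shape the stock-picking proposal describes (all names and the toy picker below are invented for illustration):

```python
from typing import Callable, List

# Illustrative only -- the "picker" is a stub, and Python can merely express the
# intended restriction by convention. The proposal's point is that a strongly
# typed language (with an effect system) could enforce, at compile time, that the
# improving component has *no* capability except naming stocks to buy.

BuyChannel = Callable[[str], None]          # the sole capability handed outward

def run_boxed_picker(pick: Callable[[List[str]], List[str]],
                     universe: List[str],
                     buy: BuyChannel) -> None:
    """Harness: whatever the inner component computes, its only externally
    visible behaviour is the sequence of tickers passed to `buy`, and even
    those are checked against a whitelist."""
    for ticker in pick(universe):
        if ticker in universe:
            buy(ticker)

def toy_picker(universe: List[str]) -> List[str]:
    # Placeholder for the "recursively improving" component.
    return sorted(universe)[:2]

run_boxed_picker(toy_picker, ["BRK.A", "GOOG", "MSFT"],
                 buy=lambda t: print("BUY", t))
```

In Python the restriction is only a convention; the proposal’s force would come from a type or effect system that verifies statically that `pick` has no other way to act on the world.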
We might say that humans as individuals do recursive self-modification when they practice at a skilled task such as playing football or riding a bike. Coaches and parents might or might not be conscious of logical proofs of correctness when teaching those tasks. Arguably a logical proof of (their definition of) correctness could be derived. But I am not sure that is what you mean.
Humans as a species do recursive self-modification through evolution. Correctness in that context is survival and the part under human control is selecting mates. I would like to have access to those proofs. They might come in handy when dating.
Those are first-order self-modification, not recursive. Learning better ways to modify yourself, or better things to modify yourself towards doing, would be second-order self-modification. ISTM that it would be very difficult to do anything more than a third-order self-modification on our current wetware.
Although our current platform for self-modification is extremely flexible, and almost anything stored in it can be changed/deleted, we can’t make modifications to the platform itself… which is where the “recursive” bit would really come into play.
(That having been said, most people have barely scratched the surface of their options for 2nd and 3rd order self-modification, recursive modification be damned.)
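A toy way to see the distinction drawn above (everything here is invented for illustration): the first loop changes the skill itself, the second changes the rule by which the skill changes, and the platform running both loops stays fixed.

```python
def practice(skill, learning_rate, target=100.0):
    """First-order self-modification: nudge the skill toward a target."""
    return skill + learning_rate * (target - skill)

def tune_learning(learning_rate, last_gain):
    """Second-order self-modification: adjust the *rule* for changing the skill,
    based on how well the last round of practice went."""
    return learning_rate * (1.1 if last_gain > 1.0 else 0.9)

skill, lr = 10.0, 0.05
for week in range(10):
    new_skill = practice(skill, lr)              # first-order step
    lr = tune_learning(lr, new_skill - skill)    # second-order step
    skill = new_skill
    print(f"week {week}: skill={skill:5.1f}  learning_rate={lr:.3f}")

# Nothing in this script can rewrite `practice`, `tune_learning`, or the Python
# interpreter itself -- that fixed platform is the analogue of the wetware limit.
```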
Your examples are all missing either the ‘self’ aspect or the ‘recursive’ aspect. See Intelligence Explosion for an actual example of recursive self-modification, or for a longer explanation of recursive self-improvement, this post.
I found those links posted above interesting.
I concede that the human learning process is not at all as explosive as the self-modifying AI processes of the future will be, but I was speaking to a different point:
Eliezer said: “I’d be pretty doubtful of any humans trying to do recursive self-modification in a way that didn’t involve logical proof of correctness to start with.”
I am arguing that humans do recursive self-modification all the time, without “proofs of correctness to start with” - even to the extent of developing gene therapies that modify our own hardware.
I fail to see how human learning is not recursive self-modification. All human intelligence can be thought of as deeply recursive. A playFootBall() function certainly calls itself repeatedly until the game is over. A football player certainly improves skill at football by repeatedly playing football. As skill sets develop, human software (and its instantiation) is being self-modified through the development of new neural networks and muscles (e.g. marathon runners have physically larger hearts). Arguably, hardware is being modified via epigenetics (phenotypes changing within narrow ranges of potential expression). As a species, we are definitely exploring genetic self-modification. A scientist who injects himself with a gene-based therapy is self-modifying hardware.
We do all these things without a prior proof of correctness, and yet we still make improvements. I don’t think that we should ignore the possibility of an AI that destroys the world. I am very happy that some people are pursuing a guarantee that it won’t happen. I think it is worth noting that the process that will lead to provably friendly AI seems very different from the one that leads to not-necessarily-so-friendly humans and human society.
You will be right about it being genuine recursive self-modification when genetics advances sufficiently that a scientist discovers a gene therapy that confers a significant intelligence advantage, and she takes it herself so that she can more effectively discover even more powerful gene therapies. We’re not there yet, not even remotely close, and we’re even further away when it comes to epigenetics.
Your football example is not recursive self-modification, but the genetics examples would be if they actually came to pass. You’re right that if it happened, it would happen without a proof of correctness. The point is not that it’s impossible without a proof of correctness, but that it’s irresponsibly dangerous. If a single individual recursively self-improved his intelligence to the point that he was then easily able to thoroughly dominate the entire world economy, how much more dangerous would it be for a radically different kind of intelligence to reach that level at a rate of increase that is orders of magnitude greater? It depends on the kind of intelligence; in particular, unless we want to just “hope for the best” and see what happens, it depends on what we can prove about that particular kind of intelligence. Wanting a proof is just a way of saying that we want to really know how it will turn out, rather than just hope and pray or rely on vague, gap-filled arguments that may or may not turn out to be correct. That’s the point.
Steve Omohundro has given several talks about the consequences of a purely logical or rationally exact AI system.
His talk at the Singularity Summit 2007, The Nature of Self-Improving AI, discussed what would happen if such an agent were to have the wrong rules constraining its behavior. I saw a purely logical system as one possible agent type to which he referred.