I have some questions regarding self-improvement:
Is it possible for an intelligence to simulate itself?
Is self-modification practicable without running simulations to prove goal stability?
Does creating an improved second version of oneself demand proof of friendliness?
I’m asking because I sense that those questions might be important in regard to the argument in the original post that each significant increase in capability requires a doubling of resources.
Is it possible for an intelligence to simulate itself?
The naive and direct answer is “Yes, but only at reduced speed and perhaps with the use of external simulated memory”. A smarter answer is “Of course it can, if it can afford to pay for the necessary computing resources.”
Is self-modification practicable without running simulations to prove goal stability?
I would say that self-modification is incredibly stupid without proof of much more fundamental things than goal-stability. And that you really can’t prove anything by simulation, since you cannot exercise a simulation with all possible test data, and you want a proof that good things happen for all data. So you want to prove the correctness of the self-modification symbolically.
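A toy illustration of that point, in Python (the example property and the test bound are made up for illustration, not anything from the original exchange): exercising a claim on sample data can pass every test while the universal claim is still false, which is why a symbolic, for-all-inputs argument is wanted.

    # Claimed property: "n*n + n + 41 is prime for every natural number n."
    def is_prime(k):
        if k < 2:
            return False
        d = 2
        while d * d <= k:
            if k % d == 0:
                return False
            d += 1
        return True

    # A 'simulation' over sample data: every one of these tests passes...
    assert all(is_prime(n * n + n + 41) for n in range(40))

    # ...yet the universal claim is false: n = 40 gives 41 * 41, which is composite.
    assert not is_prime(40 * 40 + 40 + 41)

    # No finite battery of test runs rules this out; only a proof quantified
    # over all inputs can.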
Does creating an improved second version of oneself demand proof of friendliness?
That seems to be a wise demand to make of any powerful AI.
I’m asking because I sense that those questions might be important in regard to the argument in the original post that each significant increase in capability requires a doubling of resources.
Could you explain why you sense that?
I will comment that, in the course of this thread, I have become more and more puzzled at the usual assumption here that we are concerned with one big AI with unlimited power which self-modifies, rather than being concerned with an entire ecosystem of AIs, no single one of which is overwhelmingly powerful, which don’t self-modify but rather build machines of the next generation, and which are entirely capable of proving the friendliness of next-generation machines before those new machines are allowed to have any significant power.
I was curious if there might exist something analogous to the halting problem regarding provability of goal-stability and general friendliness.
Could you explain why you sense that?
If every AGI has to prove the friendliness of its successor, then this might demand considerable effort and therefore resources, since it would require great care and either extensive simulations or, as your answer suggests, a symbolic proof of the correctness of the self-modification. In either case the AGI would have to acquire new resources slowly, as it couldn’t just self-improve to come up with faster and more efficient solutions. In other words, self-improvement would itself demand resources, so the AGI could not use its ability to self-improve to acquire the very resources it needs in order to self-improve in the first place.
I was curious if there might exist something analogous to the halting problem regarding provability of goal-stability and general friendliness.
There is a lot of confusion out there as to whether the logic of the halting problem means that the idea of proving program correctness is somehow seriously limited. It does not mean that. All it means is that there can be programs so badly written that it is impossible to say what they do.
However, if programs are written for some purpose, and their creator can say why he expects the programs to do what he meant them to do, then that understanding of the programs can be turned into a checkable proof of program correctness.
It may be the case that it is impossible to decide whether an arbitrary program even halts. But we are not interested in arbitrary programs. We are interested in well-written programs accompanied by a proof of their correctness. Checking such a proof is not only a feasible task, it is an almost trivial task.
To say this again, it may well be the case that there are programs for which it is impossible to find a proof that they work. But no one sane would run such a program with any particular expectations regarding its behavior. However, if a program was constructed for some purpose, and the constructor can give reasons for believing that the program does what it is supposed to do, then a machine can check whether those reasons are sufficient.
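As a toy illustration of how mechanical the checking step is (the encoding below is made up purely for illustration): a checker for proofs built from stated hypotheses and modus ponens fits in a few lines of Python, and all of the intelligence lies in producing the sequence of lines that it checks.

    # Formulas are atoms (strings) or implications ('->', antecedent, consequent).
    # A proof is a list of formulas; each must be a hypothesis or follow from
    # two earlier lines by modus ponens (from A and A -> B, conclude B).
    def check_proof(hypotheses, proof, goal):
        derived = []
        for formula in proof:
            justified = formula in hypotheses or any(
                earlier == ('->', other, formula)
                for earlier in derived for other in derived
            )
            if not justified:
                return False
            derived.append(formula)
        return goal in derived

    # Example: from p and p -> q, the three-line proof [p, p -> q, q] checks out.
    hyps = ['p', ('->', 'p', 'q')]
    print(check_proof(hyps, ['p', ('->', 'p', 'q'), 'q'], 'q'))   # True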
It is a reasonable conjecture that for every correct program there exists a provably correct program which does the same thing and requires only marginally more resources to do it.
In other words, self-improvement would itself demand resources, so the AGI could not use its ability to self-improve to acquire the very resources it needs in order to self-improve in the first place.
Huh?
If that is meant to be an argument against a singleton AI being able to FOOM without using external resources, then you may be right. But why would it not have access to external resources? And why limit consideration to singleton AIs?
All it means is that there can be programs so badly written that it is impossible to say what they do.
It’s true that the result is of the form “there exist programs such that”, and it says nothing about the relative sizes of the set of programs that can be proved to terminate versus those that can’t, within the subset we care about and are likely to write.
However, if programs are written for some purpose,
Sometimes that purpose is to test mathematical conjectures. In that case, the straightforward and clear way to write the program actually is the one used, and often it apparently does fall into this subclass.
(And then there are all the programs that aren’t designed to normally terminate, and then we should be thinking about codata and coinduction.)
Sometimes that purpose is to test mathematical conjectures.
Upvoted because I think this best captures what the halting problem actually means. The programs for which we really want to know if they halt aren’t recursive functions that might be missing a base case or something. Rather, they’re like the program that enumerates the non-trivial zeroes of the Riemann zeta function until it finds one with real part not equal to 1⁄2.
If we could test whether such programs halted, we would be able to solve any mathematical existence problem ever that we could appropriately describe to a computer.
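In code, that program is something like the sketch below, where nth_nontrivial_zero is a purely hypothetical helper assumed to return the nth zero (exactly) ordered by imaginary part; the point is just that the program halts if and only if the Riemann Hypothesis is false, so a halting decider for it would settle the conjecture.

    # Hypothetical sketch only: nth_nontrivial_zero(n) is assumed to return the
    # nth non-trivial zero of the zeta function, with exact arithmetic.
    def riemann_counterexample_search(nth_nontrivial_zero):
        n = 1
        while True:
            zero = nth_nontrivial_zero(n)     # a complex number
            if zero.real != 0.5:              # off the critical line?
                return zero                   # counterexample found: halt
            n += 1
    # Halts iff the Riemann Hypothesis is false.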
People researching program verification for purposes of improving software engineering really are interested in the kinds of mistakes programmers make, like “recursive functions that might be missing a base case or something”. Or like unbounded iteration for which we can prove termination, but really ought to double-check that proof.
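A small example of the latter kind, purely for illustration: the loop below is unbounded, but it terminates because the non-negative integer b strictly decreases on every iteration, and it is exactly that kind of ranking argument that is worth having a machine re-check.

    # Unbounded iteration with an easy termination proof: b is a non-negative
    # integer and strictly decreases each time through the loop (a % b < b),
    # so the loop cannot run forever.
    def gcd(a, b):
        a, b = abs(a), abs(b)
        while b != 0:
            a, b = b, a % b    # new b = a % b < old b
        return a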
Unbounded iteration for which we cannot (currently) prove termination may be of interest to mathematicians, but even a mathematician isn’t stupid enough to attempt to resolve the Zeta function conjecture by mapping the problem to a question about halting computer programs. That is a step in the wrong direction.
And if we build an AGI that attempts to resolve the conjecture by actually enumerating the roots in order, then I think that we have an AI that is either no longer rational or has too much time on its hands.
I suppose I wasn’t precise enough. Actual researchers in that area are, rightly, interested in more-or-less reasonable halting questions. But these don’t contradict our intuition about what halting problems are. In fact, I would guess that a large class of reasonable programming mistakes can be checked algorithmically.
But as for the consequences of there being, hypothetically, a solution to the general halting problem: the really significant result would be the existence of a general “problem solving algorithm”. Sure, Turing’s undecidability proof shows that the halting problem is undecidable, but this is my best attempt at capturing the intuition of why it’s not even a little bit close to being decidable.
Of course, even in a world where we had a halting solver, we probably wouldn’t want to use it to prove the Riemann Hypothesis. But your comment actually reminds me of a clever little problem I once saw. It went as follows:
If someone ever proves that P=NP, it’s possible the proof will be non-constructive: it won’t actually show how to solve an NP-complete problem in polynomial time. However, this doesn’t actually matter. Write an algorithm for solving SAT (or any other NP-complete problem) with the following property: if P=NP, then your algorithm should run in polynomial time.
(the connection to your comment is that even if P=NP, we wouldn’t actually use the solution to the above problem in practical applications. We’d try our best to find something better)
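For what it’s worth, the standard answer to that puzzle is Levin-style universal search. Here is a rough Python sketch, with hypothetical helpers run_program(i, formula, steps) and satisfies(formula, assignment) standing in for a fixed program enumeration and the polynomial-time certificate check; under the assumption P = NP it finds a satisfying assignment in polynomial time whenever one exists (handling unsatisfiable inputs takes more care).

    # Levin-style universal search (sketch). run_program(i, formula, steps) runs
    # the i-th program in some fixed enumeration for the given number of steps
    # and returns its output, or None; satisfies(formula, assignment) checks a
    # candidate assignment in polynomial time.
    def universal_sat_search(formula, run_program, satisfies):
        budget = 1
        while True:
            # Dovetail: give each of the first `budget` programs `budget` steps.
            for i in range(budget):
                candidate = run_program(i, formula, budget)
                if candidate is not None and satisfies(formula, candidate):
                    return candidate            # verified satisfying assignment
            budget *= 2
    # If P = NP, some fixed program solves SAT in polynomial time, so the search
    # succeeds after only polynomially many doublings of the budget.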
Which raises a question I have never considered …
Is there something similar to the halting-problem theorem for this kind of computation?
That is, can we classify programs as either productive or non-productive, and then prove the non-existence of a program that can do that classification?
That’s a great question. I haven’t found anything on a brief search, but it seems like we can fairly readily embed a normal program inside a coinductive-style one and have it be productive after the normal program terminates.
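Concretely, something like the following Python sketch, where p and x stand for an arbitrary program and its input: the wrapped generator produces its first element if and only if p(x) halts, so a general decider for productivity would also decide halting.

    # Reduction sketch: wrap an ordinary program p inside a stream generator.
    def embed_as_stream(p, x):
        p(x)                # first run the ordinary program to completion
        while True:
            yield 0         # then be (trivially) productive forever
    # The stream yields its first element iff p(x) halts.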
All it means is that there can be programs so badly written that it is impossible to say what they do.
I would like to see all introductions to the Halting Problem explain this point. Unfortunately, it seems that “computer scientists” have only limited interest in real physical computers and how people program them.
I’m working on my Ph.D. in program verification. Every problem we’re trying to solve is as hard as the halting problem, and so we make the assumption, essentially, that we’re operating over real programs: programs that humans are likely to write, and actually want to run. It’s the only way we can get any purchase on the problem.
Trouble is, the field doesn’t have any recognizable standard for what makes a program “human-writable”, so we don’t talk much about that assumption. We should really get a formal model, so we have some basis for expecting that a particular formal method will work well before we implement it… but that would be harder to publish, so no one in academia is likely to do it.
Similarly, inference (conditioning) is incomputable in general, even if your prior is computable. However, if you assume that observations are corrupted by independent, absolutely continuous noise, conditioning becomes computable.
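A rough illustration of why the noise assumption helps (a Monte Carlo sketch rather than a real computability argument; the prior, the Gaussian noise model and the parameters here are invented for the example): because the noise has a computable density, conditioning on a noisy observation reduces to computable importance weighting against samples from the prior.

    import math
    import random

    # Gaussian noise density: computable, strictly positive everywhere.
    def noise_density(y, x, sigma=1.0):
        return math.exp(-0.5 * ((y - x) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    # Posterior mean of x given a noisy observation y, by importance weighting.
    def posterior_mean(prior_sample, y_observed, n=100_000):
        xs = [prior_sample() for _ in range(n)]
        ws = [noise_density(y_observed, x) for x in xs]
        return sum(w * x for w, x in zip(ws, xs)) / sum(ws)

    # Example: standard normal prior, observation y = 1.2.
    print(posterior_mean(lambda: random.gauss(0.0, 1.0), 1.2))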
Offhand, I don’t see any good way of specifying it in general, either (even the weirdest program might be written as part of some sort of arcane security exploit). Why don’t you guys limit yourselves to some empirical subset of programs that humans have written, like ‘Debian 5.0’?
But we are not interested in arbitrary programs. We are interested in well-written programs accompanied by a proof of their correctness. Checking such a proof is not only a feasible task, it is an almost trivial task.
Interesting! I have read that many mathematical proofs today require computer analysis. If it is such a trivial task to check the correctness of code that gives rise to a level of intelligence above ours, then my questions are not relevant to the argument of the original post. I mistakenly expected that no intelligence could easily and conclusively verify the workings of another intelligence a level above its own without painstakingly acquiring resources, devising the necessary tools and building the required infrastructure. I expected that what applied to humans creating superhuman AGI would also hold for an AGI creating its successor. Humans first had to invent science, bumble through the industrial revolution and develop computers to be able to prove modern mathematical problems. So I thought a superhuman AGI would have to invent meta-science, advanced nanotechnology and figure out how to create an intelligence that could solve problems it couldn’t solve itself.
If it is such a trivial task to check the correctness of code that gives rise to the level of intelligence above ours …
Careful! I said checking the validity of a proof of code correctness, not checking the correctness of code. The two are very different in theory, but not (I conjecture) very different in practice, because code is almost never correct unless the author of the code has at least a sketch proof that it does what he intended.
I see. It wasn’t my intention to misquote you; I was simply not aware that there was a significant distinction.
By the way, does the checking of the validity of a proof of code correctness have to be validated or proven as well, or is there some agreed limit to the depth of verification that is sufficient to be sure that nothing unexpected will happen when you run the code?
The usual approach, taken in tools like Coq and HOL, is to reduce the proof checker to a small component of the overall proof assistant, with the hope that these small pieces can be reasonably understood by humans, and proved correct without machine assistance. This keeps the human correctness-checking burden low, while also keeping the piece that must be verified in a domain that’s reasonable for humans to work with, as opposed to, say, writing meta-meta-meta proofs.
The consequence of this approach is that, since the underlying logic is very simple, the resulting proof language is very low-level. Human users give higher-level specifications, and the bulk of the proof assistant compiles those into low-level proofs.
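In miniature, that split looks something like the toy sketch below (illustrative only, and not Coq’s or HOL’s actual interfaces): a tiny trusted kernel that only knows how to check primitive inference steps, and an untrusted outer layer that compiles a higher-level request into a sequence of those steps.

    # Toy LCF-style split. The trusted kernel is tiny: its only primitive rule
    # is conjunction introduction (from theorems a and b, conclude (a AND b)).
    class Kernel:
        def __init__(self, axioms):
            self.theorems = set(axioms)          # trusted starting points
        def conj_intro(self, a, b):
            if a in self.theorems and b in self.theorems:
                self.theorems.add(('and', a, b))
            else:
                raise ValueError('not a theorem')

    # The untrusted "tactic" layer may be arbitrarily clever or buggy; whatever
    # it proposes still has to pass through the kernel's primitive rule.
    def prove_big_conjunction(kernel, atoms):
        goal = atoms[0]
        for atom in atoms[1:]:
            kernel.conj_intro(goal, atom)
            goal = ('and', goal, atom)
        return goal

    k = Kernel(axioms={'p', 'q', 'r'})
    print(prove_big_conjunction(k, ['p', 'q', 'r']))   # ('and', ('and', 'p', 'q'), 'r')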
There are a couple of branches worth mentioning in the “meta-proof”. One is the proof of correctness of the proof-checking software. I believe that fiddlemath is addressing this problem. The “specification” of the proof-checking software is essentially a presentation of an “inference system”. The second is the proof of the soundness of the inference system with respect to the programming language and specification language. In principle, this is a task which is done once—when the languages are defined. The difficulty of this task is (AFAIK) roughly like that of a proof of cut-elimination—that is O(n^2) where n is the number of productions in the language grammars.