The claims about Seed AI are not clear—it should read, ‘as far as we know, a seed AI could be a few hundred kilobytes’. I rather suspect that a friendly utility function isn’t that compressible.
Actually, we do have reasons to believe a seed AI could be expressed in a few hundred kilobytes.
By Chaitin’s incompleteness theorem, there is a constant L, depending only on our axiom system and choice of language, such that we cannot prove, of any particular string s, that its Kolmogorov complexity is greater than L.
An estimate by a friend of John Baez puts this constant for Peano arithmetic and Python somewhere less than a few kilobytes. The entire blog post is worth reading for the intuition and exposition.
Edit: It has been pointed out that this is barely a reason for it to be small, if that; however, I think it is still an interesting side note.
There’s a gulf of difference between being unable to prove that -any- string has complexity greater than some constant and actually being able to specify -arbitrary- strings using less than that constant in computational resources.
Chaitin’s Incompleteness Theorem doesn’t limit complexity, only -provable- complexity.
On the other hand, if you take a random string of length S, no matter how long it is there’s only about a 1/2^100 chance that you can shave 100 bits off it by compression.
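For what it’s worth, that 1/2^100 figure is just a counting argument: there are at most 2^(S-100) - 1 programs shorter than S - 100 bits, and each program outputs at most one string, so at most that many of the 2^S strings of length S can be compressed by 100 or more bits. A quick illustrative sketch in Python (the function name is mine, purely for illustration):

```python
# Counting argument: bitstrings shorter than S - k bits number 2^(S-k) - 1,
# and each program outputs at most one string, so at most that many of the
# 2^S strings of length S can be compressed by k or more bits.
def max_fraction_compressible(S: int, k: int) -> float:
    shorter_programs = 2 ** (S - k) - 1  # count of bitstrings of length < S - k
    return shorter_programs / 2 ** S     # always below 2**-k, whatever S is

print(max_fraction_compressible(1000, 100))  # ~7.9e-31, i.e. about 1/2^100
```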
By Chaitin’s incompleteness theorem, there is a constant L, depending only on our axiom system and choice of language, such that we cannot prove, of any particular string s, that its Kolmogorov complexity is greater than L.

I dunno, that sounds a bit too fishy. It’s like the proof that there are no uninteresting numbers.
Couldn’t one simply write a program that takes a bit string, and then searches through programs until it finds one that outputs that bit string? This will always halt if every bit string has a Kolmogorov complexity. Then you iterate through bit strings, looking for one that has greater than a certain complexity. We know there are arbitrarily large complexities because you’d run out of programs otherwise, so this works. Does this work because it uses a system weaker than PA? Does it just fail?
Say your bit string is s and the program you find that outputs s is p. Now you know that K(s) ≤ |p|, the length of p. However, we wanted to show that K(s) > L, so the bound is in the wrong direction. We would need to show that no shorter program also outputs s, and proving that is exactly what Chaitin’s theorem rules out.
The actual proof of Chaitin’s theorem is somewhat similar to your argument here. Basically, if we can prove that some string has Kolmogorov complexity greater than L, we can write a program that searches through all proofs until it finds a proof that some string s has complexity greater than L, and then outputs that string. By choosing L sufficiently large, this program itself has complexity less than L, yet it outputs s, contradicting K(s) > L.
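For concreteness, here is a sketch of that proof-searching program in Python. The two helpers are hypothetical stand-ins for a real theorem enumerator over the fixed axiom system (that machinery is the hard part, and is stubbed out here); only the shape of the argument matters:

```python
# Sketch of the self-referential program in Chaitin's proof. NOTE: both
# helpers below are hypothetical stand-ins, stubbed so the structure shows.
def enumerate_proofs():
    raise NotImplementedError("stand-in: yield every proof of the axiom system, in order")

def conclusion_of(proof):
    raise NotImplementedError("stand-in: (s, bound) if the proof concludes K(s) > bound, else None")

def first_provably_complex_string(L):
    for proof in enumerate_proofs():
        claim = conclusion_of(proof)
        if claim is not None:
            s, bound = claim
            if bound >= L:
                return s  # the first string provably more complex than L
    # This program has some fixed length M (plus a few bits to encode L).
    # Choose L > M: the program is then itself a description of s shorter
    # than L, so K(s) < L, contradicting the proven statement K(s) > L.
    # Hence, for such L, the search can never succeed: no such proof exists.
```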
The proof works (roughly) like this:
Take any algorithm PP that can prove a string has complexity X (where complexity is the minimum amount of informational content necessary to specify it):
This algorithm is encoded by program OC, which iterates through every possible string until it finds one of complexity X>L. (L is at this point undefined.)
The program OC has complexity M. (This is the sneaky bit right here. You’ll see why in a moment.)
Any string output by OC has complexity M as an upper bound (being specified by program OC). Thus, L has an upper bound of M. Therefore, no algorithm PP can prove that any string has a greater complexity than that necessary to specify the program OC which encapsulates it.
(This is a -very- rough approximation of the proof. Wikipedia’s is worse, at least in my view.)
Note that this doesn’t limit the complexity of any strings so much as the power of any algorithm PP and derivative program OC.
Kolmogorov complexity isn’t computable, so how do you do this?
Enumerate and run every bitstring, increasing in length, until one emits the requisite bitstring; by construction, this program is the shortest bitstring which emits it, and so its length is the Kolmogorov complexity. Many of these programs will not terminate, so you use the dovetail strategy to allocate computing time to every program.
Unfortunately, you cannot prove whether or not an arbitrary program will terminate, meaning the best this scheme will do is provide an upper bound for KC.
Without waiting forever, there’s no way of knowing whether one of the programs smaller than that bound, which hasn’t terminated yet, will eventually terminate and output the string.
Yes, that’s the loophole for Manfred’s scheme (if I haven’t read into his comment something he didn’t intend).
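To make the scheme and its loophole concrete, here is a minimal runnable sketch under loudly-stated assumptions: a toy Brainfuck-style interpreter stands in for the universal machine (so the bound is relative to that toy machine), enumeration is capped at a small length so the demo actually finishes, and, per the caveats above, the result is only an upper bound:

```python
# Illustrative dovetail search: give every enumerated program an ever-growing
# step budget, so no non-halting program can block the search. The toy
# interpreter below is only a stand-in for a universal machine.
from itertools import count, product

CMDS = "+-<>.[]"  # toy Brainfuck-style instruction set (no input command)

def bracket_map(prog):
    """Pair up [ and ]; return None if unbalanced (not a valid program)."""
    stack, pairs = [], {}
    for i, c in enumerate(prog):
        if c == "[":
            stack.append(i)
        elif c == "]":
            if not stack:
                return None
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    return None if stack else pairs

def run(prog, budget):
    """Run prog for at most budget steps; None means it has not finished yet."""
    pairs = bracket_map(prog)
    if pairs is None:
        return b""  # invalid program: treat as halting with no output
    tape, ptr, pc, out = [0] * 256, 0, 0, []
    for _ in range(budget):
        if pc >= len(prog):
            break  # halted within budget
        c = prog[pc]
        if c == "+": tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-": tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ">": ptr = (ptr + 1) % len(tape)
        elif c == "<": ptr = (ptr - 1) % len(tape)
        elif c == ".": out.append(tape[ptr])
        elif c == "[" and tape[ptr] == 0: pc = pairs[pc]
        elif c == "]" and tape[ptr] != 0: pc = pairs[pc]
        pc += 1
    return bytes(out) if pc >= len(prog) else None

MAX_LEN = 4  # cap so the demo terminates; true dovetailing has no such cap

def kc_upper_bound(target):
    """First program (by budget, then by length) emitting target. Only an
    upper bound: a shorter program needing more steps would be missed, which
    is exactly the loophole discussed above."""
    progs = ["".join(p) for n in range(1, MAX_LEN + 1)
             for p in product(CMDS, repeat=n)]
    for budget in count(1):
        for prog in progs:  # shortest programs first within each budget
            if run(prog, budget) == target:
                return prog

print(kc_upper_bound(bytes([2])))  # finds "++." (set cell to 2, print it)
```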
Gwern: I think you understand this, but for the benefit of other readers:
The strategy of enumerate, run in parallel, and pick the first to halt doesn’t give Kolmogorov complexity. It gives an upper bound. There might be some shorter program that will halt and give the appropriate output, but it just hasn’t gotten there yet when you find the first thing that halts.
Note that this only bounds the complexity relative to the system used to run this operation; the string could have a different complexity in a different system.
[ETA]: Also, I think this runs into the Chaitin issue. (Personally, I think the issue is with the flawed definition of “complexity” such that it incorporates only the size of the reference, and not the processing power necessary to disentangle the target from the reference.)
If there weren’t such a constant, I think it would follow that in effect K.C. wouldn’t be generally incomputable. (I may dwell on it further, once I’m sober in the morrow.)
(...) looking for one that has greater than a certain complexity.

The problem with your approach is that K.C. isn’t computable, so you wouldn’t know if you’ve found the exact K.C. of that bit string. Even iterating through all programs that generate it wouldn’t give you the answer to that one, since you’re not iterating from lowest K.C. to highest, but only from the uncompressed smallest program upwards.
Ah, right, halting problem. Can’t do step 1. Okay.
First, I don’t understand this domain well enough to pinpoint why that doesn’t work, but I trust in the math enough to believe the result regardless.
That said, I don’t think you can write a program that searches through programs until it finds one which outputs a specific string and that is also guaranteed to halt. It seems like it could always get stuck on some program that doesn’t halt, and it can’t figure out whether a given program will halt or not.
You could have it do a few steps of each program at a time. Then it doesn’t get stuck on the non-halting programs, they just eat up more and more of its resources.
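Concretely, that interleaving can be sketched by modeling each program as a Python generator that yields None on each step until it halts with its output (all names and the generator model here are illustrative, not any standard API):

```python
# Illustrative round-robin stepping: advance each live "program" one step per
# pass, so non-halting programs merely consume their share of the resources.
from collections import deque

def slow_emitter(n):
    """Toy program: runs for n steps, then halts outputting n."""
    for _ in range(n):
        yield None  # still running
    yield n         # halted, with output n

def looper():
    """Toy non-halting program."""
    while True:
        yield None

def round_robin_search(programs, target):
    """Index of the first program to halt with the target output (assumes
    outputs are never None, since None marks a still-running step)."""
    live = deque(enumerate(programs))
    while live:
        i, prog = live.popleft()
        out = next(prog)  # advance this program by exactly one step
        if out == target:
            return i
        if out is None:
            live.append((i, prog))  # still running: send to the back
        # else: halted with the wrong output, so drop it from the pool

print(round_robin_search([looper(), slow_emitter(7), slow_emitter(3)], 3))  # -> 2
```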