This idea is based on a whole range of confusions and misunderstandings. Program code does not have the redundancy or flexibility of genetics, as you know with syntax errors it shatters if a single character is wrong—for this reason it’s a mistake to use it as the carrier for the genetic paradaigm. Another mistake is extrapolating from a finite data source: you can’t expect to get a correct program this way, the code cannot contain any algorithmic insight into the data for any reason other than pure chance. As you’ll know the software crisis is still ongoing,and Critticall is just another example of people not understanding the true nature of programs and trying to get something for nothing.
This is as good an opportunity as any to link this presentation and thesecoolpapers about evolving patches for real-world C programs in minutes. In light of an actual working example of a computer program that fixes other programs’ bugs, broad brush claims about how programs can’t have algorithmic insight, won’t lead to correct programs, and lack “the redundancy or flexibility of genetics” won’t really wash.
This research (it is a single piece of research written up in 4 different ways) simply concerns taking a rough piece of wood (a program that is almost correct) and sanding down the edges (fixing a small number of test cases). I didn’t say “programs can’t have algorithmic insight”, I said randomly generating problems by “evolutionary” means will not contain insight by any other means than coincidence. The research you linked doesn’t contradict that because all it concerns is smoothing down rough edges. Degeneracy is one of the fundamental features of a genetic code that is required for the theory of evolution to apply so I don’t know why you say that “doesn’t wash”, it’s a fact.
Program code does not have the redundancy or flexibility of genetics
A well-written program does not have it, but many real-life programs do. In my programming environments I have usually warnings turned very sensitive, for example they complain about declared but never used variables/functions, and when I read other people’s code, I see this often.
with syntax errors it shatters if a single character is wrong
Could we see a program with syntax error as a fatal mutation? Fortunately, these fatal mutations are easy to detect. (Alternatively, we could mutate the parse trees.)
Another mistake is extrapolating from a finite data source: you can’t expect to get a correct program this way, the code cannot contain any algorithmic insight into the data for any reason other than pure chance.
Still I have seen humans do this and take their paychecks. Yes, I would expect such evolved program be full of bugs; but real-life applications also contain bugs and it does not prevent them from being useful. I admit I would hate to read the machine-generated code, or even worse fix the bugs or add new functionality. But this is like saying that reading DNA is, ahem, not very user-friendly. Sure, it is not; but it works.
Two more things to consider. First, unit tests—the better unit tests you make, the higher chance is that a random program that passes them is correct. If test-driven development can reduce random human errors, it can also reduce computer-generated error, though of course the computer will make much more errors. Second, we are OK with randomized algorithms having some chance of error, as long at the chance is very very small.
So the prior probability of computer generating a correct algorithm is small, but non-zero. By making unit tests we increase the probability that the result will be correct. If we can increase this probability from epsilon to 1 - epsilon, we win.
Randomized algorithms are completely different to random program code: Randomized algorithms actually fit a precise specification while randomized algorithms just (try to) fit a range of test cases (hopefully I don’t need to explain why a range of test cases is not an adequete specification). It’s a misconception that unit tests help increase the probability of mistakes, A unit test cannot ever tell you that your program is free of errors. They’re the complete wrong paradaigm for making correct programs.
You are trying to submit too fast. try again in 9 minutes. every time I post a comment.
Randomized algorithms are completely different to random program code
The distinction between data and code is mostly useful, but in some situations we intentionally cross the boundary. If you have a text editor, then the separation is valid—the program is code, and the text documents are data. But if you have a compiler, you have a compiler program, which is code, and the compiled program, which is data, but on a different level it is also code. If you have a dynamically loaded library, while you load it, it is data, but then you start treating it as code by running it. If you have an interpreter, which is a code, you emulate the program, which on one level is data and simultaneously on another level is code. If you have a decompiler or debugger, which is code, it takes another code and treats it like data. A bootloader loads data and then starts them as a code. Etc.
This means that treating data as code, or code as data, is not necessarily a mistake or confusion of levels, and sometimes it is unavoidable. How else could you for example start a program, which originally exists only as a sequence of bytes in a file on a disk? You treat it as data when loading the file to a memory, and then you treat it as a code.
A unit test cannot ever tell you that your program is free of errors.
Some maths here: let X be a problem we want to solve, and let’s suppose that the problem is algorithmically solvable. Further, let P be a set of all programs, PX a set of programs that correctly solve problem X, U is a unit test and PU is a set of programs that passes the unit test.
We will call U a correct unit test for problem X if each program from PX belongs to PU (a correct program always passes the unit test) and some programs in P-PX don’t belong to PU (some incorrect programs don’t pass the unit test). In other words, PX is a subset of PU, which is a strict subset of P.
A random code generator will provide a random distribution R of programs. Now the question is whether R ∩ PX is non-empty, in other words whether the generator is capable of generating a correct solution assuming it is incredibly lucky. This depends on the generator and the problem. For example if generator only creates random programs up to 100 lines of code, but the shortest possible code that solves the problem has 150 lines, then the generator cannot create a correct program. But if the generator is capable to create a program of any length, or more precisely capable of creating any program, then if the program is algorithmically solvable, the generator will generate the correct program with a non-zero probability. Just imagine that you wrote the correct program—well, with some non-zero probability, the output of the random generator will be exactly the same text. The only problem is that p(x in PX | x in R) is very small. Sometimes so small that even if we could reliably automatically test whether a program belongs to PX, we just wouldn’t have enough time to wait until first such program is generated. But let’s assume that it is a simple enough program, so if we generate a few millions of programs, we can expect that some of them belong to PX.
Yeah, we have a couple assumptions here. Again, they are: (1) a correct program exists, and (2) some correct program has Kolmogorov complexity small enough (i.e. has a short code, if written optimally) that we can realistically wait until a random generator generates it. Actually, also (3) some correct program with small Kolmogorov complexity runs fast enough that it will pass the unit test in given time limit.
Assuming this all, we have a set of generated random programs R, some of them belong to PX, and we are trying to find them. If we replace the set R with R ∩ PU, we have a smaller set, and R ∩ PX is a subset of R ∩ PU, so we did not lose any solution. In probabilistic terms, p(x in R ∩ PX | x in R ∩ PU) is greater than p(x in PX | x in R), in human speech: if we choose a random program that passes the unit test, we have better chance than if we choose any random program.
Question is, how much does the test U reduce the set of all programs. If only 1% of programs passes the unit test, then p(x in R ∩ PX | x in R ∩ PU) = 100×p(x in R ∩ PX | x in R), so by doing the unit test we increased our luck 100 times… though the result may be still pretty close to zero. We can filter the results by another test, but now we need that only a few of programs that have passed the previous test can pass the next test. Which may be difficult; and also pretty difficult to estimate in advance. We can only measure it afterwards, how many programs that passed the first test were filtered by the second one. The idea is, if a generated program has chance 1:10^10 to be correct, if we can make a sequence of 5 unit tests where each one filters out 99% of programs that have passed the previous tests, we have a decent chance at the end.
Of course to make all this work, we need a decent generator, which will give programs reasonable prior probabilities, therefore it would be better to generate parse trees than source texts, etc. We could even invent a new language that could allow us to write shorter programs. For example it should have an instruction “forAllItemsInSet(SET) { X | CODE }” made of 4 symbols, instead of “for (X = 1; X < SET.size; X++) { CODE }” with 11 symbols, etc.
This idea is based on a whole range of confusions and misunderstandings. Program code does not have the redundancy or flexibility of genetics, as you know with syntax errors it shatters if a single character is wrong—for this reason it’s a mistake to use it as the carrier for the genetic paradaigm. Another mistake is extrapolating from a finite data source: you can’t expect to get a correct program this way, the code cannot contain any algorithmic insight into the data for any reason other than pure chance. As you’ll know the software crisis is still ongoing,and Critticall is just another example of people not understanding the true nature of programs and trying to get something for nothing.
This is as good an opportunity as any to link this presentation and these cool papers about evolving patches for real-world C programs in minutes. In light of an actual working example of a computer program that fixes other programs’ bugs, broad brush claims about how programs can’t have algorithmic insight, won’t lead to correct programs, and lack “the redundancy or flexibility of genetics” won’t really wash.
This research (it is a single piece of research written up in 4 different ways) simply concerns taking a rough piece of wood (a program that is almost correct) and sanding down the edges (fixing a small number of test cases). I didn’t say “programs can’t have algorithmic insight”, I said randomly generating problems by “evolutionary” means will not contain insight by any other means than coincidence. The research you linked doesn’t contradict that because all it concerns is smoothing down rough edges. Degeneracy is one of the fundamental features of a genetic code that is required for the theory of evolution to apply so I don’t know why you say that “doesn’t wash”, it’s a fact.
Degeneracy is an important feature of DNA-based evolution (biological evolution as-it-is) but it’s not fundamental to evolution as-it-could-be.
And a single mutation in DNA in the wrong place can be disastrous, too.
A well-written program does not have it, but many real-life programs do. In my programming environments I have usually warnings turned very sensitive, for example they complain about declared but never used variables/functions, and when I read other people’s code, I see this often.
Could we see a program with syntax error as a fatal mutation? Fortunately, these fatal mutations are easy to detect. (Alternatively, we could mutate the parse trees.)
Still I have seen humans do this and take their paychecks. Yes, I would expect such evolved program be full of bugs; but real-life applications also contain bugs and it does not prevent them from being useful. I admit I would hate to read the machine-generated code, or even worse fix the bugs or add new functionality. But this is like saying that reading DNA is, ahem, not very user-friendly. Sure, it is not; but it works.
Two more things to consider. First, unit tests—the better unit tests you make, the higher chance is that a random program that passes them is correct. If test-driven development can reduce random human errors, it can also reduce computer-generated error, though of course the computer will make much more errors. Second, we are OK with randomized algorithms having some chance of error, as long at the chance is very very small.
So the prior probability of computer generating a correct algorithm is small, but non-zero. By making unit tests we increase the probability that the result will be correct. If we can increase this probability from epsilon to 1 - epsilon, we win.
Randomized algorithms are completely different to random program code: Randomized algorithms actually fit a precise specification while randomized algorithms just (try to) fit a range of test cases (hopefully I don’t need to explain why a range of test cases is not an adequete specification). It’s a misconception that unit tests help increase the probability of mistakes, A unit test cannot ever tell you that your program is free of errors. They’re the complete wrong paradaigm for making correct programs.
You are trying to submit too fast. try again in 9 minutes. every time I post a comment.
The distinction between data and code is mostly useful, but in some situations we intentionally cross the boundary. If you have a text editor, then the separation is valid—the program is code, and the text documents are data. But if you have a compiler, you have a compiler program, which is code, and the compiled program, which is data, but on a different level it is also code. If you have a dynamically loaded library, while you load it, it is data, but then you start treating it as code by running it. If you have an interpreter, which is a code, you emulate the program, which on one level is data and simultaneously on another level is code. If you have a decompiler or debugger, which is code, it takes another code and treats it like data. A bootloader loads data and then starts them as a code. Etc.
This means that treating data as code, or code as data, is not necessarily a mistake or confusion of levels, and sometimes it is unavoidable. How else could you for example start a program, which originally exists only as a sequence of bytes in a file on a disk? You treat it as data when loading the file to a memory, and then you treat it as a code.
Some maths here: let X be a problem we want to solve, and let’s suppose that the problem is algorithmically solvable. Further, let P be a set of all programs, PX a set of programs that correctly solve problem X, U is a unit test and PU is a set of programs that passes the unit test.
We will call U a correct unit test for problem X if each program from PX belongs to PU (a correct program always passes the unit test) and some programs in P-PX don’t belong to PU (some incorrect programs don’t pass the unit test). In other words, PX is a subset of PU, which is a strict subset of P.
A random code generator will provide a random distribution R of programs. Now the question is whether R ∩ PX is non-empty, in other words whether the generator is capable of generating a correct solution assuming it is incredibly lucky. This depends on the generator and the problem. For example if generator only creates random programs up to 100 lines of code, but the shortest possible code that solves the problem has 150 lines, then the generator cannot create a correct program. But if the generator is capable to create a program of any length, or more precisely capable of creating any program, then if the program is algorithmically solvable, the generator will generate the correct program with a non-zero probability. Just imagine that you wrote the correct program—well, with some non-zero probability, the output of the random generator will be exactly the same text. The only problem is that p(x in PX | x in R) is very small. Sometimes so small that even if we could reliably automatically test whether a program belongs to PX, we just wouldn’t have enough time to wait until first such program is generated. But let’s assume that it is a simple enough program, so if we generate a few millions of programs, we can expect that some of them belong to PX.
Yeah, we have a couple assumptions here. Again, they are: (1) a correct program exists, and (2) some correct program has Kolmogorov complexity small enough (i.e. has a short code, if written optimally) that we can realistically wait until a random generator generates it. Actually, also (3) some correct program with small Kolmogorov complexity runs fast enough that it will pass the unit test in given time limit.
Assuming this all, we have a set of generated random programs R, some of them belong to PX, and we are trying to find them. If we replace the set R with R ∩ PU, we have a smaller set, and R ∩ PX is a subset of R ∩ PU, so we did not lose any solution. In probabilistic terms, p(x in R ∩ PX | x in R ∩ PU) is greater than p(x in PX | x in R), in human speech: if we choose a random program that passes the unit test, we have better chance than if we choose any random program.
Question is, how much does the test U reduce the set of all programs. If only 1% of programs passes the unit test, then p(x in R ∩ PX | x in R ∩ PU) = 100×p(x in R ∩ PX | x in R), so by doing the unit test we increased our luck 100 times… though the result may be still pretty close to zero. We can filter the results by another test, but now we need that only a few of programs that have passed the previous test can pass the next test. Which may be difficult; and also pretty difficult to estimate in advance. We can only measure it afterwards, how many programs that passed the first test were filtered by the second one. The idea is, if a generated program has chance 1:10^10 to be correct, if we can make a sequence of 5 unit tests where each one filters out 99% of programs that have passed the previous tests, we have a decent chance at the end.
Of course to make all this work, we need a decent generator, which will give programs reasonable prior probabilities, therefore it would be better to generate parse trees than source texts, etc. We could even invent a new language that could allow us to write shorter programs. For example it should have an instruction “forAllItemsInSet(SET) { X | CODE }” made of 4 symbols, instead of “for (X = 1; X < SET.size; X++) { CODE }” with 11 symbols, etc.