What do you mean by “give it enough tries”?
That is the sampling part of AlphaCode. When you ask it a question, the more samples you let it take as potential solutions, the more likely you are to get a correct answer. The success rate seems to grow log-linearly with the number of samples, which is pretty poor.
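To make that setup concrete, here is a minimal sketch of the sample-many-candidates idea being described. This is not AlphaCode's actual pipeline; `generate_candidate`, `passes_example_tests`, and `solve_by_sampling` are hypothetical placeholders.

```python
import random

def generate_candidate(problem: str) -> str:
    """Hypothetical stand-in for a model sampling one candidate program."""
    # In the real setting this would be a language-model sample; here it is a placeholder.
    return f"# candidate solution for: {problem} (seed {random.random():.6f})"

def passes_example_tests(program: str, tests: list[tuple[str, str]]) -> bool:
    """Hypothetical check: does the candidate pass the problem's example tests?"""
    # Placeholder: a real check would execute the program on each input/output pair.
    return False

def solve_by_sampling(problem: str, tests: list[tuple[str, str]], k: int) -> str | None:
    # Draw k independent samples; keep the first one that passes the example tests.
    for _ in range(k):
        candidate = generate_candidate(problem)
        if passes_example_tests(candidate, tests):
            return candidate
    return None  # no sample passed; the problem counts as unsolved at this budget
```

The question in the exchange below is how fast the chance that at least one of the k samples is correct grows as k increases.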
Oh, I see, I thought you meant AlphaCode had some sort of recurrence (does it?), i.e. iterating on its answer the way a human would when debugging code. Log-linear doesn’t sound bad; you wouldn’t expect a human to write a correct program on the first try without debugging either, would you?
It is pretty bad, because the constant matters here. If you have a method that achieves a success rate (in %) of C·log(1+k), where k is the number of samples, then with C = 1 you’d need 2^100 ≈ 10^30 samples per inference to get all the competitive programming questions right. As it stands, C seems pretty small: at k = 100,000 you solve maybe 33% of problems, and that isn’t much better than at k = 1,000.
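A quick check of the arithmetic in that message, assuming the made-up model is success% = C·log2(1+k) (the base is confirmed a few messages later):

```python
import math

def success_rate(C: float, k: int) -> float:
    """Modelled solve rate in percent: C * log2(1 + k)."""
    return C * math.log2(1 + k)

# With C = 1, reaching 100% requires log2(1 + k) = 100, i.e. k ≈ 2^100 ≈ 1.3e30 samples.
k_needed = 2 ** 100
print(f"{k_needed:.2e}")                   # ~1.27e+30

# The quoted numbers imply a small C: about 33% solved at k = 100,000.
implied_C = 33 / math.log2(1 + 100_000)
print(f"{implied_C:.2f}")                  # ~2.0

# Going from k = 1,000 to k = 100,000 only adds a few log2 steps,
# which is why the jump in solve rate is modest.
print(f"{success_rate(implied_C, 1_000):.1f}%")    # ~19.8%
print(f"{success_rate(implied_C, 100_000):.1f}%")  # ~33.0%
```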
That’s leaving aside the question of whether you will continue to get log-linear improvements, which I find fairly unlikely. Though if you could get C to something like 10 or 20 with one or two innovations, I’d be scared or even terrified.
To check my understanding:
If C were 20, then 100,000 samples would be enough to solve approximately 100% of these problems?
But instead C is something like 6-7, because 100,000 samples gets you to 33%?
Yeah, basically, but I was using log_2(x) instead of log_10(x) in my made-up example. Here’s some actual data. What they mean by “unlimited attempts” is that there is no filter on the sampled solutions, so they are all submitted to Codeforces. I expect false positives or incredibly slow solutions to become more likely, rather than performance actually increasing without limit.
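The choice of base accounts for the gap between the two estimates above. Here is the same implied-C calculation under both log2 and log10, assuming the 33%-at-100,000-samples figure quoted earlier:

```python
import math

k = 100_000
solved_pct = 33.0  # roughly the quoted solve rate at this sampling budget

# Implied constant under each base of the made-up model success% = C * log(1 + k).
C_log2 = solved_pct / math.log2(1 + k)    # ~2.0  (the base used in the original estimate)
C_log10 = solved_pct / math.log10(1 + k)  # ~6.6  (the base behind the "6-7" reading)
print(f"C (log2):  {C_log2:.1f}")
print(f"C (log10): {C_log10:.1f}")

# Under log10, C = 20 would indeed put 100,000 samples at about 100%:
print(f"{20 * math.log10(1 + k):.0f}%")   # ~100%
# Under log2, the equivalent threshold is C ≈ 6:
print(f"{100 / math.log2(1 + k):.1f}")    # ~6.0
```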