I was string #54, and scored −1.7 in round 2, fairly middle of the pack. My probabilities:
24 × 0%, 3 were real (13%)
4 × 5%, 0 were real (0%)
5 × 10%, 3 were real (60%)
4 × 15%, 3 were real (75%) (!)
2 × 20%, 1 was real (50%)
7 × 25%, 1 was real (14%)
8 × 30%, 7 were real (88%) (!)
14 × 75%, 10 were real (71%)
40 × 80%, 24 were real (60%)
16 × 85%, 10 were real (63%)
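For anyone wanting to tabulate their own results the same way, here's a minimal sketch (the sample data is made up to mirror two of the buckets above, not my actual guesses):

```python
from collections import defaultdict

def calibration_table(predictions):
    """Group (stated_probability, was_real) pairs by stated probability
    and report how often each bucket turned out to be real."""
    buckets = defaultdict(list)
    for prob, real in predictions:
        buckets[prob].append(real)
    return [(prob, len(buckets[prob]), sum(buckets[prob]) / len(buckets[prob]))
            for prob in sorted(buckets)]

# Hypothetical sample: an "8 x 30%, 7 real" bucket and a "14 x 75%, 10 real" bucket.
sample = [(0.30, True)] * 7 + [(0.30, False)] \
       + [(0.75, True)] * 10 + [(0.75, False)] * 4
for prob, n, freq in calibration_table(sample):
    print(f"{prob:.0%}: {n} guesses, {freq:.0%} were real")
```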
I think the thing I found difficult was that I didn’t know how to do a Bayesian update. I can calculate the probability of a statistic under the “random” hypothesis, but I had no idea what probability to assign it under the “fake” hypothesis, so what next?
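For what it's worth, the update itself is mechanical once you commit to *some* likelihood under the fake hypothesis, even a rough guess. A sketch (both likelihood numbers below are placeholders, not values from the game):

```python
def posterior_real(p_stat_given_random, p_stat_given_fake, prior_real=0.5):
    """Bayes' rule: P(real | statistic), given the likelihood of the
    observed statistic under each hypothesis and a prior on 'real'."""
    numerator = p_stat_given_random * prior_real
    denominator = numerator + p_stat_given_fake * (1 - prior_real)
    return numerator / denominator

# If the statistic is twice as likely under "random" as under "fake",
# a 50% prior updates to 2/3.
print(posterior_real(0.10, 0.05))  # -> 0.666...
```

The hard part, as noted, is that P(statistic | fake) depends on a model of how humans fake randomness, which nothing in the game hands you.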
In the end I just sorted the strings by a certain statistic and went with my gut, taking into account other info like “number of 1s” and “was this my own string”, and starting with a baseline of “given the number of fakes I’m pretty sure I’ve eliminated, I don’t think I can be more than, uh, about 85% confident that any given string is real”. I worked from both ends and tried to get my average probability somewhere close to 50%, but that left a lot of space in the middle that I didn’t end up using. Clearly this didn’t work very well.
(The statistic was something like the probability of the observed “longest run length” combined with the “number of changes from 1→0 or 0→1”. The two aren’t independent, so I got longest run length in closed form and ran many simulations to get a rough distribution for number of changes given that.)
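The simulation step can be sketched like this (string length, target run length, and trial count are illustrative parameters, not the ones I actually used):

```python
import random
from collections import Counter

def longest_run(bits):
    """Length of the longest run of identical symbols."""
    best = cur = 1
    for a, b in zip(bits, bits[1:]):
        cur = cur + 1 if a == b else 1
        best = max(best, cur)
    return best

def n_changes(bits):
    """Number of 0->1 or 1->0 transitions."""
    return sum(a != b for a, b in zip(bits, bits[1:]))

def changes_given_run(n=50, target_run=6, trials=100_000, seed=0):
    """Monte Carlo estimate of the distribution of 'number of changes'
    among random n-bit strings whose longest run equals target_run."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        bits = [rng.randint(0, 1) for _ in range(n)]
        if longest_run(bits) == target_run:
            counts[n_changes(bits)] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in sorted(counts.items())}
```

Rejection sampling like this is wasteful for rare run lengths, but for a one-off calibration exercise it's good enough.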