For one, with e = 0.001 the algorithm gives N([0.64, 0.36, 0, 0, 0]) = 3, which is wrong. Using e = 0.0001 would give a better answer in this case; I haven’t given much thought, though, to what value of e is optimal.
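For concreteness, here is a minimal sketch of one form such an N function might take (the original implementation isn’t shown here, so the details are guesses): return the smallest count n whose achievable fractions k/n all fall within e of the displayed percentages. It makes the tuning problem visible: a looser e stops the search too early.

```python
def N(percentages, e=0.001, max_n=10_000):
    """Guess the player count: the smallest n such that every displayed
    percentage is within e of some achievable fraction k/n."""
    for n in range(1, max_n + 1):
        # round(p * n) is the nearest feasible vote count k for each p
        if all(abs(p - round(p * n) / n) <= e for p in percentages):
            return n
    return None  # nothing plausible below max_n

print(N([0.64, 0.36, 0, 0, 0], e=0.001))  # 25 (16/25 = 0.64, 9/25 = 0.36 exactly)
print(N([0.64, 0.36, 0, 0, 0], e=0.003))  # 14 -- a looser e stops too early at 9/14, 5/14
```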
The bigger issue, though, is that the number of players can’t strictly be computed from the percentages alone.
It would be nice if we could assume that the least common multiple of the implied denominators (with some leeway to account for rounding) is the correct answer, but I don’t feel comfortable with that. The assumption rests on it being just past midnight, so the number of players should still be “small.” But how small?
What should N([0.5, 0.5, 0, 0, 0]) return? It could be any multiple of 2, many of which are small.
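To illustrate that heuristic (and why it under-determines the answer), a small sketch using Python’s Fraction; lcm_guess is a hypothetical helper, not code from anywhere above:

```python
from fractions import Fraction
from math import lcm

def lcm_guess(percentages, max_denominator=1000):
    # recover each percentage as a small fraction, then take the LCM
    # of the denominators as the "smallest plausible" player count
    fracs = [Fraction(p).limit_denominator(max_denominator) for p in percentages]
    return lcm(*(f.denominator for f in fracs))

print(lcm_guess([0.5, 0.5, 0, 0, 0]))    # 2 -- but 4, 6, 8, ... fit just as well
print(lcm_guess([0.64, 0.36, 0, 0, 0]))  # 25
```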
The only way I can think of to get a correct answer is to refresh the page frequently and collect a time series of percentages. An algorithm could then possibly pin down the number of players.
For example, say you saw [0.5, 0.5, 0, 0, 0], refreshed, and immediately afterward saw [0.444, 0.444, 0.111, 0, 0]. If you were confident that only one player joined between refreshes, you would know that the number of players is now 9 (4/9, 4/9, 1/9). If you couldn’t be sure it was only one player, it would be a lot more complicated.
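A sketch of that intersection logic, assuming the site rounds to three decimal places and that exactly one player joined between refreshes; the smallest surviving candidate matches the reasoning above, while the larger ones (26 then 27, etc.) are only ruled out by the “small count” assumption:

```python
def consistent(n, percentages, places=3):
    # n explains the display if rounding some k/n reproduces each percentage;
    # round(p * n) is the only vote count k that can possibly work for n < 1000
    return all(round(round(p * n) / n, places) == p for p in percentages)

before = [0.5, 0.5, 0, 0, 0]
after = [0.444, 0.444, 0.111, 0, 0]

# counts n consistent with the first display whose successor n + 1
# is consistent with the second display
candidates = [n for n in range(1, 100)
              if consistent(n, before) and consistent(n + 1, after)]
print(candidates)  # [8, 26, 44, 62, 80, 98] -- smallest explanation: 8 players, then 9
```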
I’m not sure what the “margin of error” is here. This is just rounding, is it not? It’s not like the website is adding random epsilon noise to mess with you: it is simply rounding off percentages, which are exact and deterministic up to rounding.
Since it’s a non-deterministic problem, you’d represent all possible answers, in ascending order, as a lazy generator. You can then filter it by any known constraints or requirements (maybe you know players have to be paired, so the total is always even, and you can filter out all odd values). Since the number of possible valid values increases greatly as the total N increases, this might get slow and require some sort of sieve. (At a guess, since it’s rounding, the range of possible values presumably increases rapidly, and so it might make more sense to instead return pairs of (lower, upper) bounds?)
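A sketch of that lazy generator in Python, again assuming three-decimal rounding; the evenness filter is the hypothetical pairing constraint from above:

```python
from itertools import count

def consistent(n, percentages, places=3):
    # n explains the display if rounding some k/n reproduces each percentage
    return all(round(round(p * n) / n, places) == p for p in percentages)

def possible_counts(percentages):
    # lazily yield every player count consistent with the rounded display,
    # in ascending order
    return (n for n in count(1) if consistent(n, percentages))

gen = possible_counts([0.5, 0.5, 0, 0, 0])
evens = (n for n in gen if n % 2 == 0)  # filter by the known pairing constraint
print([next(evens) for _ in range(5)])  # [2, 4, 6, 8, 10]
```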
Personally, I would first start with the forward problem, since it is so simple. Then I could test any algorithms or tweaks by generating a random number of players N and checking that all of the generator’s values ≤ N are correct.
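A sketch of that workflow, under the same rounding assumption: the forward function is a one-liner, and the test checks that the true N always appears among the candidates the inverse step generates up to N:

```python
import random

def forward(votes, places=3):
    # the forward problem: exact vote counts -> rounded displayed percentages
    total = sum(votes)
    return [round(v / total, places) for v in votes]

def consistent(n, percentages, places=3):
    return all(round(round(p * n) / n, places) == p for p in percentages)

for _ in range(1000):
    n = random.randint(1, 200)           # random true player count
    votes = [0] * 5
    for _ in range(n):
        votes[random.randrange(5)] += 1  # each player picks an option
    display = forward(votes)
    candidates = [m for m in range(1, n + 1) if consistent(m, display)]
    assert n in candidates               # the true count is never filtered out
```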
This has the benefit that you can also easily do simple Bayesian updates by ABC (Approximate Bayesian Computation) without the rigmarole of PyMC or Stan etc.: just draw a sample from the prior over the number of players n, feed it into the forward simulation, and see if you replicate the exact observed percentages; if not, delete the sample. The samples you keep are the new posterior.
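A minimal ABC-by-rejection sketch; the geometric-ish prior and the assumption that each player picks an option uniformly at random are both stand-ins for whatever you actually believe:

```python
import random

def forward(votes, places=3):
    # exact vote counts -> rounded displayed percentages
    total = sum(votes)
    return [round(v / total, places) for v in votes]

def abc_posterior(observed, prior_draw, draws=100_000):
    kept = []
    for _ in range(draws):
        n = prior_draw()                  # sample a player count from the prior
        votes = [0] * len(observed)
        for _ in range(n):                # simulate each player's choice
            votes[random.randrange(len(observed))] += 1
        if forward(votes) == observed:    # keep only exact replications of the display
            kept.append(n)
    return kept                           # surviving samples = the posterior

posterior = abc_posterior([0.5, 0.5, 0, 0, 0],
                          prior_draw=lambda: 1 + int(random.expovariate(0.1)))
print(sorted(set(posterior)))  # typically [2, 4, 6, ...]: small even counts dominate
```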