Thanks Emile,
Is there anything you’d like to see added?
For example, I was thinking of running it on Node.js and logging players’ scores, so you could see how you compare. (I don’t have a way to host this right now, though.)
Another possibility is to add diagnostics. For example, were you systematically setting your guess too high, or was it fluctuating more than the data would really justify (under some model for the prior/posterior, say)?
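To make the diagnostics idea concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration rather than how the app actually works: it supposes the player gives a probability guess after each 0/1 observation, uses a uniform Beta(1, 1) prior as the reference model, and the names `posterior_means` and `diagnostics` are made up. "Bias" is the average signed gap between the guesses and the reference posterior mean; the "volatility ratio" compares how much the guesses jump from step to step with how much the reference does.

```python
from statistics import mean


def posterior_means(observations, a=1.0, b=1.0):
    """Posterior mean of a Beta(a, b) prior after each 0/1 observation.

    The Beta(1, 1) default is an assumed reference model, not the app's.
    """
    means = []
    for x in observations:
        a += x
        b += 1 - x
        means.append(a / (a + b))
    return means


def mean_abs_step(values):
    """Average absolute change between consecutive values."""
    return mean(abs(later - earlier) for earlier, later in zip(values, values[1:]))


def diagnostics(guesses, observations):
    """Compare a player's guesses with the reference posterior means.

    Needs at least two guesses, one per observation.
    """
    if len(guesses) != len(observations) or len(guesses) < 2:
        raise ValueError("need one guess per observation and at least two of them")
    reference = posterior_means(observations)
    # Positive bias: guesses sit above the reference on average.
    bias = mean(g - r for g, r in zip(guesses, reference))
    # Ratio above 1: guesses move around more than the reference does.
    volatility_ratio = mean_abs_step(guesses) / mean_abs_step(reference)
    return {"bias": bias, "volatility_ratio": volatility_ratio}
```

A player whose guesses track the reference but sit a constant amount above it would show a positive bias and a volatility ratio of about 1; a player who overreacts to each new data point would show a ratio well above 1.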
Also, I’d be happy to have pointers to your calibration apps or others you’ve found useful.
It’s certainly in the right spirit. He’s reasoning backwards in the same way Bayesian reasoning does: here’s what I see; here’s what I know about the possible mechanisms that could produce that observation, and their prior probabilities; so here’s what I think is most likely to be really going on.
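As a toy illustration of that backwards step, here is a minimal Python sketch. The mechanism names and all the numbers are hypothetical, chosen only to show the arithmetic: multiply each mechanism's prior by how likely it is to produce the observation, then normalize.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over a finite set of candidate mechanisms.

    priors[m]      : prior probability of mechanism m
    likelihoods[m] : probability of the observation if m is what's going on
    """
    joint = {m: priors[m] * likelihoods[m] for m in priors}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("the observation is impossible under every mechanism")
    return {m: p / total for m, p in joint.items()}


# Hypothetical numbers: a common mechanism that rarely produces what we saw,
# and a rare mechanism that usually does.
priors = {"common_mechanism": 0.9, "rare_mechanism": 0.1}
likelihoods = {"common_mechanism": 0.1, "rare_mechanism": 0.9}
beliefs = posterior(priors, likelihoods)
```

With these made-up inputs the two products are equal (0.9 × 0.1 and 0.1 × 0.9), so the observation moves the rare mechanism from 10% up to an even split with the common one.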