Bayes-Up: An App for Sharing Bayesian-MCQ

Inspired by Lê Nguyên Hoang’s post on Bayesian Examination, I have been developing (as a hobby) a new app called Bayes-Up (available at bayes-up.web.app). The app now works well enough to be shared with others. In this post I list a few things you can do with it, as I expect it will be of some interest to the community.

  • Test and improve your calibration. Bayes-Up uses a collection of good-quality trivia questions from the Open Trivia Database. The main point of the app is that you can browse a list of multiple-choice quizzes, answer questions by assigning a probability to each possible choice, receive a score based on a quadratic proper scoring rule, and later view statistics about the quality of your calibration. A good place to start is the quiz from the book Factfulness by Hans Rosling, which I included in the app.

  • Create quizzes and upload them. A small number of calibration training apps already exist. Bayes-Up differs mainly in that it allows you to upload and share your own quizzes. This can solve one of the problems of calibration apps, which is creating good-quality content (quizzes and questions). If you are a teacher and want your students to develop more metacognitive skills and intellectual honesty, or if you are organizing workshops on probability calibration, Bayes-Up can make it easier for you. To add a quiz, simply write it in a spreadsheet, export it as a CSV file, and upload it to Bayes-Up.

  • Recommend UI improvements or new features, report bugs, or contribute to the implementation. Very little feedback has been collected so far, and a lot could certainly be improved with little effort. The code of the app is open source and hosted on GitHub.

  • Analyse the data from Bayes-Up users. So far, about 30,000 questions have been answered by about 1,300 users since the end of December 2019. The collected data is available at this link and will likely grow in the following months. Some simple questions that analysing this data could answer: Do users become better calibrated over time? Is calibration topic-specific or transferable? How can the answers of users with unknown calibration and unknown knowledge be aggregated to predict the right answer to every question? Let me know if you want to do something with it or need better documentation.
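
To make the scoring and calibration statistics mentioned above more concrete, here is a minimal Python sketch. It is not the app's actual code. The normalisation of the quadratic score (2·p_correct − Σ p_j², which ranges from −1 to 1) is one common form and may differ from what Bayes-Up uses, and the function names and the `(probs, correct_index)` input format are assumptions made purely for illustration.

```python
from collections import defaultdict


def quadratic_score(probs, correct_index):
    """Quadratic proper score for one question (assumed normalisation).

    probs: probabilities assigned to each choice, summing to 1.
    Returns 2 * p_correct - sum(p_j ** 2), a value between -1 and 1.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return 2 * probs[correct_index] - sum(p * p for p in probs)


def calibration_table(answers, n_bins=10):
    """Group every stated probability into bins and compare it with reality.

    answers: iterable of (probs, correct_index) pairs (hypothetical format).
    Returns {bin_index: (mean stated probability, observed frequency, count)}.
    A well-calibrated user has mean stated probability close to the
    observed frequency in every bin.
    """
    bins = defaultdict(list)
    for probs, correct_index in answers:
        for j, p in enumerate(probs):
            # Put p = 1.0 into the last bin instead of an extra one.
            b = min(int(p * n_bins), n_bins - 1)
            bins[b].append((p, 1.0 if j == correct_index else 0.0))

    table = {}
    for b, items in sorted(bins.items()):
        n = len(items)
        mean_stated = sum(p for p, _ in items) / n
        observed = sum(hit for _, hit in items) / n
        table[b] = (mean_stated, observed, n)
    return table
```

For example, under this normalisation, spreading probability evenly over four choices (`quadratic_score([0.25, 0.25, 0.25, 0.25], 0)`) scores 0.25, a confident correct answer scores 1, and a confident wrong answer scores −1. Because the rule is proper, reporting your true beliefs maximises your expected score.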