This is a long answer, in which I list around ten concrete problem types that such a site could have.
Before I go into my concrete proposals, here are some general points:
I think the rationality community has focused too much on quantifying subjective uncertainty / probabilistic calibration, and too little on quantitative thinking and numeric literacy in general.
The set of possible exercises for the latter is way larger and pretty unexplored.
There are lots of existing calibration tools, so I’d caution against the failure mode of making Yet Another Calibration Tool.
(Though I agree with abstractapplic that a calibration tool that’s Actually Really Good still doesn’t exist.)
More generally, I feel like I, at least (and possibly the rationality community at large), have gotten too fixated on a few particular forms of rationality training: cognitive bias training, calibration training, spotting logical fallacies.
The low-hanging fruit here might be mostly plucked / pushing the frontier requires some thought (cf. abstractapplic’s comment).
Project Euler is worth looking at as an example of a well-executed problem database. A few things I like about it:
A comment thread for those who have solved the problem.
A wide distribution of problem difficulty (with difficulty ratings displayed alongside the problems).
Numbers Going Up when you solve problems is pretty motivating (as are public leaderboards).
The obvious thing: there is a large, diverse set of original, high-quality problems.
(Project Euler has the big benefit that there is always an objective numerical answer that can be used for verifying user solutions; rationality has a harder task here.)
Two key features a good site would (IMO) have:
Support a wide variety of problem types. You say that LeetCode has the issue of overfitting; I think the same holds for rationality training. The skillset we are trying to develop is large, too.
Allow anyone to submit problems with a low barrier. This seems really important if you want to have a large, high-quality problem set.
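To make the “wide variety of problem types, low submission barrier” combination concrete, here is a rough sketch of what a submitted problem could look like as a data structure. The type names and fields are my own guesses at a minimal schema, not anything an existing site uses:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical problem categories; the names are illustrative only.
PROBLEM_TYPES = {
    "calibration",    # quick probability questions
    "statistical",    # "what is the GDP of X"-style quantity questions
    "modeling",       # open-ended Fermi / quantitative modeling prompts
    "discourse",      # "what went wrong in this exchange" exercises
    "rederivation",   # re-derive an established concept
    "babble",         # generate N ideas / solutions / hypotheses
}

@dataclass
class Problem:
    """A user-submitted problem, kept deliberately small to lower the submission barrier."""
    title: str
    problem_type: str              # one of PROBLEM_TYPES
    prompt: str                    # the exercise text shown to the solver
    answer: Optional[str] = None   # objective answer if one exists (e.g. a number), else None
    difficulty: int = 1            # 1 (seconds-long rep) .. 5 (hours-long project)
    tags: list = field(default_factory=list)

    def __post_init__(self):
        if self.problem_type not in PROBLEM_TYPES:
            raise ValueError(f"unknown problem type: {self.problem_type}")

# Example submission:
# Problem(title="1800s salt intake", problem_type="statistical",
#         prompt="Estimate daily salt intake in 1800s Europe (g/day).", answer="~18")
```

The point of the sketch is just that a single flat schema can cover both the quick-rep and the hours-long exercise types mentioned below.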
I feel like the following two are separate categories worth distinguishing:
High-quantity examples “covering the basics”. Calibration training is a central example here. Completing a single instance of the exercise would take seconds, or minutes at most, and the idea is that you do lots of repetitions.
High-effort “advanced examples”. The “Dungeons and Data Science” exercises strike me as a central example here, where completion presumably takes minutes at the very least, and possibly hours.
(At the very least, the UI / site design should think about “an average user completes 0-10 tasks of this form” and “an average user completes 300 tasks of this form” separately.)
And overall I think that having an Actually Really Good website for rationality training would be extremely valuable, so I’m supportive of efforts in this direction.
I brainstormed some problem types that I think such a site could include.
1: Probabilistic calibration training for quantifying uncertainty
This is the obvious one. I already commented on this, in particular that I don’t think this should be the main focus. (But if one were to execute this: I think that the lack of quantity and/or diversity of questions in existing tools is a core reason I don’t do this more.)
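(If someone does build this, the core feedback loop is cheap to compute. Here is a minimal sketch of the kind of calibration summary such a tool could show; the 10%-wide buckets are an arbitrary choice of mine.)

```python
from collections import defaultdict

def calibration_report(answers):
    """answers: list of (stated_confidence, was_correct) pairs,
    e.g. (0.7, True) means the user said 70% and was right."""
    buckets = defaultdict(list)
    for confidence, correct in answers:
        # Round to the nearest 10% so e.g. 0.68 and 0.72 land in the same bucket.
        buckets[round(confidence, 1)].append(correct)
    return {
        bucket: {"count": len(results), "observed_accuracy": sum(results) / len(results)}
        for bucket, results in sorted(buckets.items())
    }

# A well-calibrated user should see observed_accuracy close to each bucket's confidence.
print(calibration_report([(0.7, True), (0.7, True), (0.7, False), (0.9, True)]))
```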
2: “Statistical calibration”
I feel like there are lots of quantitative statistics one could ask questions about. Here are some basic ones:
What is the GDP of [country]?
What share of [country]’s national budget goes to [domain]?
How many people work in [sector/company]?
How many people die of [cause] yearly?
Various economic trends, e.g. productivity gains / price drops in various sectors over time.
How much time do people spend doing [activity] daily/yearly?
(For more ideas, you can e.g. look at Statistics Finland’s list here. And there are all sorts of quantitative statistics just floating around: e.g. today I learned that salt intake in 1800s Europe was ~18 g/day, which sure is more than I’d have guessed.)
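One grading scheme I could imagine for this kind of question (my own illustration, not an established standard) is scoring a guess by how many orders of magnitude it is off, so that guessing 10 g/day for the salt example is much better than guessing 1 g/day:

```python
import math

def order_of_magnitude_error(guess: float, truth: float) -> float:
    """Absolute error in orders of magnitude (base 10):
    0.0 means exactly right, 1.0 means off by a factor of ten."""
    return abs(math.log10(guess) - math.log10(truth))

# Using the ~18 g/day salt figure as the reference answer:
print(order_of_magnitude_error(10, 18))  # ~0.26, off by a factor of ~1.8
print(order_of_magnitude_error(1, 18))   # ~1.26, off by a factor of 18
```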
3: Quantitative modeling
(The line between this and the previous one is blurry.)
Fermi estimates are the classic example here; see Quantified Intuitions’ The Estimation Game. See also this recent post that’s thematically related.
There’s room for more sophisticated quantitative modeling, too. Here are two examples to illustrate what I have in mind:
Example 1. How much value would it create to increase the speed of all passenger airplanes by 5%?
Example 2. Consider a company that has two options: either have its employees visit nearby restaurants for lunch, or hire food personnel and start serving lunch at its own spaces. How large does the company need to be for the second one to become profitable?
It’s not obvious how to model these phenomena, and the questions are (intentionally) underspecified; I think the interesting part would be comparing modeling choices and parameter estimates across users, rather than simply comparing final outputs.
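To make the “comparing modeling choices” point concrete, here is one crude way Example 2 could be modeled. Every number below is a placeholder guess of mine, and a different user might structure the model entirely differently, which is exactly the interesting part:

```python
def lunch_breakeven(
    # All parameters are illustrative guesses, not researched figures.
    minutes_saved_per_day: float = 20.0,        # shorter walk/queue with on-site lunch
    value_per_employee_minute: float = 1.0,     # value of a minute of work time, in currency
    working_days_per_year: int = 230,
    kitchen_fixed_cost_per_year: float = 150_000.0,  # staff, equipment, space
    marginal_cost_per_meal: float = 2.0,        # per-meal cost not covered by what employees pay
) -> float:
    """Company headcount at which an in-house lunch service breaks even.
    Assumes the per-employee benefit exceeds the per-employee marginal cost."""
    benefit_per_employee = minutes_saved_per_day * value_per_employee_minute * working_days_per_year
    cost_per_employee = marginal_cost_per_meal * working_days_per_year
    return kitchen_fixed_cost_per_year / (benefit_per_employee - cost_per_employee)

# With these guesses the answer is ~36 employees; the point is to argue about the guesses.
print(f"Break-even headcount: {lunch_breakeven():.0f}")
```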
4: The Wikipedia false-modifications game
See this post for discussion.
5: Discourse-gone-astray in the wild
(Less confident on this one.)
I suspect there are a lot of pedagogically useful examples of poor discourse happening in the wild (e.g. slanted or poorly researched newspaper articles, heated discussions on Twitter or elsewhere). This feels like a better way to execute what the “spot cognitive biases / logical fallacies” exercises aim to do. Answering questions like “How is this text misleading?”, “How did this conversation go off the rails?” or “What would have been a better response instead of what was said here?” and then comparing one’s notes with others’ seems like it could make a useful exercise.
6: Re-deriving established concepts
Recently it occurred to me that I didn’t know how inflation works and what its upsides are. Working this through (with some vague memories and hints from my friend) felt like a pretty good exercise to me.
Another example: I don’t know how people make vacuums in practice, but when I sat down and thought it through, it wasn’t too hard to think of a way to create, with pretty simple tools, a space with far fewer air molecules than the surrounding atmosphere.
Third example: I’ve had partial success prompting people to re-derive the notion of Shapley value.
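(In the Shapley case, the canonical answer is compact enough that a brute-force version fits in a few lines; here is a sketch for a toy three-player game, with coalition values I made up for the example.)

```python
from itertools import permutations

def shapley_values(players, value):
    """Each player's Shapley value: their average marginal contribution
    over all orders in which the coalition could be assembled."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = set()
        for player in order:
            before = value(coalition)
            coalition = coalition | {player}
            after = value(coalition)
            totals[player] += after - before
    return {p: totals[p] / len(orderings) for p in players}

# Toy game: A and B together produce 100, A alone produces 30, C adds 20 to anything.
def v(coalition):
    base = 100 if {"A", "B"} <= coalition else (30 if "A" in coalition else 0)
    return base + (20 if "C" in coalition else 0)

print(shapley_values(["A", "B", "C"], v))  # roughly {'A': 65.0, 'B': 35.0, 'C': 20.0}
```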
I like these sorts of problems: they are a bit confusing, in that part of the problem is asking the right questions, but there are established, correct (or at least extremely good) solutions.
(Of course someone might already know the canonical answer to any given question, but that’s fine. I think economics has lots of good examples for this—e.g. Vickrey auctions, prediction markets, why price controls are bad / price gouging is pretty good, “fair” betting odds—but maybe that’s just because I don’t know much economics.)
7: Generating multiple ideas/interventions/solutions/hypotheses
An exercise I did at some point is “Generate 25 ideas for interventions that might improve learning and other outcomes in public education”. I feel like the ability to come up with multiple ideas for a given problem is pretty useful (e.g. this is something I face in my work all the time, and this list itself is an example of “think of many things”). This is similar to the babble exercises, though I’m picturing more “serious” prompts than the ones there.
Another way to train this skill would be interactive exercises about doing science (cf. the 2-4-6 problem), with the aim of completing them as efficiently as possible. (This article is thematically relevant.)
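A minimal interactive harness for 2-4-6-style exercises could look something like the sketch below; the hidden rule, the test cases, and the scoring are all placeholder choices of mine:

```python
def run_246_exercise(hidden_rule, propose_triple, guess_rule, max_experiments=10):
    """Let a solver run experiments against a hidden rule, then score their final guess.
    Fewer experiments before a correct guess = a more efficient investigation."""
    history = []
    for _ in range(max_experiments):
        triple = propose_triple(history)
        if triple is None:          # solver decides to stop experimenting
            break
        history.append((triple, hidden_rule(triple)))
    final_guess = guess_rule(history)
    # Score the guess by agreement with the hidden rule on a spread of test cases.
    tests = [(1, 2, 3), (2, 4, 6), (6, 4, 2), (3, 3, 3), (1, 10, 100), (5, 1, 9)]
    agreement = sum(final_guess(t) == hidden_rule(t) for t in tests) / len(tests)
    return agreement, len(history)

# Classic hidden rule: any strictly increasing triple.
hidden = lambda t: t[0] < t[1] < t[2]
# A lazy solver that only tests a confirming example and guesses "increases by 2".
agreement, n = run_246_exercise(
    hidden,
    propose_triple=lambda h: (2, 4, 6) if not h else None,
    guess_rule=lambda h: (lambda t: t[1] - t[0] == 2 and t[2] - t[1] == 2),
)
print(f"rule agreement: {agreement:.0%} after {n} experiments")
```

One nice property of scoring this way is that it naturally rewards disconfirming experiments: the lazy solver above only ever tests a confirming triple and ends up with an overfit rule.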
(The remaining items are half-developed ideas that I don’t yet quite see how to turn into exercises.)
8: Getting better results with more effort
Two personal anecdotes:
I used to play chess as a child, but stopped at some point. When I played again years later, I noticed something: my quick intuitions felt just as weak as before, but I felt like I was better at thinking about what to think about, and at using more time to make better decisions by thinking more. Whereas when I was younger, I remember often making decisions pretty quickly and not seeing what else I could do.
I did math olympiads in high school. Especially early on, some problems just felt fundamentally unapproachable to me—I just couldn’t make any progress on them. Whereas nowadays when I encounter problems, in math or otherwise, I’m rarely stuck in this sense: “Oh, obviously if I just spent more time on this, I could figure this stuff out eventually.”
A type of exercise where you are supposed to first give an initial answer after X time, and then are allowed to revise your answer for Y time, seems like it could train this and other skills. (Maybe brainstorming exercises of the form “if you had a week/month/year of time, how would you solve [problem]?” could help, too.)
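The mechanic itself would be easy to prototype; here is a minimal sketch of the “quick answer, then revised answer” flow, with the timings and the console I/O being arbitrary placeholders:

```python
import time

def two_phase_exercise(prompt, get_answer=input, quick_seconds=60, revise_seconds=600):
    """Collect a fast initial answer and a revised answer, returning both
    (with the time actually spent) so the gap between first instinct and
    more deliberate effort can be reviewed afterwards."""
    print(prompt)
    start = time.monotonic()
    initial = get_answer(f"Initial answer (aim for about {quick_seconds}s): ")
    initial_seconds = time.monotonic() - start

    start = time.monotonic()
    revised = get_answer(f"Revised answer (take up to {revise_seconds}s): ")
    revised_seconds = time.monotonic() - start

    return {
        "initial": (initial, round(initial_seconds)),
        "revised": (revised, round(revised_seconds)),
    }

# Example (blocks on console input, so commented out):
# two_phase_exercise("How much value would 5% faster passenger planes create?")
```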
9: I think there’s something in the genre of “be specific”, and more specifically in “operationalize vague claims into something that has a truth value”, that’d be nice to have in large-quantity exercise form. See this post for related discussion. I’m also reminded of this comment.
There are definitely things not covered by this list; in particular, I have included little that directly trains applying all this in real life (cf. TAPs, which is definitely a very real-life-y technique). So while I did keep practicality in mind, I’d be happy to see exercises that bridge the theory-practice gap even more.
Also, the Dungeons and Data Science exercises and the stuff Raymond is doing are worth keeping in mind.