Most of these ideas seem good in isolation (or pointing in the direction of good things). I think they’d add up to a significant complexity cost for the page, so an open question will be how far each one is worth what it adds to the site’s overall cognitive load.
I agree that we should be very careful about adding complexity, especially when the complexity is added to every single post and comment. I can think of a few things that might reduce the complexity:
1. Instead of having a big eye-catching Arbital-style visualization of the probabilities displayed for every post and comment on the site, display an aggregate probability in boring grey text that matches the other text, and have people hover/click on that text to view/predict. E.g., your comment header could look like this:
Raemon +2 votes ∧ ∨ 𝗣 ≈ 0.9 6h
You could still include the full Arbital-style visualization within posts and comments, but it would be a deliberate choice by the post/comment author, rather than being a default. In cases where not enough people have assigned probabilities to the post/comment for the system to think it’s worth displaying an aggregate probability, the default visualization will be 𝗣 = ?.
2. Use a functionally similar (though visually distinct) click-and-drag sliding scale both for the voting system’s karma weighting (see above) and for assigning probabilities, so that some of the basic habits and motor intuitions people build up with karma also carry over to probability.
In general: My vague understanding of Oliver and co.’s vision for LessWrong is that LessWrong is to be a site where probability assignments, predictions, cruxes, bets, etc. play a huge role. Having easy infrastructure for making and comparing probability assignments might be more of a core feature than the full range of “buttons”, particularly if some of the key buttons can themselves be implemented as probability assignments.
More specifics of how I might implement a probability system like this:
Before you can assign probabilities, you need a “level-1 calibration” badge, achieved via hitting a certain calibration level in a LW/CFAR game/app. (You can then get level-2, level-3, etc. calibration badges for even better performance, maybe unlocking other site features like karma-betting markets.)
Everyone’s probability assignments (which are non-anonymous by default) can always be viewed by clicking or hovering on a “collapsed” probability (i.e., one that looks like 𝗣 = foo instead of like an Arbital probability distribution image).
Collapsed probabilities will display as 𝗣 = ? until, e.g., some number of users with at least 2000 karma between them have assigned probabilities to that comment/post. Some aggregated probability will then be displayed for “foo” in 𝗣 = foo, with better-calibrated users (according to badge count) receiving more weight in the aggregation.
The main purpose of hiding the probabilities behind 𝗣 = ? until enough high-karma users have weighed in is to discourage low-karma users from getting really excited and wasting time running around assigning probabilities to every comment on the site. If someone feels like doing that, that’s totally fine (maybe they’ll learn something from the process), but those probabilities shouldn’t be prominently displayed. A lot of comments on the site are things like “I agree!” or “Woah.”, where it doesn’t really matter if someone decides to waste their time adding silly probability assignments. It does start carrying a cost, though, if those silly assignments become distracting and visible to anyone visiting the page.
(Note that users’ karma totals do not affect the weight users’ assignments receive in the probability aggregation at all, even though they affect how prominently the probabilities are displayed. On the other hand, it might be fine to weight users more if they have badges for things other than calibration, e.g., a “general knowledge” badge reflecting that you’re unusually good at answering Jeopardy questions or what-have-you.)
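To make the karma/badge split concrete, here is a rough Python sketch of the display rule described above. Only the 2000-karma figure and the "karma gates visibility, badges set weight" idea come from the proposal. The `Assignment` type, the function names, the choice of a weighted mean, and the choice of weighting linearly by calibration level are all assumptions made for illustration, since the proposal only says "some aggregated probability".

```python
from dataclasses import dataclass
from typing import List, Optional

# From the proposal: "at least 2000 karma between them".
KARMA_THRESHOLD = 2000


@dataclass
class Assignment:
    """One user's probability for a post/comment (hypothetical type)."""
    user: str
    karma: int
    calibration_level: int  # 1, 2, 3, ...; level 1 is needed to assign at all
    probability: float      # in [0, 1]


def aggregate_probability(assignments: List[Assignment]) -> Optional[float]:
    """Return the aggregate to display, or None if the header should show 'P = ?'."""
    # Users without a level-1 calibration badge cannot assign probabilities.
    eligible = [a for a in assignments if a.calibration_level >= 1]

    # Karma only gates visibility: the combined karma must reach the threshold.
    if sum(a.karma for a in eligible) < KARMA_THRESHOLD:
        return None

    # Karma plays no role in the weights. ASSUMPTION: weight is linear in
    # calibration level and the aggregate is a weighted mean.
    total_weight = sum(a.calibration_level for a in eligible)
    weighted_sum = sum(a.calibration_level * a.probability for a in eligible)
    return weighted_sum / total_weight


def render_collapsed(assignments: List[Assignment]) -> str:
    """Render the collapsed probability text for a comment header."""
    p = aggregate_probability(assignments)
    return "𝗣 = ?" if p is None else f"𝗣 ≈ {p:.1f}"
```

With this sketch, a single low-karma user assigning probabilities everywhere never changes what visitors see, while a well-calibrated user pulls the displayed number toward their own estimate regardless of their karma.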
Is there an existing CFAR/LW calibration app that we consider good? (I know there have been attempts, but I haven’t actually used them myself.)
New thing: https://www.openphilanthropy.org/blog/new-web-app-calibration-training
This one broke for me a few times :/
I actually also think the UI design and feedback mechanisms are a lot worse, so I would recommend that people still use the old one.