Okay, the thoughts that originally prompted me to write this post:
1) Point Inflation—it currently feels weird/wrong to me that I’m getting more points than usual. And then, when I realize that 17 points actually just means maybe 5 people upvoted my post, that feels a little sad. However, this seems like a temporary problem, and over time I’d get used to it.
Under the new paradigm, “3-weight votes” seem to be basically what you’ll get as soon as you get the hang of things.
2) Not knowing how many people actually upvoted. I’m not sure how much I’d want this if I wasn’t used to having it, but it seems like something that might be useful to actually be able to check.
3) ‘Having’ to upvote/downvote more than feels fair. I’ve upvoted a post and felt a little weird because it was a *slightly* good post that I wanted to incentivize, but giving it 3 karma felt like overkill for what I wanted to give.
Alternately: if something is hovering at 2 karma, and I think it was a little bit wrong or poorly-done in some way, I sometimes want to downvote it but then feel bad about kicking it into the negatives, which results in me not downvoting it at all.
4) The existence of voting weight isn’t very intuitive; it feels at first like the site is just buggy, and this makes me want it to be more explicitly spelled out.
I have some thoughts on solutions, which I’ll post as separate comments so they’re easier to upvote/downvote independently.
Karma-Weight as Max-Upvote, rather than Standard Upvote
Medium recently implemented something called “claps” (see here), where you can basically “like” something as many times as you want. If you like it a lot, “clap” a lot. This is neat because it’s somewhat costly signaling: it actually takes time to clap things, so there’s only so much you’d do it unless something actually seems really good to you.
A possible variation on this is something like “you can upvote something multiple times, up to a cap set by your vote-weight”. (So, someone with 25 karma can upvote/downvote something up to 3 times.)
This would resolve issue #3 above, and I think might add some additional value as a costlier signal.
If we went this route, there are a few variations. Maybe you let people upvote a lot if it matters to them (which would mean even more point inflation). Maybe you can only upvote at the current log_5 scale, but make each additional vote require a lengthier (and perhaps more visually satisfying) click.
I’d probably lean towards your own posts still starting with your max-upvote.
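To make the “max-upvote” idea concrete, here is a rough sketch. It assumes the log_5 scale means weight = 1 + floor(log_5(karma)), which is my guess (it matches the “25 karma, up to 3 times” example but isn’t confirmed anywhere), and the names `vote_weight` and `clamp_vote` are made up for illustration.

```python
def vote_weight(karma: int) -> int:
    """Hypothetical weight rule: 1 + floor(log_5(karma)), never below 1.

    Uses integer comparisons instead of math.log to avoid float rounding
    at exact powers of 5.
    """
    weight = 1
    threshold = 5
    while karma >= threshold:
        weight += 1
        threshold *= 5
    return weight


def clamp_vote(requested: int, karma: int) -> int:
    """Cap a requested multi-vote (positive = up, negative = down)
    at the voter's weight, so the weight acts as a maximum, not a default."""
    cap = vote_weight(karma)
    return max(-cap, min(cap, requested))
```

Under these assumptions, someone with 25 karma could call `clamp_vote(1, 25)` to give a mild +1, while `clamp_vote(10, 25)` would still be capped at their maximum of 3.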
(Note: Medium also doesn’t allow you to like something as many times as you want. Claps have a current maximum of 50, which is reached after about 1.5 minutes of clapping)
Ah, good to know.
Meta: this is the sort of thing I’d have wanted to give a simple like or thumbs-up, to indicate to you that I’d seen it without having to respond with a comment.
(This is part of a mostly-unrelated issue of “should upvotes be public, or should there be some way to do something upvote-like that is public?” that’s been discussed a bit in a few places)
I really like the system on Slack, Discord etc. where you can react to comments with various graphical icons and it shows who posted which reaction. Would be fantastic to have something similar here.
I want to push back on that, on two fronts. Such a system would be very bad, and potentially rises to the level of a dealbreaker for me.
First point is that I think graphical icons are quite distracting and loud, and take us away from the kind of atmosphere of real discourse that we want. Icons and emoji and such are strongly associated for good reason with casual slash ephemeral conversation. They send the message that we’re texting, not debating.
Second point is that reactions of this type being non-anonymous seems very bad. I have come to the conclusion that social networks are toxic and should be considered harmful. If LW becomes a de facto social network rather than a place for discussion we’ve failed. Thinking about which people had which reaction to which things, and who is going to see you have which reaction to which thing, is central to that toxicity, and is again part of a very different type of thinking (and one I want to avoid, not embrace).
There are certainly times when it is valuable to tell someone “I have seen this” so there is an argument for a button that sends that message to the original poster (and only the original poster) but the counter-argument to that is that this puts subtle (or non-subtle) pressure on people to indicate whether they have seen things, and thus to check the site frequently. I very much do not want LW to be another site that people feel the need to keep current and constantly check for small updates. This is another big reason why I consider social networks toxic, as they reward constant refreshes. Email has this problem too and solving it while still being reachable quickly is an unsolved problem; some people such as Paul Christiano accept one-day turnaround times to avoid this issue. So overall, I prefer the equilibrium where there is no button, and if it is important that someone know you’ve seen something, you can comment/message to that effect and edit later.
I don’t think they necessarily need to be: see e.g. the small “agree, respectfully disagree, helpful” icons at the bottom of posts over at Paradox forums (sample thread). They look pretty nice and unobtrusive to me.
There seem to be both advantages and disadvantages to non-anonymity, and it’s not clear to me which one dominates. E.g. over on LW, a lot of people have mentioned that anonymous karma counts on their posts are pretty bad for motivation, and that a couple of named posters making comments such as “nice post” can feel much more rewarding than having lots of upvotes.
I agree that this can also have a negative effect, but given that one of the reasons why LW1.0 died seems to have been that people didn’t find it rewarding enough to put in the work of writing quality content, having more emotionally compelling feedback mechanisms seems worth considering.
I find it interesting that you’re worried the feedback isn’t compelling enough, and I’m worried it will be too compelling in bad ways. I strongly resonate with the idea that someone taking the time to write ‘nice post’ feels much better than getting a like or upvote. That seems good so long as doing so is rare and someone failing to do this does not feel like information, since it involves far more conscious effort.
I also like Rob’s idea of collapsing ‘minor’ comments, with my additional suggestion that the person you’re replying to defaults to seeing them in expanded form, and likely they start at sorting power −1 for other people. This could also be useful for things like “you have a typo or math error.”
I think both private non-anonymous reactions and public anonymous reactions are likely to be valuable, whereas public non-anonymous reactions could be potentially harmful and private anonymous reactions seem mostly useless.
“I’ve seen this” coming from the parent poster and “nice post” are valuable feedback for the author of the post/comment, but less useful information for other people, so they would best be private and non-anonymous.
Reactions that say something about the content of a comment, like “interesting” or “confusing”, are more useful if they are public and anonymous.
Proposal: instead of buttons, have a feature to mark a comment you’re making as “minor”, which causes it to be collapsed by default. A stack of 10 little collapsed comments saying “I agree” or “+1” or “SGTM” or “I’ve seen this” might be manageable in a way that uncollapsed comments wouldn’t be. (A feature like this might also encourage shy or uncertain people to comment more?)
Something like this might also encourage people to think of collapsed comments as boring, and not do the “oooh, this comment is collapsed, it must be super interesting and salacious” thing. (If comments at a sufficient depth get collapsed, then you might want them to display differently from comments that are minor or downvoted.)
This could also be useful if we have buttons, but only for a very limited set of reactions, and/or if all or most buttons are anonymous.
Are High Karma People Trustworthy? Does Karma Even Do The Thing We Want?
The hope is for high karma people to actually represent the kind of judgment we want. This is… more true than I think it’d be in most communities, but not overwhelmingly true to the degree I’d like it to be.
I’m interested in upvote-options that reflect different styles of agreement/endorsement (i.e. in many ways “I disagree with this person but think they are saying a valuable thing” is more valuable than “I agree with this person”, and definitely more valuable than “that post was funny”)
It’s possible to implement something like this with something like Facebook Reacts, and weight them in ways that seem more to “get at the thing we care about.”
Using karma for like/dislike and having a separate Arbital-style probability meter for every post and comment (plus an option to insert probability meters into a specific part of your post) might get you a lot of the value of this, and it seems like a different genre from “buttons” in the sense Zvi is talking about.
Specifically, I would have every post and comment automatically come with a probability meter at the bottom for “How confident are you that all the key claims in this (post/comment) are true?”, which can maybe be disabled on a case-by-case basis. I’d expect people to normally just intuit what the key claims are, but people could also be encouraged by the existence of this feature to include an explicit tl;dr at the bottom or top of their post listing what they see as their key claims. (I’d consider it valuable to have the ‘whole-post/comment’ probability meter consistently ask this question, so that people can easily scroll quickly down a page and eyeball the probability meter for each comment/post without the extra mental overhead of needing to worry about whether the ‘whole-post/comment’ probability meter represents something fundamentally different in each case.)
Separately, I would also include a feature for inserting an arbitrary number of probability meters into the body of a post or comment, explicitly associated with particular claims.
(When people have multiple “key claims” and choose to explicitly list them, it can also be standard to include a probability meter for each one, in which case the whole-post/comment probability meter may not be useful (unless some people feel comfortable weighing in on the conjunction but not the conjuncts, which I could imagine happening for a variety of reasons). I see this redundancy as more or less harmless, and would still consider it useful in this case to make each-post/comment-gets-a-single-top-level-probability-meter a site-wide standard, both because it would make it possible to see the probabilities different entire posts or comments got in indexes of those posts and comments (e.g., “All posts” or “Featured posts”), and because maintaining this norm would make it more normal and cognitively available to do a lot of probability assignments.)
I would suggest making people’s probability judgments non-anonymous by default (to encourage accountability and conversation, and to keep the site from devaluing globally unpopular opinions that are held by high-status contrarians), but allowing them to switch to anonymous judgments if they want. I would also suggest marking when each probability judgment was made, and encouraging people to make periodic follow-up comments in the case of important/interesting things they’ve updated on, in which they give their new probability judgments for the exact same claim, without overwriting the record of their old judgment.
(I think overwriting is OK too, though, provided that there’s some way for users to access their own old judgment in case it’s important later (or for mods to find them), and provided that the probability judgments receive dates. The ability to “remove” an old probability judgment seems generally good insofar as it makes people less worried that they’ll have an embarrassing mistake immortalized on the Internet forever, and therefore encourages them to try assigning probabilities to things more. Ideally people would never remove old judgments, even if they misunderstood the thing they were answering—the probability you’re misunderstanding a question should always be factored into your probability assignments—but I don’t think this is a case where we want to force people to do the ideal thing.)
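A minimal sketch of what “dated judgments that don’t overwrite the old record” could look like. The class and field names (`ProbabilityJudgment`, `JudgmentHistory`, `made_at`, `anonymous`) are hypothetical, and this only covers one user’s judgments on one claim.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class ProbabilityJudgment:
    probability: float        # in [0, 1]
    made_at: datetime         # when the judgment was made
    anonymous: bool = False   # non-anonymous by default


@dataclass
class JudgmentHistory:
    """Append-only record of one user's judgments on one claim."""
    entries: List[ProbabilityJudgment] = field(default_factory=list)

    def record(self, probability: float,
               made_at: Optional[datetime] = None,
               anonymous: bool = False) -> None:
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability must be between 0 and 1")
        when = made_at or datetime.now(timezone.utc)
        # Updates are appended, so earlier judgments stay retrievable.
        self.entries.append(ProbabilityJudgment(probability, when, anonymous))

    def current(self) -> Optional[ProbabilityJudgment]:
        """Most recent judgment, or None if the user has not weighed in."""
        return self.entries[-1] if self.entries else None
```

A “remove” option could then just hide an entry from public display rather than deleting it, so the user (or mods) can still find it later.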
I assume a lot of people would abuse probability meters for all sorts of trivial/weird/meta things, once they became commonplace. E.g., “Proposition: I did a good job summarizing my key claims above.” “Proposition: Most people will assign <50% probability to this proposition.” “Proposition: You found OP confusing.” “Proposition: If you threw a dart at a random point on a dartboard whose target was as large as a proportion of dartboard as the amount you liked the new Blade Runner movie is high compared to how much you like your favorite movies, then you would hit the target.” “Proposition: You’ll join me at the MIRIxLA meetup tomorrow at noon.” I consider this an actively good thing and want to be in a culture where probability assignment is so ubiquitous that it gets incorporated into trifles and in-jokes and gets appropriated for a variety of uses.
Most of these ideas seem good in isolation (or pointing in the direction of good things). I think they’d add up to significant complexity cost for the page, so figuring out to what degree they are worth adding to the overall cognitive load for the site will be an issue.
I agree that we should be very careful about adding complexity, especially when the complexity is added to every single post and comment. I can think of a few things that might reduce the complexity:
1. Instead of having a big eye-catching Arbital-style visualization of the probabilities displayed for every post and comment on the site, display an aggregate probability in boring grey text that matches the other text, and have people hover/click on that text to view/predict. E.g., your comment header could look like this:
Raemon +2 votes ∧ ∨ 𝗣 ≈ 0.9 6h
You could still include the full Arbital-style visualization within posts and comments, but it would be a deliberate choice by the post/comment author, rather than being a default. In cases where not enough people have assigned probabilities to the post/comment for the system to think it’s worth displaying an aggregate probability, the default visualization will be 𝗣 = ?.
2. Use a functionally similar (though visually distinct) click-and-drag sliding scale for the voting system’s karma-weighting that we use (see above) and for assigning probabilities, so some of the basic habits and motor intuitions people build up with karma can also be used for probability.
In general: My vague understanding of Oliver and co.’s vision for LessWrong is that LessWrong is to be a site where probability assignments, predictions, cruxes, bets, etc. play a huge role. Having easy infrastructure for making and comparing probability assignments might be more of a core feature than the full range of “buttons”, particularly if some of the key buttons can themselves be implemented as probability assignments.
More specifics of how I might implement a probability system like this:
Before you can assign probabilities, you need a “level-1 calibration” badge, achieved via hitting a certain calibration level in a LW/CFAR game/app. (You can then get level-2, level-3, etc. calibration badges for even better performance, maybe unlocking other site features like karma-betting markets.)
Everyone’s probability assignments (which are non-anonymous by default) can always be viewed by clicking or hovering on a “collapsed” probability (i.e., one that looks like 𝗣 = foo instead of like an Arbital probability distribution image).
Collapsed probabilities will display as 𝗣 = ? until, e.g., some number of users with at least 2000 karma between them have assigned probabilities to that comment/post. Some aggregated probability will then be displayed for “foo” in 𝗣 = foo, with better-calibrated users (according to badge count) receiving more weight in the aggregation.
The main purpose of hiding the probabilities behind 𝗣 = ? until enough high-karma users have weighed in is to discourage low-karma users from getting really excited and wasting time running around and assigning probabilities to every comment on the site. If someone feels like doing that, that’s totally fine — maybe they’ll learn something from the process — but those probabilities shouldn’t be prominently displayed, because a lot of comments on the site are things like “I agree!” or “Woah.” where it doesn’t really matter if someone decides to waste their time adding silly probability assignments, but it does start carrying a cost if this makes silly probability assignments distracting and visible to anyone visiting the page.
(Note that users’ karma totals do not affect the weight users’ assignments receive in the probability aggregation at all, even though they affect how prominently the probabilities are displayed. On the other hand, it might be fine to weight users more if they have badges for things other than calibration, e.g., a “general knowledge” badge reflecting that you’re unusually good at answering Jeopardy questions or what-have-you.)
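Here is a sketch of the display rule described above, under some assumptions of mine: the aggregate is a weighted mean with weight equal to calibration badge level (the proposal only says better-calibrated users get “more weight”), the 2000-karma figure is the example threshold, and all the names are hypothetical. The code uses a plain “P” in place of 𝗣.

```python
from dataclasses import dataclass
from typing import List

KARMA_THRESHOLD = 2000  # example figure from the proposal, not a settled value


@dataclass
class Judgment:
    user: str
    probability: float      # in [0, 1]
    karma: int              # only gates display, never weights the aggregate
    calibration_level: int  # badge level; level 1 is required to count


def collapsed_display(judgments: List[Judgment]) -> str:
    """Return the collapsed text shown in a comment header."""
    eligible = [j for j in judgments if j.calibration_level >= 1]
    # Stay at "P = ?" until enough combined karma has weighed in.
    if sum(j.karma for j in eligible) < KARMA_THRESHOLD:
        return "P = ?"
    # Assumed aggregation: mean weighted by calibration badge level.
    total_weight = sum(j.calibration_level for j in eligible)
    aggregate = sum(j.probability * j.calibration_level
                    for j in eligible) / total_weight
    return f"P ≈ {aggregate:.2f}"
```

For example, a level-2 user with 1500 karma at 0.9 plus a level-1 user with 600 karma at 0.6 clears the threshold and shows as “P ≈ 0.80”; the first user alone would still display “P = ?”.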
Is there an existing CFAR/LW calibration app that we consider good? (I know there have been attempts but haven’t actually used them myself)
New thing: https://www.openphilanthropy.org/blog/new-web-app-calibration-training
This one broke for me a few times :/
I actually also think the UI design and feedback mechanisms are a lot worse, so I would recommend that people still use the old one.