Prediction-tracking systems, particularly prediction markets with dedicated virtual currencies, seem like an obvious contender here. One could imagine the scope of those being expanded to more fields. So I’m going to focus on what’s not captured or inadequately captured by them:
The book Superforecasting notes that generating good questions is as important as predicting answers to questions effectively, but prediction-tracking is completely unable to assign credit to the thinkers who generate good questions.
This is similar to the idea that we should judge people separately on their ability to generate hypotheses and their ability to accurately score hypotheses. Hypothesis generation is incredibly valuable, and we would not want to optimize too much for correctness at the expense of that. This is an idea I attribute to Scott Garrabrant.
Another important idea from Scott is epistemic tenure. This idea argues against scoring an intellectual in too fine-grained a way. If a person’s score can go down too easily (for example, if recent work is heavily weighted), this could create a fear of putting bad ideas out there, which could severely dampen creativity.
Ability to produce proofs and strong evidence seems potentially under-valued. For example, if a conjecture is believed to be highly likely, producing a proof would enable you to get a few points on a prediction-tracker (since you could move your probability to 100% before anyone else), but this would severely underrate your contribution.
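To make the underrating concrete, here is a minimal sketch (my own illustration, not from the original text) using a logarithmic scoring rule. It assumes a proof lets the prover move a near-consensus probability to near-certainty, and compares that gain with the gain of an earlier forecaster who moved the probability most of the way:

```python
import math

def log_score_gain(p_before: float, p_after: float) -> float:
    """Points gained (in nats) for moving one's probability toward an
    outcome that turns out to be true, under a logarithmic scoring rule."""
    return math.log(p_after) - math.log(p_before)

# If the community already believes the conjecture at 98%, the prover
# who moves their probability to near-certainty gains very little:
gain_prover = log_score_gain(0.98, 0.9999)

# An early forecaster who moved from 50% to 98% gains far more:
gain_early = log_score_gain(0.50, 0.98)

print(f"prover's gain: {gain_prover:.4f} nats")
print(f"early forecaster's gain: {gain_early:.4f} nats")
```

Under these (illustrative) numbers the early forecaster earns over thirty times the prover's points, even though producing the proof was likely the harder and more valuable contribution.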
Ability to contribute to the thought process seems under-valued. Imagine a prediction market with an attached forum. Someone might contribute heavily to the discussions, in a way which greatly increases the predictive accuracy of top users, without being able to score any points for themselves.
I do think “[a]bility to contribute to the thought process seems under-valued” is very relevant here. A prediction-tracking system captures only one layer[^1] of intellectuals: the layer concerned with making frequent, specific, testable predictions about imminent events. Those whose theories are vaguer, have more complex outcomes, or bear on less frequent events[^2][^3], while perhaps instrumental to the frequent, specific, testable predictors, would go unrecognized, unless there were some sort of complex system compelling the assignment of credit to the vague contributors (and presumably to their vague contributors, et cetera, across the entire intellectual lineage, or at least to some maximum feasible depth).
This would be useful to help the lay public understand outcomes of events, but not necessarily useful in helping them learn about the actual models behind them; it leaves them with models like “trust Alice, Bob, and Carol, but not Dan, Eve, or Frank” rather than “Alice, Bob, and Carol all subscribe to George’s microeconomic theory which says that wages are determined by the House of Mars, and Dan, Eve, and Frank’s failure to predict changes in household income using Helena’s theory that wage increases are caused by three-ghost visitations to CEOs’ dreams substantially discredits it”. Intellectuals could declare that their successes or failures, or those of their peers, were due to adherence to a specific theory, or the lay people could try to infer as such, but this is another layer of intellectual analysis that is nontrivial unless everyone wears jerseys declaring what theoretical school of thought they follow (useful if there are a few major schools of thought in a field and the main conflict is between them, in which case we really ought to be ranking those instead of individuals; not terribly useful otherwise).
[^1]: I do not mean to imply here that such intellectuals are above or below other sorts. I use layer here in the same way that it is used in neural networks, denoting that its elements are posterior to other layers and closer to a human-readable/human-valued result.
[^2]: For example, someone who predicts the weather will have much more opportunity to be trusted than someone who predicts elections. Perhaps this is how it should be; while elections are less frequent, they will likely have a wider spread, and if our overall confidence in election-predicting intellectuals is lower than our confidence in weather-predicting intellectuals, that might just be the right response to a field with relatively fewer data points: less confidence in any specific prediction or source of knowledge.
[^3]: On the other hand, these intellectuals may be less applied not because of the nature of their field, but because of the nature of their specialization; a grand and abstract genius could produce incredibly detailed models of the world, and the several people who run the numbers on those models would be the ones rewarded with a track record of successful predictions.
The point about proof generation is interesting. A general proof effectively collapses the entire scope of predictions it covers into a single settled result, and a method of generating strong evidence effectively sets a floor for future predictions.
A simple way to score this might be to keep adding to the prover’s prediction score every time a new question is found to succumb to the proof. That said, we could also consider the specific prediction separately from the transmissibility of the prediction method.
This might be worthwhile even with no change in the overall score; it feels obvious that we would like to be able to sort predictions by [people who have used proofs] or [people who generate evidence directly].
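The retroactive-credit idea above can be sketched in a few lines. This is a hypothetical mechanism of my own construction, assuming each proof carries a predicate (here a simple string match, purely for illustration) that decides whether a newly resolved question falls under it:

```python
from dataclasses import dataclass

@dataclass
class Prover:
    name: str
    score: float = 0.0

@dataclass
class Proof:
    author: Prover
    covers: object  # hypothetical predicate: does this proof settle the question?

def credit_proof_authors(question: str, proofs: list, points: float = 1.0) -> None:
    """Each time a new question is found to succumb to an existing proof,
    keep adding to its author's prediction score."""
    for proof in proofs:
        if proof.covers(question):
            proof.author.score += points

# Toy example: a proof that settles any question mentioning "even square".
alice = Prover("alice")
proofs = [Proof(author=alice, covers=lambda q: "even square" in q)]
for q in ["is 16 an even square?", "is 36 an even square?", "will it rain?"]:
    credit_proof_authors(q, proofs)
print(alice.score)  # 2.0: credited once per covered question
```

Tagging scores this way would also support the sorting suggested above: predictions earned via proofs could carry a distinct provenance from those earned by direct evidence-gathering.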