Measuring accuracy is a good way to assess the quality of our models. But not all accurate models are inherently good to have.
I’m wondering about the questions you picked. Do you feel that there is some utility for you in being able to predict, e.g. the future situation in Libya? I don’t really think there is, but then I struggle to come up with more useful alternatives, at least ones that aren’t personal.
I appreciate what you’re doing here, and on some level I’d like to be doing it myself (though I don’t currently intend to participate). But I wonder to what extent this sort of contest is useful to have, and to what extent it is a game.
The usefulness of a model of the particular area was something I considered in choosing between questions, but I had a hard time finding a set of good non-personal questions which had very high value to model. I tried to pick questions which in some way depended on interesting underlying questions. For example, the Tesla one hinges on your ability to predict the performance of a known-to-overpromise entrepreneur in a manner that’s more precise than either maximum cynicism or full trust, and on your ability to predict the ongoing ramp-up of manufacturing for tech that is facing manufacturing difficulties, both of which I think have value.
World politics is, I think, the weakest section in that regard. This is a big part of why, rather than just taking twenty questions from the various sources of world politics predictions I had available, I looked for other questions and made a bunch of my own EA-related ones by going through EA org posts looking for uncertain pieces of the future. That reduced the world politics questions to only a little over a third of the set.
That said, I think the world politics do have transferability in calibration if not precision (you can learn to be accurate on topics you don’t have a precise model for by having a good grasp of how confident you should be), and in the general skill of skimming a topic, arriving at impressions about it, and knowing how much to trust those impressions. I think there are general skills of rationality being practiced here, beyond gaining specific models.
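To make the calibration/precision distinction concrete, here is a minimal sketch in Python. The two forecasters and the ten outcomes are made up for illustration, and the Brier score and the rounding-to-one-decimal grouping are my assumptions about how one might score this, not necessarily how the contest is scored.

```python
from collections import defaultdict

def brier_score(forecasts, outcomes):
    """Mean squared gap between stated probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

def calibration_table(forecasts, outcomes):
    """Group questions by stated probability; report how often each group came true."""
    groups = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        groups[round(p, 1)].append(o)
    return {p: sum(results) / len(results) for p, results in sorted(groups.items())}

# Hypothetical data: ten questions, six of which resolved "yes".
outcomes = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

# No precise model, just a good sense of how confident to be overall.
vague = [0.6] * 10
# A precise model: confident, and confident in the right direction.
sharp = [0.9] * 6 + [0.1] * 4

vague_table = calibration_table(vague, outcomes)
sharp_table = calibration_table(sharp, outcomes)
vague_brier = brier_score(vague, outcomes)
sharp_brier = brier_score(sharp, outcomes)
```

The vague forecaster’s “60%” claims come true 60% of the time, so they are well calibrated on this sample even though they cannot tell the questions apart. The sharp forecaster gets a much better Brier score because their model separates the yes questions from the no questions. The first skill is the one I’d expect to transfer between topics.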
And I think that while it is the weakest section, it does have some value. There’s utility in having a reasonable grasp of how governments behave, and in particular how fast they change under various circumstances. The way governments behave and react in the future will set the regulatory environment for future technological development, and the way they behave in geopolitics affects risk from political instability, both as a civilisation risk in itself and as something that could require mitigation in other work. There was an ongoing line of questioning about how good it is, exactly, to have a massive chunk of AGI safety orgs in one coastal American city (in particular during the worst of the North Korea stuff). A good model for that is useful for deciding, for example, whether it’s worth trying to fund the creation, expansion and focusing of orgs elsewhere as a “backup”, which is a decision that can be taken individually on the basis of a good grasp of how concerned you should be, exactly, about particular geopolitical issues.
These world politics questions are probably not perfectly optimised for that (I had to avoid anything on NK in particular due to the current rate of change). It’d be nice to find better ones, and maybe more useful questions of other kinds, and shrink the section further next year. I think they probably have some value to practice predicting on, though.
I like the EA section. I think grouping people by specific goals/interests and preparing questions for those goals is the right way. If I cared about EA, then being able to predict which charities will start/stop being effective, before they actually implement whatever changes they’re considering, would allow me to spend money more efficiently. It would be good not only to have an accurate personal model, but also to see other people with better models make those predictions, and know how reliable they really are.
Likewise, we could have something about AGI, e.g. “which AGI safety organization will produce the most important work next year”, so that we can fund them more effectively. Of course, “most important” is a bit subjective, and, also, there is a self-fulfilling component in this (if you don’t fund an organization, then it won’t do anything useful). But in theory being able to predict this would be a good skill, for someone who cares about AGI safety.
Problem is, I don’t really know what else we commonly care about (to be honest, I don’t care about either of those much).
“I think the world politics do have transferability in calibration”
I would also like this to be true, but I wonder if it really is. There is a very big difference between political questions and personal questions. I’d ask if someone has measured whether they experience any transfer between the two, but then I’m not even sure how to measure it.
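One rough way to look for transfer, sketched in Python: score the same person’s personal-question forecasts before and after a period of practising on political questions, and see whether the score moves. Everything here is hypothetical, including the record layout, the domain labels and the numbers, and the Brier score is only one possible choice of metric.

```python
def brier_score(forecasts, outcomes):
    """Mean squared gap between stated probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

def per_domain_brier(records):
    """records: iterable of (domain, stated_probability, outcome) tuples."""
    by_domain = {}
    for domain, p, o in records:
        by_domain.setdefault(domain, []).append((p, o))
    return {
        domain: brier_score([p for p, _ in rows], [o for _, o in rows])
        for domain, rows in by_domain.items()
    }

# Hypothetical personal-question forecasts made before any practice on politics.
before = [("personal", 0.9, 1), ("personal", 0.9, 0),
          ("personal", 0.9, 1), ("personal", 0.9, 0)]
# Hypothetical personal-question forecasts made after practising on politics.
after = [("personal", 0.7, 1), ("personal", 0.3, 0),
         ("personal", 0.7, 1), ("personal", 0.3, 0)]

# Positive improvement means the personal-question score got better (lower).
improvement = per_domain_brier(before)["personal"] - per_domain_brier(after)["personal"]
```

On its own this can’t separate transfer from simply getting better with time, so you’d want a comparison group that didn’t do the political practice, and far more than four questions per round.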
It might be nice to have a set of twenty EA questions, a set of twenty ongoing-academic-research questions, a set of twenty general tech industry questions, a set of twenty world politics questions for the people who like them maybe, and run multiple contests at some point which refine predictive ability within a particular domain, yeah.
It’d be tough to source that many, and I feel that twenty is already about the minimum sample size I’d want to use. Research questions would probably require some crowdsourcing of interesting upcoming experiments to predict on. But if the smaller thing works, and particularly if help turns out to be available, it’d be worth considering.