But still without being transparent about his own forecasts, preventing a fair comparison.
I think it’s a fair comparison, in that we can do at least a weak subjective-Bayesian update on the information. It’s useful and not cherry-picked, at least insofar as we can compare the AGI/TAI construct Eliezer was talking about in December to the things Metaculus is making predictions about.
I agree that it’s way harder to do a Bayesian update on data points like ‘EY predicted AGI well before 2050, then Metaculus updated from 2052 to 2035’ when we don’t have a full EY probability distribution over years.
I mostly just respond by making a smaller subjective update and then going on with my day, rather than treating this as revelatory. I’m better off with the information in hand, but it’s a very small update in the grand scheme of things. Almost all of my knowledge is built out of small updates in the first place, rather than huge revelatory ones.
If I understand your views, Jotto, three big claims you’re making are:
1. It’s rude to be as harsh to other futurists as Eliezer was toward Metaculus, and if you’re going to be that harsh then at minimum you should clearly be sticking your neck out as much as the people you’re criticizing. (Analogy: it would be rude, and harmful to pro-quantified-forecasting norms, to loudly criticize Matt Yglesias for having an off year without at minimum having made a similar number of similarly risky, easy-to-resolve public predictions.)
2. Metaculus-style forecasting is the gold standard for reasoning about the physical world, and is the only game in town when it comes to ‘remotely reasonable methods to try to predict anything about future technology’. Specifically:
Anyone who claims to know anything relevant to the future should have an account on Metaculus (or a similar site), and people should overwhelmingly base their beliefs about the future on (a) what Metaculus says, and (b) what the people with the highest Metaculus scores say...
… rather than basing their beliefs on their own inside-view models of anything, personal attempts to do explicit quantified Bayesian updates in response to not-fully-quantitative data (e.g., ‘how surprised would my gut be if something like CYC turned out to be more important than ML to future AI progress, a la Hanson’s claims? in how many worlds do I expect to see that, compared to worlds where I don’t see it?’), or attempts to shift their implicit strength of belief about things without routing through explicit Bayesian calculations.
3. If you aren’t a top Metaculus forecaster and aren’t just repeating the current Metaculus consensus using the reasoning ‘X is my belief state because Metaculus thinks it’, then you should shut up rather than poisoning the epistemic commons with your unvalidated inside-view models, hard-to-quickly-quantitatively-evaluate claims about reality, etc.
(Correct me if I’m misstating any of your views.)
I don’t have a strong view on whether folks should be friendlier/nicer in general—there are obvious benefits to letting people be blunt, but also obvious costs. Seems hard to resolve. I think it’s healthy that the EA Forum and LW have chosen different tradeoff points here, so we can test the effects of different norms and attract people who favor different tradeoffs. (Though I think there should be more ‘cultural exchange’ between the EA Forum and LW.)
The more specific question ‘has Eliezer stuck his neck out enough?’ seems to me to turn on 2. Likewise, 3 depends on the truth of 2.
I think 2 is false—Metaculus strikes me as a good tool to have in the toolbox, and a really cool resource overall, but I don’t see it as a replacement for inside-view reasoning, building your own models of the world, or doing implicit updating and intuition-honing.
Nor do I think that only the top n% of EAs or rationalists should try to do their own model-building like this; I think nearly every EA and every rationalist should do it, just trying to guard against the obvious pitfalls—and learning via experience, to some degree, where those pitfalls tend to be for them personally.
Quoting another recent thing I wrote on Twitter:

At the borderlands of EA and non-EA, I find that the main argument I tend to want to cite is Bayes:
‘Yep, A seems possible. But if not-A were true instead, what would you expect to see differently? How well does not-A retrodict the data, compared to A?’
And relatedly, ‘What are the future predictions of A versus not-A, and how soon can we get data that provides nontrivial evidence for one side versus the other?’ But that’s a more standard part of the non-EA college-educated person’s toolbox.
And there’s a sense in which almost all of the cognitive resources available to a human look like retrodiction, rather than prediction.
If you hear a new Q and only trust your pre-registered predictions, then that means your whole lifetime of past knowledge is useless to you.
We have in fact adopted the norm “give disproportionate weight to explicit written-down predictions”, to guard against hindsight bias and lying.
But it’s still the case that almost all the cognitive work is being done at any time by “how does this fit my past experience?”.
I guess there’s another, subtler reason we give extra weight to predictions: there’s a social norm against acknowledging gaps in individual ability.
If you only discuss observables and objective facts, never priors, then it’s easier to just-not-talk-about individuals’ judgment.
Whatever the rationale, it’s essential that we in fact get better at retrodiction (i.e., reasoning about the things we already know), because we can’t do without it. We need to be able to talk about our knowledge, and we need deliberate practice at manipulating it.
The big mistake isn’t “give more weight to pre-registered predictions”; it’s ”… and then make it taboo to say that you’re basing any conclusions on anything else”.
Predictions are the gold standard, but man cannot live on gold alone.
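To make the ‘A versus not-A’ comparison concrete, here’s a minimal sketch of the odds-form update I have in mind. The hypotheses and every number below are made up purely for illustration:

```python
# Toy odds-form Bayes update -- made-up numbers, purely illustrative.
# Question: how well does A retrodict an observation we already have,
# compared to not-A?
prior_odds = 1.0           # start indifferent between A and not-A
p_obs_given_A = 0.6        # A would have been fairly unsurprised by the observation
p_obs_given_not_A = 0.2    # not-A would have been quite surprised by it

likelihood_ratio = p_obs_given_A / p_obs_given_not_A      # 3.0
posterior_odds = prior_odds * likelihood_ratio            # 3.0
posterior_prob_A = posterior_odds / (1 + posterior_odds)

print(f"posterior P(A) = {posterior_prob_A:.2f}")         # 0.75 on these toy numbers
```

Nothing in that arithmetic requires the observation to have been predicted in advance; the same machinery runs on things I already know.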
To explain more of my view, here’s a thing I wrote in response to some Qs on the 18th:
I think there are two different topics here:
1. Should we talk a bunch publicly about the fastest path to AGI?
2. To what extent should we try to explicate and quantify our *other* predictions, and publish those predictions?
I think the answer to 1 is an obvious “no”.
But answering questions like ‘what is your AGI timeline?’ or ‘when do you expect a Gato-like thing to be developed?’ seems basically fine to me, because it doesn’t say much about how you expect AGI to be developed. (Especially if your timeline is long.)
The Metaculus folks criticizing EY for saying ‘Metaculus updated toward my view’ apparently didn’t realize that he did make a public prediction proving this: (link)
He didn’t make a Gato-specific public prediction, but it’s also not apparent to me that EY’s making a strong claim like ‘I predicted a system exactly like Gato would be built exactly now’; he’s just saying it’s the broad sort of thing his intuitive models of AI progress allow.
Translating an intuitive, unstable, preverbal sense of ‘which events are likelier to happen when?’ into a bunch of quantified predictions, without falling victim to issues like framing and salience effects, seems pretty hard to me.
EY is on the record as saying that it’s hard to get much mileage out of thinking about timelines, and that it’s even harder if you try to switch away from your brain’s native format for representing the probabilities (emotional surprise, concrete anticipations, etc.).
I could easily imagine that there’s some individual variation about how people best do tech forecasting, and I also think it’s reasonable for folks to disagree about the best norms here. So, I think 2 is a more complicated Q than 1, and I don’t have a strong view on it. [...]
I guess “top Metaculus forecaster” is a transparently bad metric, because spending more time on Metaculus tends to raise your score? Is there a ‘Metaculus score corrected for how much you use the site’ leaderboard?
Yes, https://metaculusextras.com/points_per_question
It has its own problems in terms of judging ability. But it does exist.
Thanks! :)
This is good in some ways but also very misleading. It selects against people who place forecasts on a large number of questions, against people who enter questions that have already been open for a long time, and against people who don’t have time to update most of their forecasts later.
I’d say it’s a very good way to measure performance within a tournament, but in the broader jungle of questions it misses an awful lot.
E.g. I have predictions on 1,114 questions; the majority were never updated and had negligible energy put into them.
Sometimes, for fun, I used to place my first (and only) forecast on questions that were just about to close. I liked doing this because it made it easier to compare my performance against the community on distribution questions, since the final summary only shows that comparison for the final snapshot. Of course, if you do this you will get very few points per question. But when I look at my results on those questions, I typically outperform the community median slightly.
This isn’t captured by my average points per question across all questions, where I underperform (partly because I never updated on most of those questions, and partly because a lot of them are amusingly obscure things I put little effort into). That’s not to suggest I’m particularly great either (I’m not), but I digress.
If we’re trying to predict a forecaster’s insight on their next given prediction, then a more useful metric would be the forecaster’s log score versus the community’s log score on the same questions, at the time they placed those forecasts. Naturally this isn’t a good way to score tournaments, where people should update often and put high effort into each question. But if we’re trying to estimate someone’s judgment from the broader jungle of Metaculus questions, it would be much more informative than an average of points per question.
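As a rough sketch of what I mean, here’s a toy calculation for binary questions (this is not Metaculus’s actual scoring code, and the records, probabilities, and helper function below are all invented for illustration):

```python
import math

# Toy comparison of a forecaster against the community, per question.
# Each record: (forecaster's probability, community probability at the moment
# the forecast was placed, resolved outcome). All numbers are made up.
records = [
    (0.80, 0.70, 1),
    (0.30, 0.45, 0),
    (0.60, 0.65, 1),
]

def log_score(p, outcome):
    """Log score for probability p on a binary outcome (higher is better)."""
    return math.log(p if outcome == 1 else 1.0 - p)

# Positive values mean the forecaster beat the community's contemporaneous
# forecast on that question; the average is a per-question estimate of edge
# that doesn't punish entering many questions or skipping later updates.
edges = [log_score(p, o) - log_score(c, o) for p, c, o in records]
print(f"mean log-score edge over community: {sum(edges) / len(edges):+.3f}")
```

Distribution questions would need the continuous version of the log score instead, but the idea is the same: compare each forecast to the community’s forecast at the same timestamp, then average.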