I see what you’re saying, but it looks like you’re strawmanning me yet again with a more extreme version of my position. You’ve done that several times and you need to stop that.
What you’ve argued here would prevent me from questioning the forecasting performance of any pundit I can’t formally score, which is ~all of them.
Yes, it’s not a real forecasting track record unless it meets the sort of criteria that are fairly well understood in Tetlockian research. And Ben Garfinkel’s post doesn’t give us one either; it’s not a forecasting track record the way a Metaculus record is.
But if a non-track-recorded person suggests they’ve been doing a good job anticipating things, it’s quite reasonable to point out non-scorable things they said that seem incorrect, even though there’s no way to formally score them.
In an earlier draft of my essay, I considered getting into bets he’s made (several of which he’s lost), but I ended up not including them. Partly my focus was waning and it was more manageable to stick to the meta-level point, and partly I thought the essay would be better if it stayed focused. I don’t think there is literally zero information about his forecasting performance (that’s not plausible), but going through it seemed like a distraction from my epistemic point. Bets are not as informative as Metaculus-style forecasts, but they are better than nothing; this stuff is a spectrum, and even Metaculus doesn’t retain some kinds of information about the forecaster. Still, I didn’t get into it, though I could have.
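For concreteness, here is the kind of “scorable” record the Tetlockian criteria point at (a minimal sketch in Python; the forecasts below are hypothetical, not anyone’s actual predictions): explicit probabilities on resolvable questions, which can then be reduced to a number such as a Brier score.

```python
# Minimal sketch of Tetlock-style scoring. The forecasts are made up
# for illustration; they are not anyone's real predictions.

def brier_score(forecasts):
    """Mean squared error between stated probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

# (probability assigned, outcome: 1 = it happened, 0 = it didn't)
record = [(0.9, 1), (0.2, 0), (0.7, 1), (0.6, 0)]
print(f"Brier score: {brier_score(record):.3f}")  # 0.125
```

A vague “I’ve been anticipating things well” gives a scorer nothing to plug into a computation like this; a bet pins down a direction but usually not a probability, which is why bets sit partway along the spectrum.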
I did later edit in a link to one of Paul’s comments, which describes some reasons Robin looks pretty bad in hindsight, but also lists several things Eliezer said that seem quite off. None of those are scorable. I added the link because Eliezer explicitly claimed he came across better in that debate; overall he may have, but the record is more mixed than that, and that’s relevant to my meta-point that one can obfuscate these things without a proper track record. Ben Garfinkel’s post is similarly relevant.
If the community felt more ambivalent about Eliezer’s forecasts, or even if Eliezer were more ambivalent about his own, and some guy came along trying to convince people he had made bad forecasts, then your objection of one-sidedness would make much more sense to me. That’s not what this is.
Eliezer actively tells people he’s anticipating things well, but he deliberately prevents his forecasts from being scorable. Pundits do that too, and you bet I would eagerly criticize vague non-scorable stuff they said that seems wrong. And yes, I would retweet someone criticizing those things too. Does that also bother you?
IMO that’s a much more defensible position, and is what the discussion should have initially focused on. From my perspective, the way the debate largely went is:
Jotto: Eliezer claims to have a relatively successful forecasting track record, along with Dario and Demis; but this is clearly dissembling, because a forecasting track record needs to look like a long series of Metaculus predictions.
Other people: (repeat without qualification the claim that Eliezer is falsely claiming to have a “forecasting track record”; simultaneously claim that Eliezer has a subpar “forecasting track record”, based on evidence that wouldn’t meet Jotto’s stated bar)
Jotto: (signal-boosts the inconsistent claims other people are making, without noting that this is equivocating between two senses of “track record” and therefore selectively applying two different standards)
Rob B: (gripes and complains)
Whereas the way the debate should have gone is:
Jotto: I personally disagree with Eliezer that the AI Foom debate is easy to understand and cash out into rough predictions about how the field has progressed since 2009, or how it is likely to progress in the future. Also, I wish that all of Eliezer, Robin, Demis, Dario, and Paul had made way more Metaculus-style forecasts back in 2010, so it would be easier to compare their prediction performance. I find it frustrating that nobody did this, and think we should start doing this way more now. Also, I think this sharper comparison would probably have shown that Eliezer is significantly worse at thinking about this topic than Paul, and maybe than Robin, Demis, and Dario.
Rob B: I disagree with your last sentence, and I disagree quantitatively that stuff like the Foom debate is as hard-to-interpret as you suggest. But I otherwise agree with you, and think it would have been useful if the circa-2010 discussions had included more explicit probability distributions, scenario breakdowns, quantitative estimates, etc. (suitably flagged as unstable, spitballed ass-numbers). Even where these aren’t cruxy and don’t provide clear evidence about people’s quality of reasoning about AGI, it’s still just helpful to have a more precise sense of what people’s actual beliefs at the time were. “X is unlikely” is way less useful than knowing whether it’s more like 30%, or more like 5%, or more like 0.1%, etc.
I think the whole ‘X isn’t a real track record’ thing was confusing, and made your argument sound more forceful than it should have.
Plus maybe some disagreements about how possible it is in general to form good models of people and of topics like AGI in the absence of Metaculus-ish forecasts, and disagreement about exactly how informative it would be to have a hundred examples of narrow-AI benchmark predictions over the last ten years from all the influential EAs?
(I think it would be useful, but more like ‘1% to 10% of the overall evidence for weighing people’s reasoning and correctness about AGI’, not ‘90% to 100% of the evidence’.)
(An exception would be if, e.g., it turned out that ML progress is way more predictable than Eliezer or I believe. ML’s predictability is a genuine crux for us, so seeing someone else do amazing at this prediction task for a bunch of years, with foresight rather than hindsight, would genuinely update us a bunch. But we don’t expect to learn much from Eliezer or Rob trying to predict stuff, because while someone else may have secret insight that lets them predict the future of narrow-AI advances very narrowly, we are pretty sure we don’t know how to do that.)
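To put rough numbers on the “30% vs. 5% vs. 0.1%” point above (a hedged sketch with made-up probabilities, not anyone’s real forecasts): under a logarithmic scoring rule, the same verbal claim “X is unlikely” implies very different penalties once the question resolves.

```python
import math

# Log score for a binary forecast: the log of the probability assigned to
# whichever outcome actually occurred. Closer to zero is better. The
# probabilities are illustrative, not anyone's real forecasts.

def log_score(p_event, event_happened):
    return math.log(p_event if event_happened else 1.0 - p_event)

for p in (0.30, 0.05, 0.001):
    print(f"said {p:6.1%}: if it happens {log_score(p, True):6.2f}, "
          f"if it doesn't {log_score(p, False):6.2f}")

# said  30.0%: if it happens  -1.20, if it doesn't  -0.36
# said   5.0%: if it happens  -3.00, if it doesn't  -0.05
# said   0.1%: if it happens  -6.91, if it doesn't  -0.00
```

“X is unlikely” collapses these very different levels of commitment into one sentence, which is the sense in which the explicit numbers carry more information.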
Part of what I object to is that you’re a Metaculus radical, whose Twitter bio says “Replace opinions with forecasts.”
This is a view almost no one in the field currently agrees with or tries to live up to.
Which is fine, on its own. I like radicals, and want to hear their views argued for and hashed out in conversation.
But then you selectively accuse Eliezer of lying about having a “track record”, without noting how many other people are also expressing non-forecast “opinions” (and updating on these), and while using language in ways that make it sound like Eliezer is doing something more unusual than he is, and making it sound like your critique is more independent of your nonstandard views on track records and “opinions” than it actually is.
That’s the part that bugs me. If you have an extreme proposal for changing EA’s norms, argue for that proposal. Don’t just selectively take potshots at views or people you dislike more, while going easy on everyone else.
I think Jotto has argued for the proposal in the past. Whether he did it in that particular comment is not very important, so long as he holds everyone to the same standards.
As for his standards: I think he sees Eliezer as an easy target because he’s high status in this community and has explicitly said that he thinks his track record is good (in fact, better than other people’s). On its own, therefore, it’s not surprising that Eliezer would get singled out.
I no longer see exchanges with you as a good use of energy, unless you’re able to describe some of the strawmanning of me you’ve done and come clean about that.
EDIT: Since this is being downvoted, here is a comment chain where Rob Bensinger interpreted me in ways that are bizarre, such as suggesting that I think Eliezer is saying he has “a crystal ball”, or that “if you record any prediction anywhere other than Metaculus (that doesn’t have similarly good tools for representing probability distributions), you’re a con artist”. These sound thematically similar to what I was saying, but they were weird, persistent extremes that I don’t see as good-faith readings of me. It kept happening over Twitter, then again on LW. At no point have I felt he’s trying to understand what I actually think. So I don’t see the point of continuing with him.