Yes, I’m saying that with each $ increment, the “qualitative division” model fares worse and worse. I think that people who hold onto this qualitative division have generally been qualitatively surprised by the accomplishments of LMs, that when they make concrete forecasts those forecasts have mismatched reality, and that they should be updating strongly about whether such a division is real.
What the model considers relevant instead is whether, when you look at the LLM’s output, that output seems to exhibit properties of cognition that are strongly prohibited by the model’s existing expectations about weak versus strong cognitive work. If it doesn’t, then the model simply doesn’t update; it wasn’t, in fact, surprised by the level of cognition it observed, even if (perhaps) the larger model embedding it, which does track things like how the automation of certain tasks might translate into revenue/profit, was surprised.
I’m most of all wondering how you get such a high level of confidence in the distinction and its relevance. I’ve seen only really vague discussion of it. The view that LM cognition doesn’t scale into generality seems wacky to me. I want to see a description of the tasks it can’t do.
In general, if someone won’t state any predictions of their view, I’m just going to update about that view based on my understanding of what it predicts (which is, after all, what I’d ultimately be doing if I took the view seriously). I’ll also try to update about your view as operated by you, and so e.g. if you were generally showing a good predictive track record or achieving things in the world, then I would be happy to acknowledge there is probably some good view there that I can’t understand.
I’m confused that you consider this “considerable”, and would write up a comment chastising Eliezer and the other “fast takeoff” folk because they… weren’t hugely moved by, like, ~2 bits’ worth of evidence? Like, I don’t see why he couldn’t just reply, “Sure, I updated by around 2 bits, which means that now I’ve gone from holding fast takeoff as my dominant hypothesis to holding fast takeoff as my dominant hypothesis.”
I do think that a factor of two is significant evidence. In practice, in my experience, that’s about as much evidence as you normally get between realistic alternative perspectives in messy domains. The kind of forecasting approach that puts 99.9% probability on things, and so doesn’t move until it gets 10 bits, is just not something that works in practice.
On the flip side, it’s enough evidence that Eliezer is endlessly condescending about it (e.g. about those who only assigned a 50% probability to the covid response being as inept as it was). Which I think is fine (but annoying); a factor of 2 is real evidence. And if I went around saying “Maybe our response to AI will be great” and then just replied to this observation with “whatever, covid isn’t the kind of thing I’m talking about” without giving some kind of more precise model that distinguishes the two, then you would be right to chastise me.
Perhaps more importantly, I just don’t know where someone with this view would give ground. Even if you think any given factor of two isn’t a big deal, ten factors of two is what gets you from 99.9% to 50%. So you can’t just go around ignoring a couple of them every few years!
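To make that arithmetic concrete, here is a minimal sketch (illustrative only; the `update_odds` helper is my own, not from this exchange) of Bayesian updating in odds form: a 99.9% credence is roughly 10 bits of odds, one 2:1 likelihood ratio barely dents it, and ten of them bring it down to roughly 50%.

```python
import math

def update_odds(prior_prob, likelihood_ratio, n_updates):
    """Posterior probability after n_updates Bayesian updates, each applying
    the given likelihood ratio *against* the hypothesis."""
    odds = prior_prob / (1 - prior_prob)      # probability -> odds
    odds /= likelihood_ratio ** n_updates     # each factor of two halves the odds
    return odds / (1 + odds)                  # odds -> probability

print(math.log2(0.999 / 0.001))        # ~9.96: a 99.9% credence is ~10 bits of odds
print(update_odds(0.999, 2, 1))        # ~0.998: one factor of two barely moves it
print(update_odds(0.999, 2, 10))       # ~0.494: ten factors of two land near 50%
```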
And rhetorically, I’m not complaining about people ultimately thinking fast takeoff is more plausible. I’m complaining about them not expressing the view in such a way that we can learn about it from what appears to me to be multiple bits of evidence, and not acknowledging that evidence. This isn’t the only evidence we’ve gotten; I’m generally happy to acknowledge many bits’ worth of movement in my views towards other people’s.