Here’s my attempt at formalizing the tension between gradualism and catastrophism.
As a background assumption about world-modeling, let’s suppose that each person’s fine-grained model of how the state of the world moves around can be faithfully represented as a stochastic differential equation driven by a Lévy process. This is a common generalization of Brownian-motion-driven processes, deterministic exponential growth, and Poisson processes (the last of which only change discontinuously); it basically covers any stochastic model that unfolds over continuous time in a way that’s Markovian (i.e. the past can affect the future only via the present) and not sensitive to where the t=0 point is.
The Lévy-Khintchine formula guarantees that any Lévy process decomposes into a deterministic drift, a diffusion process (like a Brownian motion with some arbitrary covariance), and a jump process (like a Poisson process with some arbitrary intensity measure).
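To make the decomposition concrete, here’s a minimal simulation sketch in Python; every parameter value is made up for illustration, and the path is just the sum of the three Lévy–Khintchine components.

```python
# Toy sketch of the Lévy–Khintchine decomposition; all parameters are
# illustrative, not calibrated to anything.
import numpy as np

rng = np.random.default_rng(0)

def levy_path(T=10.0, n=1000, drift=0.1, sigma=0.2, jump_rate=0.3, jump_scale=1.0):
    """Euler simulation of X_t = drift*t + sigma*B_t + compound-Poisson jumps."""
    dt = T / n
    t = np.linspace(0.0, T, n + 1)
    dB = rng.normal(0.0, np.sqrt(dt), n)          # diffusion increments
    dN = rng.poisson(jump_rate * dt, n)           # jumps per step (0 or 1 for small dt)
    jumps = dN * rng.normal(0.0, jump_scale, n)   # jump sizes
    dX = drift * dt + sigma * dB + jumps
    return t, np.concatenate([[0.0], np.cumsum(dX)])

t, x = levy_path()
```

Zeroing out `jump_rate` recovers a pure drift-plus-diffusion; zeroing out `sigma` recovers a pure jump process, which is the knob the rest of the argument turns on.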
Now, we’re going to consider some observable predicate, a function from the state space X to Bool (like “is there TAI yet?”), and push forward the stochastic model to get a distribution over hitting times. It’s worth pointing out that in some sense the entire point of exercises like defining TAI is to introduce a “discontinuity” into everyone’s model. In the conversation above, everyone seems to agree as background knowledge that there is a certain point at which something “can FOOM”, and that this is a discontinuous event (although the consequent takeoff may be very slow, or not, etc.). What’s disputed here is how one might model the process that leads up to this event.
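As a sketch of that pushforward: simulate the state process, record the first time the predicate flips to True, and repeat; the empirical distribution of those times is the hitting-time distribution. The dynamics and threshold below are illustrative stand-ins, not a model of TAI.

```python
# Toy pushforward from a state model to hitting times of a Bool predicate.
# All numbers are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(1)

def hitting_time(threshold=2.0, T=50.0, n=2000, drift=0.05, sigma=0.3,
                 jump_rate=0.1, jump_scale=1.0):
    """First t with predicate(X_t) := (X_t >= threshold) true, else inf."""
    dt = T / n
    x = 0.0
    for i in range(1, n + 1):
        x += drift * dt + sigma * rng.normal(0.0, np.sqrt(dt))
        x += rng.poisson(jump_rate * dt) * rng.normal(0.0, jump_scale)
        if x >= threshold:  # the observable predicate fires
            return i * dt
    return float("inf")

# Empirical pushforward: a distribution over hitting times.
times = np.array([hitting_time() for _ in range(200)])
```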
The gradualist position is that the jump-process terms are negligible (it’s basically a diffusion), and the catastrophist position is that the diffusion-process terms are negligible (it’s basically a point process). By “negligible”, I mean that, if we have the right model, we should be able to zero out those terms from our underlying Lévy process, and not see much difference in the hitting times.
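This negligibility claim can be phrased as a concrete (if toy) experiment: simulate hitting times under the full model, under the model with the jump terms zeroed out, and under the model with the diffusion terms zeroed out, and ask which ablation actually moves the distribution. Parameters here are arbitrary.

```python
# Toy "negligibility" test: median hitting time under the full model vs.
# the two ablations. All parameter values are arbitrary illustrations.
import numpy as np

rng = np.random.default_rng(2)

def median_hitting_time(sigma, jump_rate, threshold=2.0, T=200.0, n=2000,
                        drift=0.02, jump_scale=1.0, runs=100):
    dt = T / n
    times = []
    for _ in range(runs):
        x, hit = 0.0, T  # censor "never hit" at T
        for i in range(1, n + 1):
            x += drift * dt + sigma * rng.normal(0.0, np.sqrt(dt))
            x += rng.poisson(jump_rate * dt) * rng.normal(0.0, jump_scale)
            if x >= threshold:
                hit = i * dt
                break
        times.append(hit)
    return float(np.median(times))

full = median_hitting_time(sigma=0.3, jump_rate=0.05)
no_jumps = median_hitting_time(sigma=0.3, jump_rate=0.0)       # gradualist ablation
no_diffusion = median_hitting_time(sigma=0.0, jump_rate=0.05)  # catastrophist ablation
```

The gradualist predicts `full` and `no_jumps` come out close; the catastrophist predicts `full` and `no_diffusion` do.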
Diffusion processes are all kind of alike, and you can make good bets about them based on historical data. Like Gaussians, they are characterized by covariances; finance people love them. This is why someone inclined to take the gradualist position about progress sees the histories of completely unrelated progress as providing meaningful evidence about AGI.
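One way to see why diffusions are forecastable: their covariance (here, a single volatility) is recoverable from any stretch of history via realized quadratic variation. A toy sketch, with an assumed true volatility:

```python
# Toy sketch: a diffusion's volatility is identifiable from realized
# quadratic variation of any observed history. true_sigma is assumed.
import numpy as np

rng = np.random.default_rng(3)

true_sigma = 0.4
dt = 0.01
# Increments of sigma * B_t observed over a long history.
increments = true_sigma * rng.normal(0.0, np.sqrt(dt), 100_000)
# Realized volatility: sqrt of summed squared increments per unit time.
sigma_hat = float(np.sqrt(np.sum(increments**2) / (len(increments) * dt)))
```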
Jump processes are generally really different from each other, and are hard to forecast from historical data (especially when the intensity is low but the jump distance is high). This is why someone inclined to reject the gradualist position sees little relevance in histories of unrelated progress.
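A toy illustration of why the low-intensity, high-impact regime is so hard: the natural estimate of a Poisson intensity is jumps-observed divided by window length, and when the true rate is low, a large fraction of histories contain zero jumps, so the fitted model forecasts no jumps at all. The rate and window here are made up.

```python
# Toy illustration: with a low jump intensity, many observed histories
# contain zero jumps, so the fitted intensity is exactly 0.
import numpy as np

rng = np.random.default_rng(4)

true_rate = 0.02  # about one jump per 50 time units (assumed)
window = 30.0     # length of each observed history (assumed)

counts = rng.poisson(true_rate * window, size=10_000)
mle = counts / window                    # per-history intensity estimates
frac_zero = float(np.mean(counts == 0))  # histories with no jumps observed
```

Roughly exp(−0.6) ≈ 55% of these histories see no jumps at all, even though the average estimate is unbiased.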
The gradualist seems likely to have a more diffusion-oriented modeling toolbox in general; they’re more likely to reach for kernel density estimation than point-process regression. This is why the non-gradualist expects the gradualist to have overestimated the probability that an adequate replacement for Steve Jobs could be found.
From the gradualist point of view, the non-gradualist lacks intuition about potential-energy landscapes, and so seems to find it plausible that large energy barriers are more likely to be tunneled through in a single jump than crossed gradually by diffusion and ratcheting. The gradualist says this leads the non-gradualist to systematically overestimate the probabilities of such breakthroughs, which perhaps has manifested as a string of unprofitable investments in physical-tech startups.
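The potential-energy picture can be sketched with a double-well potential V(x) = (x² − 1)², with wells at x = ±1 and a barrier at x = 0: a particle starting in the left well can cross the barrier either by accumulated diffusion (Kramers-style) or by a single large jump. Everything below is illustrative, with parameters chosen just to make the two regimes visible.

```python
# Toy double-well escape: V(x) = (x^2 - 1)^2 has wells at x = +/-1 and a
# barrier at x = 0. Diffusion must ratchet over the barrier; a jump process
# can cross it in one step. All parameters are illustrative.
import numpy as np

rng = np.random.default_rng(5)

def crosses_barrier(sigma, jump_rate, jump_scale=2.0, T=100.0, n=10_000):
    """Start at x = -1; return True if x ever reaches the barrier top at 0."""
    dt = T / n
    x = -1.0
    for _ in range(n):
        grad = 4.0 * x * (x * x - 1.0)  # V'(x)
        x += -grad * dt + sigma * rng.normal(0.0, np.sqrt(dt))
        x += rng.poisson(jump_rate * dt) * rng.normal(0.0, jump_scale)
        if x >= 0.0:
            return True
    return False

diffusion_crossings = sum(crosses_barrier(sigma=0.3, jump_rate=0.0) for _ in range(20))
jump_crossings = sum(crosses_barrier(sigma=0.0, jump_rate=0.1) for _ in range(20))
```

With these particular numbers the diffusion essentially never makes it over (the Kramers rate scales like exp(−2ΔV/σ²)), while the jump process usually does; the gradualist’s point is that for realistic barriers the asymmetry runs the other way.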
It gets confusing at this point because both sides seem to be accusing the other side of overestimating the probabilities of rare events. “Wait, who actually has shorter timelines here, what’s going on?” But this debate isn’t about the layer where we summarize reality into the one rare event of AGI being deployed; it’s about how to think about the underlying state-space model. Each side thinks the other is focusing on a term of the differential equation that is actually negligible as a driver of phase transitions, so each side’s paradigmatic examples are cases where the other overestimated the probability of a phase transition by overweighting that term.
Finally, I want to point out the obvious third option: neither diffusion terms nor jump terms are negligible. I suspect this is Eliezer’s true position, and the one that enables him to “make the obvious boring prediction” (as Paul says) in areas where diffusion is very relevant (by fitting it to historical data), while also saying “sometimes trends break upward” and also saying “the obvious boring predictions about AGI are negligible”, all driven from a coherent overall world-model.