My AI Model Delta Compared To Christiano

Preamble: Delta vs Crux

This section is redundant if you already read My AI Model Delta Compared To Yudkowsky.

I don’t natively think in terms of cruxes. But there’s a similar concept which is more natural for me, which I’ll call a delta.

Imagine that you and I each model the world (or some part of it) as implementing some program. Very oversimplified example: if I learn that e.g. it’s cloudy today, that means the “weather” variable in my program at a particular time[1] takes on the value “cloudy”. Now, suppose your program and my program are exactly the same, except that somewhere in there I think a certain parameter has value 5 and you think it has value 0.3. Even though our programs differ in only that one little spot, we might still expect very different values of lots of variables during execution—in other words, we might have very different beliefs about lots of stuff in the world.

If your model and my model differ in that way, and we’re trying to discuss our different beliefs, then the obviously-useful thing to do is to figure out exactly where that one-parameter difference is.

That’s a delta: one or a few relatively “small”/local differences in belief, which, when propagated through our models, account for most of the differences in our beliefs.

For those familiar with Pearl-style causal models: think of a delta as one or a few do() operations which suffice to make my model basically match somebody else’s model, or vice versa.
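
To make that picture concrete, here’s a minimal toy sketch in code; the variable names, formulas, and numbers are all invented purely for illustration, not anything Paul or I have actually written down. Two copies of the same program differ in exactly one parameter, and consequently disagree about most of their downstream “beliefs”:

```python
# Toy illustration of a "delta": two copies of the same program which differ
# in exactly one parameter, yet disagree about many downstream variables.
# All names, formulas, and numbers here are invented purely for illustration.

def world_model(verification_difficulty: float) -> dict:
    """Propagate a single parameter through a chain of derived 'beliefs'."""
    # How viable delegation looks, given how hard it is to check others' work.
    delegation_viability = 1.0 / (1.0 + verification_difficulty)
    # How much low-hanging fruit (unexploited inefficiency) the world contains.
    low_hanging_fruit = verification_difficulty / (1.0 + verification_difficulty)
    # A crude proxy for how discontinuous AI takeoff looks on this model.
    takeoff_discontinuity = low_hanging_fruit
    return {
        "delegation_viability": round(delegation_viability, 2),
        "low_hanging_fruit": round(low_hanging_fruit, 2),
        "takeoff_discontinuity": round(takeoff_discontinuity, 2),
    }

# Identical program, one parameter changed -- loosely analogous to a single
# do() intervention on that parameter.
print(world_model(verification_difficulty=5.0))
print(world_model(verification_difficulty=0.3))
```

The point is purely structural: the two runs share every line of code, yet nearly every downstream value differs, which is why locating the one differing parameter is the productive move when two people’s beliefs diverge all over the place.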

This post is about my current best guesses at the delta between my AI models and Paul Christiano’s AI models. When I apply the delta outlined here to my models, and propagate the implications, my models mostly look like Paul’s as far as I can tell. That said, note that this is not an attempt to pass Paul’s Intellectual Turing Test; I’ll still be using my own usual frames.

My AI Model Delta Compared To Christiano

Best guess: Paul thinks that verifying solutions to problems is generally “easy” in some sense. He’s sometimes summarized this as “verification is easier than generation”, but I think his underlying intuition is somewhat stronger than that.

What do my models look like if I propagate that delta? Well, it implies that delegation is fundamentally viable in some deep, general sense.

That propagates into a huge difference in worldviews. Like, I walk around my house and look at all the random goods I’ve paid for—the keyboard and monitor I’m using right now, a stack of books, a tupperware, a water bottle, flip-flops, carpet, desk and chair, refrigerator, sink, etc. Under my models, if I pick one of these objects at random and do a deep dive researching that object, it will usually turn out to be bad in ways which are either nonobvious or nonsalient to me, but which unambiguously make my life worse and would unambiguously have been worth-to-me the cost to make better. But because the badness is nonobvious/nonsalient, it doesn’t influence my decision-to-buy, and therefore companies producing the good are incentivized not to spend the effort to make it better. It’s a failure of ease of verification: because I don’t know what to pay attention to, I can’t easily notice the ways in which the product is bad. (For a more game-theoretic angle, see When Hindsight Isn’t 2020.)

On (my model of) Paul’s worldview, that sort of thing is rare; at most it’s the exception to the rule. On my worldview, it’s the norm for most goods most of the time. See e.g. the whole air conditioner episode, in which we debated the badness of single-hose portable air conditioners specifically, along with a large sidebar on the badness of portable air conditioner energy ratings.

How does the ease-of-verification delta propagate to AI?

Well, most obviously, Paul expects AI to go well mostly via humanity delegating alignment work to AI. On my models, the delegator’s incompetence is a major bottleneck to delegation going well in practice, and that will extend to delegation of alignment to AI: we humans won’t get what we want by delegating, because we don’t even understand what we want or know what to pay attention to. The outsourced alignment work ends up bad in nonobvious/nonsalient (but ultimately important) ways for the same reasons as most goods in my house. But if I apply the “verification is generally easy” delta to my models, then delegating alignment work to AI makes total sense.

Then we can go even more extreme: HCH, aka “the infinite bureaucracy”, a model Paul developed a few years ago. In HCH, the human user does a little work, then delegates subquestions/subproblems to a few AIs, which in turn do a little work, then delegate their subquestions/subproblems to a few more AIs, and so on until the leaf nodes of the tree receive tiny subquestions/subproblems which they can immediately solve. On my models, HCH adds recursion to the universal, pernicious difficulties of delegation, and my main response is to run away screaming. But on Paul’s models, delegation is fundamentally viable, so why not delegate recursively?
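
To show just the shape of that recursion, here’s a minimal toy sketch; the Problem class, the splitting rule, and the combine step are all invented for illustration, and this depicts the structure of the delegation tree rather than Paul’s actual construction:

```python
# Toy sketch of HCH-style recursive delegation: each node does a little work,
# splits what remains into subproblems, and delegates those to child nodes,
# until the leaves receive subproblems small enough to answer directly.
# The Problem class, splitting rule, and combine step are invented for
# illustration; this shows the shape of the recursion, not the real proposal.

from dataclasses import dataclass


@dataclass
class Problem:
    description: str
    size: int  # stand-in for "how much work is left"


def solve_directly(problem: Problem) -> str:
    # Leaf-level work: the subproblem is small enough to just answer.
    return f"answer({problem.description})"


def hch(problem: Problem, branching: int = 2) -> str:
    if problem.size <= 1:
        return solve_directly(problem)
    # Do a little work: split the problem, then delegate the pieces.
    subproblems = [
        Problem(f"{problem.description}.{i}", problem.size // branching)
        for i in range(branching)
    ]
    sub_answers = [hch(sub, branching) for sub in subproblems]
    # Combine the children's answers. On my models, this is where the
    # delegator's inability to verify the sub-answers bites hardest.
    return f"combine({', '.join(sub_answers)})"


print(hch(Problem("align the AI", size=8)))
```

The recursion itself is trivial; the disagreement is about whether each combine step can be trusted when the parent node can’t easily verify its children’s answers.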

(Also note that HCH is a simplified model of a large bureaucracy, and I expect my views and Paul’s differ in much the same way when thinking about large organizations in general. I mostly agree with Zvi’s models of large organizations, which can be lossily-but-accurately summarized as “don’t”. Paul, I would guess, expects that large organizations are mostly reasonably efficient and reasonably aligned with their stakeholders/customers, as opposed to universally deeply dysfunctional.)

Propagating further out: under my models, the difficulty of verification accounts for most of the generalized market inefficiency in our world. (I see this as one way of framing Inadequate Equilibria.) So if I apply a “verification is generally easy” delta, then I expect the world to generally contain far less low-hanging fruit. That, in turn, has a huge effect on timelines. Under my current models, I expect that, shortly after AIs are able to autonomously develop, analyze and code numerical algorithms better than humans, there’s going to be some pretty big (like, multiple OOMs) progress in AI algorithmic efficiency (even ignoring a likely shift in ML/AI paradigm once AIs start doing the AI research). That’s the sort of thing which leads to a relatively discontinuous takeoff. Paul, on the other hand, expects a relatively smooth takeoff—which makes sense, in a world where there’s not a lot of low-hanging fruit in the software/algorithms because it’s easy for users to notice when the libraries they’re using are trash.

That accounts for most of the known-to-me places where my models differ from Paul’s. I put approximately-zero probability on the possibility that Paul is basically right on this delta; I think he’s completely out to lunch. (I do still put significantly-nonzero probability on successful outsourcing of most alignment work to AI, but it’s not the sort of thing I expect to usually work.)