Executable philosophy as a failed totalizing meta-worldview

Link post

(this is an expanded, edited version of an x.com post)

It is easy to interpret Eliezer Yudkowsky’s main goal as creating a friendly AGI. Clearly, he has failed at this goal and has little hope of achieving it. That is not a particularly interesting analysis, however. A priori, creating a machine that makes things okay forever is not a plausible objective, so failure to achieve it is not very informative.

So I’ll focus on a different but related project of his: executable philosophy. Quoting Arbital:

Two motivations of “executable philosophy” are as follows:

  1. We need a philosophical analysis to be “effective” in Turing’s sense: that is, the terms of the analysis must be useful in writing programs. We need ideas that we can compile and run; they must be “executable” like code is executable.

  2. We need to produce adequate answers on a time scale of years or decades, not centuries. In the entrepreneurial sense of “good execution”, we need a methodology we can execute on in a reasonable timeframe.

There is such a thing as common sense rationality, which says the world is round, you shouldn’t play the lottery, etc. Formal notions like Bayesianism, VNM utility theory, and Solomonoff induction formalize something strongly related to this common sense rationality. Yudkowsky believes further study in this tradition can supersede ordinary academic philosophy, which he believes to be conceptually weak and motivated to continue ongoing disputes for more publications.
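These formal notions can be made concrete. As a minimal illustrative sketch (the numbers and function names below are my own, not from the original sources), Bayes’ rule formalizes updating a belief on evidence, and expected-value reasoning over lotteries captures the common-sense judgment that you shouldn’t play the lottery:

```python
# Illustrative sketch of "common sense rationality" formalized.
# All numbers are made up for the example.

def bayes_update(prior, p_e_given_h, p_e_given_not_h):
    """Posterior P(H|E) from prior P(H) and the two likelihoods."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

def expected_value(lottery):
    """Expected payoff of a lottery given (probability, payoff) pairs."""
    return sum(p * x for p, x in lottery)

# A $2 ticket with a 1-in-10-million chance at a $1M jackpot:
ticket = [(1e-7, 1_000_000 - 2), (1 - 1e-7, -2)]
assert expected_value(ticket) < 0  # negative expectation: don't play

# Strong evidence moves a weak (1%) prior to roughly 15%:
posterior = bayes_update(prior=0.01, p_e_given_h=0.9, p_e_given_not_h=0.05)
```

VNM utility theory generalizes the lottery calculation from money to arbitrary outcomes, which is what makes the open problems below (e.g. parsing a physical situation as a set of VNM lotteries) nontrivial.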

In the Sequences, Yudkowsky presents these formal ideas as the basis for a totalizing meta-worldview, of epistemic and instrumental rationality, and uses the meta-worldview to argue for his object-level worldview (which includes many-worlds, AGI foom, importance of AI alignment, etc.). While one can get totalizing (meta-)worldviews from elsewhere (such as interdisciplinary academic studies), Yudkowsky’s (meta-)worldview is relatively easy to pick up for analytically strong people (who tend towards STEM), and is effective (“correct” and “winning”) relative to its simplicity.

Yudkowsky’s source material and his own writing do not form a closed meta-worldview, however. There remain open questions about how to formalize and solve real problems. Many of the more technical sort are described in MIRI’s technical agent foundations agenda. These include questions about how to parse a physically realistic problem as a set of VNM lotteries (“decision theory”), how to use something like Bayesianism to handle uncertainty about mathematics (“logical uncertainty”), how to formalize realistic human values (“value loading”), and so on.

Whether or not the closure of this meta-worldview would lead to the creation of friendly AGI, it would certainly have practical value. It would allow real-world decisions to be made by first formalizing them within a computational framework (related to Yudkowsky’s notion of “executable philosophy”), whether or not the computation itself is tractable (a tractable version being friendly AGI).

The practical strategy of MIRI as a technical research institute is to go meta on these open problems by recruiting analytically strong STEM people (especially mathematicians and computer scientists) to work on them, as part of the agent foundations agenda. I was one of these people. While we made some progress on these problems (such as with the Logical Induction paper), we didn’t come close to completing the meta-worldview, let alone building friendly AGI.

With the Agent Foundations team at MIRI eliminated, MIRI’s agent foundations agenda is now unambiguously a failed project. I had called MIRI’s technical research likely to fail around 2017, when internal secrecy increased; at this point, the failure is not a matter of uncertainty to anyone informed of the basic institutional facts. Some others, such as Wei Dai and Michael Vassar, had even earlier called it infeasible to complete the philosophy with a small technical team.

What can be learned from this failure? One possible lesson is that totalizing (meta-)worldviews fail in general. This is basically David Chapman’s position: he promotes “reasonableness” and “meta-rationality”, and he doesn’t consider meta-rationality to be formalizable as rationality is. Rather, meta-rationality operates “around” formal systems and aids in creating and modifying these systems:

[Meta-rationality practitioners] produce these insights by investigating the relationship between a system of technical rationality and its context. The context includes a specific situation in which rationality is applied, the purposes for which it is used, the social dynamics of its use, and other rational systems that might also be brought to bear. This work operates not within a system of technical rationality, but around, above, and then on the system.

The failure of this particular attempt to construct a totalizing (meta-)worldview is Bayesian evidence in favor of Chapmanian postrationalism, but postrationalism isn’t the only alternative. Perhaps it is feasible to construct a totalizing (meta-)worldview, and it failed in this case for particular reasons. Someone familiar with the history of the rationality scene can point to plausible causal factors (such as non-technical social problems) in this failure. Two possible alternatives are:

  1. that the initial MIRI (meta-)worldview was mostly correct, but that MIRI’s practical strategy of recruiting analytically strong STEM people to complete it failed;

  2. or that it wasn’t mostly correct, so a different starting philosophy is needed.

Mostly, I don’t see people acting as if the first branch is the relevant one. Of the relevant organizations, Orthogonal, an agent foundations research org, is the one acting most as if it believes this. And my own continued commentary on philosophy relevant to MIRI technical topics shows some interest in this branch, although my work tends to point towards a wider scope of philosophy rather than towards meta-worldview closure.

What about a different starting philosophy? I see people saying that the Sequences were great and someone else should do something like them. Currently, I don’t see opportunity in this. Yudkowsky wrote the Sequences at a time when many of the basic ideas, such as Bayesianism and VNM utility, were in the water supply in sufficiently elite STEM circles, and had credibility (for example, they were discussed in Artificial Intelligence: A Modern Approach). There don’t currently seem to be enough credible abstractions floating around in STEM to form a totalizing (meta-)worldview out of.

This is partially due to social factors including a decline in belief in neoliberalism, meritocracy, and much of science. Fewer people than before think the thing to be doing is apolitical elite STEM-like thinking. Postmodernism, a general critique of meta-narratives, has reached more of elite STEM, and the remainder are more focused on countering postmodernism than they were before. And the AI risk movement has moved much of its focus from technical research to politics, and much of its technical focus from agent foundations to empirical deep learning research.

We are now in a post-paradigmatic stage, which may move to a pre-paradigmatic (and then paradigmatic) stage as different abstract ideas become credible. Perhaps, for example, some credible agency abstractions will come from people playing around with and trying to understand deep learning systems; these could shore up “reasonable” and “meta-rational” gaps in the application of rationality, and/or be used to construct new rationality theory. Or perhaps something will come of people reading old philosophers like Kant (with Karl Popper as a historical precedent). But immediately forming and explicating a new paradigm seems premature.

And so, I accept that the current state of practical rationality involves what Chapman calls “reasonableness” and “meta-rationality”, though I take this to be a commentary on the current state of rationality frameworks and discourse rather than a universal. I believe more widespread interdisciplinary study is reasonable for the intellectually ambitious in this context.