(this is an expanded, edited version of an x.com post)
It is easy to interpret Eliezer Yudkowsky’s main goal as creating a friendly AGI. Clearly, he has failed at this goal and has little hope of achieving it. That’s not a particularly interesting analysis, however. A priori, creating a machine that makes things ok forever is not a particularly plausible objective. Failure to do so is not particularly informative.
So I’ll focus on a different but related project of his: executable philosophy. Quoting Arbital:
Two motivations of “executable philosophy” are as follows:
We need a philosophical analysis to be “effective” in Turing’s sense: that is, the terms of the analysis must be useful in writing programs. We need ideas that we can compile and run; they must be “executable” like code is executable.
We need to produce adequate answers on a time scale of years or decades, not centuries. In the entrepreneurial sense of “good execution”, we need a methodology we can execute on in a reasonable timeframe.
There is such a thing as common sense rationality, which says the world is round, you shouldn’t play the lottery, etc. Formal notions like Bayesianism, VNM utility theory, and Solomonoff induction formalize something strongly related to this common sense rationality. Yudkowsky believes further study in this tradition can supersede ordinary academic philosophy, which he regards as conceptually weak and motivated to prolong its disputes for the sake of more publications.
In the Sequences, Yudkowsky presents these formal ideas as the basis for a totalizing meta-worldview, of epistemic and instrumental rationality, and uses the meta-worldview to argue for his object-level worldview (which includes many-worlds, AGI foom, importance of AI alignment, etc.). While one can get totalizing (meta-)worldviews from elsewhere (such as interdisciplinary academic studies), Yudkowsky’s (meta-)worldview is relatively easy to pick up for analytically strong people (who tend towards STEM), and is effective (“correct” and “winning”) relative to its simplicity.
Yudkowsky’s source material and his own writing do not form a closed meta-worldview, however. There are open problems as to how to formalize and solve real problems. Many of the more technical sort are described in MIRI’s technical agent foundations agenda. These include questions about how to parse a physically realistic problem as a set of VNM lotteries (“decision theory”), how to use something like Bayesianism to handle uncertainty about mathematics (“logical uncertainty”), how to formalize realistic human values (“value loading”), and so on.
Whether or not the closure of this meta-worldview would lead to the creation of friendly AGI, it would certainly have practical value. It would allow real-world decisions to be made by first formalizing them within a computational framework (related to Yudkowsky’s notion of “executable philosophy”), whether or not the computation itself is tractable (the tractable version being friendly AGI).
The practical strategy of MIRI as a technical research institute is to go meta on these open problems by recruiting analytically strong STEM people (especially mathematicians and computer scientists) to work on them, as part of the agent foundations agenda. I was one of these people. While we made some progress on these problems (such as with the Logical Induction paper), we didn’t come close to completing the meta-worldview, let alone building friendly AGI.
With the Agent Foundations team at MIRI eliminated, MIRI’s agent foundations agenda is now unambiguously a failed project. I had called MIRI’s technical research likely to fail around 2017, with the increase in internal secrecy, but at this point it is not a matter of uncertainty to anyone informed of the basic institutional facts. Others, such as Wei Dai and Michael Vassar, had called the infeasibility of completing the philosophy with a small technical team even earlier.
What can be learned from this failure? One possible lesson is that totalizing (meta-)worldviews fail in general. This is basically David Chapman’s position: he promotes “reasonableness” and “meta-rationality”, and he doesn’t consider meta-rationality to be formalizable as rationality is. Rather, meta-rationality operates “around” formal systems and aids in creating and modifying these systems:
[Meta-rationality practitioners] produce these insights by investigating the relationship between a system of technical rationality and its context. The context includes a specific situation in which rationality is applied, the purposes for which it is used, the social dynamics of its use, and other rational systems that might also be brought to bear. This work operates not within a system of technical rationality, but around, above, and then on the system.
One particular failure to construct a totalizing (meta-)worldview is, admittedly, Bayesian evidence in favor of Chapmanian postrationalism, but postrationalism isn’t the only alternative. Perhaps it is feasible to construct a totalizing (meta-)worldview, but it failed in this case for particular reasons. Someone familiar with the history of the rationality scene can point to plausible causal factors in this failure (such as non-technical social problems). Two possible alternatives are:
that the initial MIRI (meta-)worldview was mostly correct, but that MIRI’s practical strategy of recruiting analytically strong STEM people to complete it failed;
or that it wasn’t mostly correct, so a different starting philosophy is needed.
Mostly, I don’t see people acting as if the first branch is the relevant one. Of the relevant organizations, Orthogonal, an agent foundations research org, is the one acting most as if it believes this. And my own continued commentary on philosophy relevant to MIRI technical topics shows some interest in this branch, although my work tends to point towards a wider scope for philosophy rather than towards closing the meta-worldview.
What about a different starting philosophy? I see people saying that the Sequences were great and someone else should do something like them. Currently, I don’t see opportunity in this. Yudkowsky wrote the Sequences at a time when many of the basic ideas, such as Bayesianism and VNM utility, were in the water supply in sufficiently elite STEM circles, and had credibility (for example, they were discussed in Artificial Intelligence: A Modern Approach). There don’t currently seem to be enough credible abstractions floating around in STEM to form a totalizing (meta-)worldview out of.
This is partially due to social factors including a decline in belief in neoliberalism, meritocracy, and much of science. Fewer people than before think the thing to be doing is apolitical elite STEM-like thinking. Postmodernism, a general critique of meta-narratives, has reached more of elite STEM, and the remainder are more focused on countering postmodernism than they were before. And the AI risk movement has moved much of its focus from technical research to politics, and much of its technical focus from agent foundations to empirical deep learning research.
We are now in a post-paradigmatic stage, which may move to a pre-paradigmatic (and then paradigmatic) one as different abstract ideas become credible. Perhaps, for example, some credible agency abstractions will come from people playing around with and trying to understand deep learning systems, and these can shore up “reasonable” and “meta-rational” gaps in the application of rationality, and/or construct new rationality theory. Or perhaps something will come of people reading old philosophers like Kant (with Karl Popper as a historical precedent). But immediately forming and explicating a new paradigm seems premature.
And so, I accept that the current state of practical rationality involves what Chapman calls “reasonableness” and “meta-rationality”, though I take this to be a commentary on the current state of rationality frameworks and discourse rather than a universal. I believe more widespread interdisciplinary study is reasonable for the intellectually ambitious in this context.
I liked reading this philosophical and sociological history!
Indeed, it’s hard to reliably do groundbreaking scientific and philosophical research with a team of ~10 people over the course of ~10 years. I think it was well worth the effort and did far better than one would naively expect of such a team — especially given that the natural project-lead was plagued by so many chronic health issues as to be ~entirely unable to do any management. I will continue to support further efforts of this sort e.g. Orthogonal.
It is not clear to me what makes a worldview totalizing. Would Newtonian mechanics be a totalizing worldview? If not, is it a worldview? Is any worldview in physics after Newton non-totalizing? (My guess is no.) Is Greek geometry a la Euclid?
No, because it’s a physics theory. It is a descriptive theory of physical laws applying to matter and so on. It is not even a theory of how to do science. It is limited to one domain, and not expandable to other domains.
OK.
Decision theorists hold that for every sequence of observations and every utility function (set of goals), there is exactly one best move or optimal action (namely, the one that maximizes expected utility conditional on the observations). Does trying to use decision theory as much as possible in one’s life tend to push one into having a totalizing worldview?
Decision theory itself is relatively narrowly scoped, but application of decision theory is broadly scoped, as it could be applied to practically any decision. Executable philosophy and the Sequences include further aspects beyond decision theory.
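For concreteness, here is a minimal sketch of the expected-utility rule being discussed; the actions, outcome probabilities, and utilities are invented for illustration:

```python
# Minimal sketch of expected-utility maximization. All actions, outcomes,
# probabilities, and utilities below are invented for illustration.

def expected_utility(action, outcome_probs, utility):
    """E[U | action]: utility of each outcome weighted by P(outcome | action, observations)."""
    return sum(p * utility(outcome) for outcome, p in outcome_probs[action].items())

def best_action(actions, outcome_probs, utility):
    """The 'one best move' (ignoring ties): the action maximizing expected utility."""
    return max(actions, key=lambda a: expected_utility(a, outcome_probs, utility))

# Toy decision: carry an umbrella or not, with beliefs already conditioned on observations.
outcome_probs = {
    "umbrella":    {"dry_but_encumbered": 1.0},   # always dry, at the cost of some hassle
    "no_umbrella": {"dry": 0.7, "soaked": 0.3},   # P(rain | observations) = 0.3
}
utility = {"dry_but_encumbered": 0.8, "dry": 1.0, "soaked": -5.0}.get

print(best_action(["umbrella", "no_umbrella"], outcome_probs, utility))  # -> "umbrella"
```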
(Reposting my comment from twitter.)
Rationalist-empiricists wanted to derive everything from coherence + contact with the world. The assumption was that the world was mostly homogeneous, and so by becoming super powerful at learning one part of the world, it would open up easy entry into the rest.
It failed because the world is mostly inhomogeneous, and the appearance of homogeneity is simply because it is easier to command assent for the things that everyone can see.
What do you mean by “homogeneous” here?
The negation of the backbone conjecture. That may sound obviously wrong, since rationalists obviously expect the laws of physics to function as a backbone for the universe, but the laws are merely a backbone in the sense of information and not of magnitude.
I must say, you did a very poor job at answering my question.
If I were going to steelman Mr Tailcalled, I’d imagine that he was trying to “point at the reason” that transfer learning is far and away the exception.
Mostly learning (whether in humans, beasts, or software) happens relative to a highly specific domain of focus, and getting 99.8% accuracy in the domain, and making a profit therein… doesn’t really generalize. I can’t run a hedge fund after mastering the hula hoop, and I can’t win a boxing match from learning to recognize real and forged paintings. NONE of these skills would be much help in climbing a 200 foot tall redwood tree with my bare hands and bare feet… and mastering the Navajo language is yet again “mostly unrelated” to any of them. The challenges we agents seem to face in the world are “one damn thing after another”.
(Arguing against this steelman, the exception here might be “next token prediction”. Mastering next token prediction seems to grant the power to play Minecraft through APIs, win art contests, prove math theorems, and drive theologically confused people into psychosis. However, consistent with the steelman, next token prediction hasn’t seemed to offer any help at fabbing smaller and faster and more efficient computer chips. If next token prediction somehow starts to make chip fabbing go much faster, then hold onto your butts.)
This is exactly the reason why I asked the initial question. There is a reading of tailcalled’s statement which makes it correct and a reading which makes it wrong. I was curious which meaning was implied, and whether the difference between the two was even understood.
When talking about top performance in highly specific domains, one should indeed use lots of domain-specific tricks. But in the grand scheme of things, the rule of “coherence + contact with the world” is extremely helpful; among other things, it allows one to derive all the specific tricks for all the different domains.
Likewise, there is a sense in which the rationalist-empiricist project didn’t deliver to the fullest of our expectations when solving multiple specific technical problems. On the other hand, it has definitely succeeded in the sense that philosophies based on this approach were so triumphant and delivered so many fruits that we put them in a league of their own called “science”.
This assumes you have contact with all the different domains, which you don’t, rather than just some of them.
Can you give an example of a domain which I have no contact with, so the coherence + contact with the world methodology won’t help me to figure out the corresponding domain specific tricks for succeeding in it, yet such tricks exist in principle?
Farming, law enforcement, war, legislation, chip fabbing, space colonization, cargo trucking, …
Space colonization obviously includes cargo trucking, farming, legislation, chip fabbing, law enforcement, and, for appreciators, war.
But I don’t think you are doing space colonization. I’d guess you are doing reading/writing on social media, programming, grocery shopping, cooking, … . And I think recursive self-improvement is supposed to work with no experience in space colonization.
The meaning of my comment was “your examples are very weak at proving the absence of cross-domain generalization”.
And if we are talking about me, right now I’m doing statistics, physics, and signal processing, which seem awfully generalizable.
I can buy that there’s a sort of “trajectory of history” that makes use of all domains at once, I just think this is the opposite of what rationalist-empiricists are likely to focus on.
This is precisely the position that I am referring to when I say “the assumption was that the world is mostly homogeneous”. Like physics is generalizable if you think the nature of the world is matter. And you can use energy from the sun to decompose anything into matter, allowing you to command universal assent that everything is matter. But does that mean matter is everything? Does your physics knowledge tell you how to run a company? If not, why say it is “awfully generalizable”?
I don’t see how it makes sense in the context we are talking about.
Let’s take farming. Clearly, it’s not some separate magisterium that I have no connection to. Farming happens in the same reality. I can see how people farm things, do it myself, learn about different methods, do experiments myself, and so on. The “coherence + contact with the world” approach seems to be very helpful here.
I think of “the rationalist project” as “having succeeded” in a very limited and relative sense that is still quite valuable.
For example, back when the US and Chinese governments managed to accidentally make a half-cocked bioweapon and let it escape from a lab and then not do any adequate public health at all, or hold the people who caused the megadeath event even slightly accountable, and all of the institutions of basically every civilization on Earth failed to do their fucking jobs, the “rationalists” (ie the people on LW and so on) were neck and neck with anonymous anime catgirls on twitter (who overlap a lot with rationalists in practice) in terms of being actually sane and reasonable voices in the chaos… and it turns out that having some sane and reasonable voices is useful!
Eliezer says “Rationalists should win” but Yvain said “it’s really not that great” and Yvain got more upvotes (90 vs 247 currently) so Yvain is prolly right, right? But either way it means rationality is probably at least a little bit great <3
Rank the tasks by size as measured by e.g. energy content. Playing Minecraft, proving math theorems, and driving a theologically confused person to psychosis are all small in terms of energy, especially when considering that the models are not consistently driving anyone to psychosis and thus the theologically confused person who was driven to psychosis was probably highly predisposed to it.
Art competitions are more significant, but AFAIK the times when it won art competitions, it relied on human guidance. I tried to ask ChatGPT to make a picture that could win an art competition without giving it any more guidance, and it made this, which yes is extremely beautiful, but also seems deeply Gnostic and so probably unlikely to win great art competitions. AI art thus seems more suited for Gnostic decoration than for greatness. (Maybe healthy people will eventually develop an aversion to it? Already seems on the way; e.g. art competitions tend to forbid AI art.)
So, next token prediction can succeed in a lot of pathetic tasks. It has also gotten a lot of data with examples of completions of pathetic tasks. Thus the success doesn’t rely on homogeneity (extrapolation), it relies on heterogeneity of data (interpolation).
It’s not an accident that it has data on weak tasks. There are more instances of small forms than large forms, so there is more data available on the smaller forms. In order to get the data on the larger forms, it will take work to integrate it with the world, and let the data drill into the AI.
I read your gnostic/pagan stuff and chuckled over the “degeneracy [ranking where] Paganism < … < Gnosticism < Atheism < Buddhism”.
I think I’ll be better able to steelman you in the future and I’m sorry if I caused you to feel misrepresented with my previous attempt. I hadn’t realized that the vibe you’re trying to serve is so Nietzschean.
Just to clarify, when you say “pathetic” it is not intended to evoke “pathos” and function as an even hypothetically possible compliment regarding a wise and pleasant deployment of feelings (even subtle feelings) in accord with reason, that could be unified and balanced to easily and pleasantly guide persons into actions in accord with The Good after thoughtful cultivation...
...but rather I suspect you intended it as a near semantic neighbor (but with opposite moral valence) of something like “precious” (as an insult (as it is in some idiolects)) in that both “precious and pathetic things” are similarly weak and small and in need of help.
Like the central thing you’re trying to communicate with the word “pathetic” (I think, but am not sure, and hence I’m seeking clarification) is to notice that entities labeled with that adjective could hypothetically be beloved and cared for… but you want to highlight how such things are also sort of worthy of contempt and might deserve abandonment.
We could argue: Such things are puny. They will not be good allies. They are not good role models. They won’t autonomously grow. They lack the power to even access whole regimes of coherently possible data gathering loops. They “will not win” and so, if you’re seeking “systematized winning”, such “pathetic” things are not where you should look. Is this something like what you’re trying to point to by invoking “patheticness” so centrally in a discussion of “solving philosophy formally”?
I agree that we can say that MIRI’s research program failed. But I don’t think it makes sense to say that the whole totalizing meta-worldview has failed.
This particular meta-worldview has arguably been developing since Democritus, some 2,400 years ago. Being extremely conservative, it’s as old as the concept of a Turing machine. It seems a bit weird to consider the reductionist / reduction-to-mathematics project as a whole to have failed because MIRI in particular failed to close all the open problems over the span of about 10 years.
It seems most likely to me that most of the meta-worldview was roughly right, but is still incomplete.
None of what you’re talking about is particular to the Sequences. It’s a particular synthesis of ideas including reductionism, Bayesianism, VNM, etc. I’m not really sure why the Sequences would be important under your view except as a popularization of pre-existing concepts.
Has MIRI at any moment had an explicit goal to “solve philosophy”?
I have no insider perspective, but from what I got, it seems that philosophical problems were mostly tangential to MIRI’s work and were engaged with only as a side effect of trying to create FAI. Am I wrong?
...try reading the linked “Executable Philosophy” Arbital page?
There is no answer to my question in the “Executable Philosophy” Arbital page.
The fact that Eliezer developed a framework that can be used for systematically solving philosophical problems, and even the fact that it was applied to solve several such problems, doesn’t mean that MIRI had an explicit goal to solve philosophy.
This seems to support my hypothesis that the goal was to solve problems arising in the creation of FAI, and that the fact that some of these problems are considered to belong to the realm of philosophy was only incidental; but I’d like to hear explicit confirmation or disproof from someone with an insider perspective.
It appears Eliezer thinks executable philosophy addresses most philosophical issues worth pursuing:
“Solving philosophy” is a grander marketing slogan that I don’t think was used, but, clearly, executable philosophy is a philosophically ambitious project.
Which completely fits the interpretation on which he thinks that the philosophical issues worth pursuing can be solved in principle via executable philosophy, and that executable philosophy is the best known tool for solving them, yet MIRI’s mission is not to solve such issues for the sake of philosophy itself, but only insofar as they contribute to the creation of FAI. On that interpretation, MIRI never had a dedicated task force for philosophy and gave up pursuing these issues as soon as they figured out that they were not going to make the FAI.
There is also a trivial case where Eliezer thinks that any philosophical issue that is unable to contribute to the creation of FAI is not worth pursuing in principle, but I suppose we can omit it for now.
I don’t see a need for such vague, non-committal statements when we can address the core crux. Did MIRI actively try to solve philosophical problems for their own sake using the framework of executable philosophy, or did it not? If it did, what exactly didn’t work?
There might be a confusion. Did you get the impression from my post that I think MIRI was trying to solve philosophy?
I do think other MIRI researchers and I would think of the MIRI problems as philosophical in nature even if they’re different from the usual ones, because they’re more relevant and worth paying attention to, given the mission and so on, and because (MIRI believes) they carve philosophical reality at the joints better than the conventional ones.
Whether it’s “for the sake of solving philosophical problems or not”… clearly they think they would need to solve a lot of them to do FAI.
EDIT: for more on MIRI philosophy, see deconfusion, free will solution.
I did get the feeling that your post implies that. Could you help me clear my confusion?
For example here you say:
This is something I agree with Eliezer about. I can clearly see that the “executable philosophy” framework is what made his reasoning about philosophy-adjacent topics so good compared to the baseline I had encountered before. The solution to free will is a great example.
But you seem to be framing this as a failed project. In what sense did it fail?
The reading I got is that the executable philosophy framework was trying to complete/solve philosophy and failed.
But if the executable philosophy framework wasn’t even systematically applied to philosophical problems in the first place, if the methodology wasn’t given a fair shot, in what sense can we say that it failed to complete philosophy? And if that was never the goal, in what sense are Wei Dai’s and Michael Vassar’s statements relevant here?
MIRI research topics are philosophical problems. Such as decision theory and logical uncertainty. And they would have to solve more. Ontology identification is a philosophical problem. Really, how would you imagine doing FAI without solving much of philosophy?
I think the post is pretty clear about why I think it failed. MIRI axed the agent foundations team, and I see very few people continuing to work on these problems. Maybe in multiple decades (past many of the relevant people’s median superintelligence timelines) some of the problems will get solved, but I don’t see “push harder on doing agent foundations” as a thing people are trying to do.
Re the failed totalizing worldview, I’d say the failure comes down mostly not to the philosophical premises being incorrect (with a few exceptions), but rather to a combination of factors: underestimating how hard inference from bare premises is without relying on empirical results (related to computational complexity and a failure to scale down from the idealized reasoner), combined with philosophical progress being mostly unnecessary for Friendly AI.
Which is why I refuse to generalize from Eliezer Yudkowsky and MIRI’s failure to make Friendly AI to all hopes of making Friendly AI failing, or even most hopes of Friendly AI failing.
David Deutsch has constructed one from (mostly) the universality of computation and Popperian ideas about explanations.
I think that saying that “executable philosophy” has failed is missing Yudkowsky’s main point. Quoting from the Arbital page:
He claims that unless we learn how to translate philosophy into “ideas that we can compile and run”, aligned AGI is out of the question. This is not a worldview, but an empirical proposition, the truth of which remains to be determined.
There’s also an adjacent worldview, which suffuses the Sequences, that it’s possible in the relatively short term to become much more generally “rational” than even the smartest uninitiated people, “faster than science” etc., and that this is chiefly rooted in Bayes, Solomonoff & Co. It’s fair to conclude that this has largely failed, and IMO Chapman makes a convincing case that this failure was unavoidable. (He also annoyingly keeps hinting that there is a supremely fruitful “meta-rational” worldview instead that he’s about to reveal to the world. Any day now. I’m not holding my breath.)
Reminded me of the complex systems chapter from a textbook (Center for AI Safety)
https://www.aisafetybook.com/textbook/complex-systems
I’ve read a bit of the logical induction paper, but I still don’t understand why Bayesian probability isn’t sufficient for reasoning about math. It seems that the Cox axioms still apply to logical uncertainty, and in fact “parts of the environment are too complex to model” is a classic justification for using probability in A.I. (I believe it is given in AIMA). At a basic rigorous level, probabilities are assigned to percepts, but we like to assign them to English statements as well (doing this reasonably is a version of one of the problems you mentioned). Modeling the relationships between strings of mathematical symbols probabilistically seems, if anything, better justified than applying probabilities to English statements, since the truth/falsehood of provability is well-defined in all cases*. Pragmatically, I think I do assign probabilities to mathematical statements being true/provable when I am doing research, and I am not conscious of this leading me astray!
*the truth of statements independent of (say) ZFC is a bit more of a philosophical quagmire, though it still seems that assigning probabilities to provable/disprovable/independent is a pretty safe practice. This might also be a use case for semimeasures as defective probabilities.
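For what it’s worth, here is a minimal sketch of the practice I have in mind: treating provable/disprovable/independent as a three-way categorical distribution and updating it by Bayes’ rule. All priors and likelihoods below are invented for illustration:

```python
# Minimal sketch: credences over "provable / disprovable / independent (of ZFC)"
# for some fixed statement, updated by Bayes' rule on the (invented) evidence
# "an automated proof search to depth d found neither a proof nor a disproof".
# All numbers are made up for illustration.

prior = {"provable": 0.5, "disprovable": 0.3, "independent": 0.2}

# Invented likelihoods: P(search finds nothing | hypothesis).
likelihood = {"provable": 0.4, "disprovable": 0.4, "independent": 1.0}

def bayes_update(prior, likelihood):
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnormalized.values())
    return {h: p / z for h, p in unnormalized.items()}

posterior = bayes_update(prior, likelihood)
print(posterior)  # independence gains credence after the failed search
```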
Here’s an explanation that may help.
You can think of classical Bayesian reasoning as justified by Dutch Book arguments. However, for a Dutch Book argument to be convincing, there’s an important condition that we need: the bookie needs to be just as ignorant as the agent. If the bookie makes money off the agent because the bookie knows an insider secret about the horse race, we don’t think of this as “irrational” on the part of the agent.
This assumption is typically packaged into the part of a Dutch Book argument where we say the Dutch Book “guarantees a net loss”—if the bookie is using insider knowledge, then it’s not a “guarantee” of a net loss. This “guarantee” needs to be made with respect to all the ways things could empirically turn out.
However, this distinction becomes fuzzier when we consider embedded agency, and in particular, computational uncertainty. If the agent has observed the length of two sides of a right triangle, then it is possible to compute the length of the remaining side. Should we say, on the one hand, that there is a Dutch Book against agents who do not correctly compute this third length? Or should we complain that a bookie who has completed the computation has special insider knowledge, which our agent may lack due to not having completed the computation?
If we bite the “no principled distinction” bullet, we can develop a theory where we learn to avoid making logical mistakes (such as classical Dutch Books, or the triangle example) in exactly the same manner that we learn to avoid empirical mistakes (such as learning that the sun rises every morning). Instead of getting a guarantee that we never give in to a Dutch Book, we get a bounded-violations guarantee; we can only lose so much money that way before we wise up.
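To make the classical starting point concrete, here is a minimal sketch of the simplest Dutch Book, with invented numbers; the move described above replaces the per-bet “never exploitable” guarantee with a bound on total losses of this kind:

```python
# Minimal sketch of a classical Dutch Book (invented numbers). If an agent's
# credences in A and not-A sum to more than 1, a bookie with no extra
# knowledge can sell the agent both bets and lock in a profit either way.

credence_A = 0.7       # agent's fair price for a $1 bet that pays iff A is true
credence_not_A = 0.6   # agent's fair price for a $1 bet that pays iff A is false

premiums_collected = credence_A + credence_not_A  # bookie sells both bets: $1.30
payout_owed = 1.0                                 # exactly one of the two bets pays out

bookie_profit = premiums_collected - payout_owed
print(f"Guaranteed bookie profit: ${bookie_profit:.2f}")  # $0.30, however A turns out
```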
In this example, if I knew the Pythagorean theorem and had performed the calculation, I would be certain of the right answer. If I were not able to perform the calculation because of logical uncertainty (say the numbers were large), then relative to my current state of knowledge I could avoid Dutch books by assigning probabilities to side lengths. This would make me impossible to money-pump in the sense of cyclical preferences. The fact that I could gamble more wisely if I had access to more computation doesn’t seem to undercut the reasons for using probabilities when I don’t.
Now in the extreme adversarial case, a bookie could come along who knows my computational limits and only offers me bets where I lose in expectation. But this is also a problem for empirical uncertainty; in both cases, if you literally face a bookie who is consistently winning money from you, you could eventually infer that they know more than you and stop accepting their bets. I still see no fundamental difference between empirical and logical uncertainties.
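To illustrate what I mean by assigning probabilities to side lengths (a toy sketch, not a claim about any particular formalism), imagine an agent that by assumption cannot afford an exact square root but can afford to square integers:

```python
import math

# Toy sketch: an agent that (by assumption) cannot compute an exact square
# root cheaply brackets the hypotenuse between consecutive integers and
# spreads its credence over that bracket, rather than committing to a
# single possibly-wrong value. Leg lengths are invented.

a, b = 20_348, 77_123
target = a * a + b * b            # hypotenuse squared, cheap to compute

# Find consecutive integers lo, lo + 1 with lo^2 <= target < (lo + 1)^2.
lo = 1
while (lo + 1) * (lo + 1) <= target:
    lo += 1

print(f"credence spread over hypotenuse values in [{lo}, {lo + 1}]")
print("exact value, for checking only:", math.sqrt(target))
```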
I am not trying to undercut the use of probability in the broad sense of using numbers to represent degrees of belief.
However, if “probability” means “the Kolmogorov axioms”, we can easily undercut these by the argument you mention: we can consider a (quite realistic!) case where we don’t have enough computational power to enforce the Kolmogorov axioms precisely. We conclude that we should avoid easily-computed Dutch books, but may be vulnerable to some hard-to-compute Dutch books.
Yes, exactly. In the perspective I am offering, the only difference between bookies who we stop betting with due to a history of losing money, vs bookies we stop betting with due to a priori knowing better, is that the second kind corresponds to something we already knew (already had high prior weight on).
In the classical story, however, there are bookies we avoid a priori as a matter of logic alone (we could say that the classical perspective insists that the Kolmogorov axioms are known a priori, which is completely fine and good if you’ve got the computational power to do it).