I’m increasingly worried that philosophers tend to underestimate the difficulty of philosophy. I’ve previously criticized Eliezer for this, but it seems to be a more general phenomenon.
Observations:
Low expressed interest in metaphilosophy (in relation to either AI or humans)
Low expressed interest in AI philosophical competence (either concern that it might be low, or desire/excitement for supercompetent AI philosophers with Jupiter-sized brains)
Low concern that philosophical difficulty will be a blocker of AI alignment or cause of AI risk
High confidence when proposing novel solutions (even to controversial age-old questions, and when the proposed solution fails to convince many)
Rarely attacking one’s own ideas (in a serious or sustained way) or changing one’s mind based on others’ arguments
Rarely arguing for uncertainty/confusion (i.e., that that’s the appropriate epistemic status on a topic), with normative ethics being a sometime exception
Possible explanations:
General human overconfidence
People who have a high estimate of the difficulty of philosophy self-selecting out of the profession.
Academic culture/norms—no or negative rewards for being more modest or expressing confusion. (Moral uncertainty being sometimes expressed because one can get rewarded by proposing some novel mechanism for dealing with it.)
Philosophy is frequently (probably most of the time) done in order to signal group membership rather than as an attempt to accurately model the world. Just look at political philosophy or philosophy of religion. Most of the observations you note can be explained by philosophers operating at simulacrum level 3 instead of level 1.
“Signal group membership” may be true of the fields you mentioned (political philosophy and philosophy of religion), but seems false of many other fields such as philosophy of math, philosophy of mind, decision theory, anthropic reasoning. Hard to see what group membership someone is signaling by supporting one solution to Sleeping Beauty vs another, for example.
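For concreteness, here is the standard halfer vs thirder calculation (a textbook aside, just to show what "one solution vs another" cashes out to; nothing here argues for either answer):

```latex
% Worked aside: the two usual answers to Sleeping Beauty.
% Halfer: waking provides no new information, so
\[
P(\text{Heads} \mid \text{awake}) = P(\text{Heads}) = \tfrac{1}{2}.
\]
% Thirder: weight by awakenings (one under Heads, two under Tails), so
\[
P(\text{Heads} \mid \text{awake})
  = \frac{\tfrac{1}{2}\cdot 1}{\tfrac{1}{2}\cdot 1 + \tfrac{1}{2}\cdot 2}
  = \tfrac{1}{3}.
\]
```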
Here are some axes along which I think there’s some group membership signaling in philosophy (IDK about the extent and it’s hard to disentangle it from other stuff):
Math: platonism/intuitionism/computationalism (i.e. what is math?), interpretations of probability, foundations of math (set theory vs univalent foundations)
Mind: externalism/internalism (about whatever), consciousness (de-facto-dualisms (e.g. Chalmers) vs reductive realism vs illusionism), language of thought vs 4E cognition, determinism vs compatibilism vs voluntarism
Metaphysics/ontology: are chairs, minds, and galaxies real? (this is somewhat value-laden for many people)
Biology: gene’s-eye-view/modern synthesis vs extended evolutionary synthesis
I don’t think this is accurate; I think most philosophy is done under motivated reasoning but is not straightforwardly about signaling group membership.
I think most academic philosophers take the difficulty of philosophy quite seriously. Metaphilosophy is a flourishing subfield of philosophy; you can find recent papers on the topic here https://philpapers.org/browse/metaphilosophy. There is also a growing group of academic philosophers working on AI safety and alignment; you can find some recent work here https://link.springer.com/collections/cadgidecih. I think that the tone of specific papers sometimes sounds confident, but that is more stylistic convention than a reflection of the underlying credences. Finally, I think that uncertainty / decision theory is a persistent theme in recent philosophical work on AI safety and other issues in philosophy of AI; see for example this paper, which is quite sensitive to issues about chances of welfare: https://link.springer.com/article/10.1007/s43681-023-00379-1.
Thank you for your view from inside academia. Some questions to help me get a better sense of what you see:
Do you know any philosophers who switched from non-meta-philosophy to metaphilosophy because they became convinced that the problems they were trying to solve are too hard and they needed to develop a better understanding of philosophical reasoning or better intellectual tools in general? (Or what’s the closest to this that you’re aware of?)
Do you know any philosophers who have expressed an interest in ensuring that future AIs will be philosophically competent, or a desire/excitement for supercompetent AI philosophers? (I know of 1 or 2 private expressions of the former, but they haven’t been translated into action yet.)
Do you know any philosophers who are worried that philosophical problems involved in AI alignment/safety may be too hard to solve in time, and have called for something like an AI pause to give humanity more time to solve them? (Even philosophers who have expressed a concern about AI x-risk or are working on AI safety have not taken a position like this, AFAIK.)
How often have you seen philosophers say something like “Upon further reflection, my proposed solution to problem X has many problems/issues, I’m no longer confident it’s the right approach and now think X is much harder than I originally thought.”
Would also appreciate any links/citations/quotes (including from personal but sharable communications) on these.
These are all things I’ve said or done due to my high estimate of philosophical difficulty, but have not (or only rarely) seen among academic philosophers, at least from my casual observation outside academia. It’s also possible that we disagree on what estimate of philosophical difficulty is appropriate (such that, for example, you don’t think philosophers should often say or do these things), which would also be interesting to know.
Another academic philosopher, directed here by @Simon Goldstein. Hello Wei!
It’s not common to switch entirely to metaphilosophy, but I think lots of us get more interested in the foundations and methodology of at least our chosen subfields as we gain experience, see where progress is(n’t) being made, start noticing deep disagreements about the quality of different kinds of work, and so on. It seems fair to describe this as awakening to a need for better tools and a greater understanding of methods. I recently wrote a paper about the methodology of one of my research areas, philosophy of mathematical practice, for pretty much these reasons.
Current LLMs are pretty awful at discussing the recent philosophy literature, so I think anyone who’d like AI tools to serve as useful research assistants would be happy to see at least some improvement here! I’m personally also excited about the prospects of using language models with bigger context windows for better corpus analysis work in empirical and practice-oriented parts of philosophy.
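To give a rough sense of what I mean by a corpus-analysis pass, here is a minimal sketch only; the complete function and model name below are placeholders rather than any real SDK:

```python
# Minimal sketch of an LLM corpus-analysis pass over a folder of papers.
# "complete" and the model name are placeholders for whatever long-context
# provider you actually use; the prompt and JSON schema are illustrative.
import json
from pathlib import Path


def complete(model: str, prompt: str) -> str:
    """Placeholder for a call to a long-context language model."""
    raise NotImplementedError("substitute your provider's client here")


PROMPT = (
    "Read the following paper and answer in JSON with keys "
    "'main_thesis' and 'expressed_uncertainty' (one of: low, medium, high).\n\n{text}"
)

results = []
for paper in sorted(Path("corpus").glob("*.txt")):
    # One long-context call per paper; collect structured judgments for tallying.
    reply = complete(model="some-long-context-model",
                     prompt=PROMPT.format(text=paper.read_text()))
    results.append({"paper": paper.name, **json.loads(reply)})

print(json.dumps(results, indent=2))
```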
I basically agree with Simon on this.
I don’t think this is uncommon. You might not see these reversals in print often, because nobody wants to publish and few people want to read a paper that just says “I retract my previous claims and no longer have a confident positive view to offer”. But my sense is that philosophers often give up on projects because the problems are piling up and they no longer see an appealing way forward. Sometimes this happens more publicly. Hilary Putnam, one of the most influential philosophers of the later 20th century, was famous for changing his mind about scientific realism and other basic metaphysical issues. Wesley Salmon gave up his influential “mark transmission” account of causal explanation due to counterexamples raised by Kitcher (as you can read here). It would be easy enough to find more examples.
Great questions. Sadly, I don’t have any really good answers for you.
I don’t know of specific cases, but for example I think it is quite common for people to start studying meta-ethics out of frustration with trying to find answers to questions in normative ethics.
I do not, except for the end of Superintelligence.
Many of the philosophers I know who work on AI safety would love for there to be an AI pause, in part because they think alignment is very difficult. But I don’t know if any of us have explicitly called for an AI pause, in part because it seems useless, but may have opportunity cost.
I think few of my friends in philosophy have ardently abandoned a research project they once pursued because they decided it wasn’t the right approach. I suspect few researchers do that. In my own case, I used to work in an area called ‘dynamic semantics’, and one reason I’ve stopped working on that research project is that I became pessimistic that it had significant advantages over its competitors.
The FLI Pause letter didn’t achieve a pause, but it dramatically shifted the Overton Window.
Thanks, it’s actually very interesting and important information.
I’ve noticed (and stated in the OP) that normative ethics seems to be an exception where it’s common to express uncertainty/confusion/difficulty. But I think, from both my inside and outside views, that this should be common in most philosophical fields (because e.g. we’ve been trying to solve them for centuries without coming up with broadly convincing solutions), and there should be a steady stream of all kinds of philosophers going up the meta ladder all the way to metaphilosophy. It recently dawned on me that this doesn’t seem to be the case.
What seems useless, calling for an AI pause, or the AI pause itself? I have trouble figuring this out because if it’s “calling for an AI pause”, what is the opportunity cost (it seems easy enough to write or sign an open letter), and if it’s the “AI pause itself”, “seems useless” contradicts “would love”. In either case, this seems extremely important to openly discuss/debate! Can you please ask these philosophers to share their views of this on LW (or their preferred venue), and share your own views?
FTR I’d probably be up for helping out logistically with such an open letter (e.g. making the website and any other parts of it). I previously made this open letter.
I think there’s a steady stream of philosophy getting interested in various questions in metaphilosophy; metaethics is just the most salient to me. One example is the recent trend towards conceptual engineering (https://philpapers.org/browse/conceptual-engineering). Metametaphysics has also gotten a lot of attention in the last 10-20 years https://www.oxfordbibliographies.com/display/document/obo-9780195396577/obo-9780195396577-0217.xml. There is also some recent work in metaepistemology, but maybe less so because the debates tend to recapitulate previous work in metaethics https://plato.stanford.edu/entries/metaepistemology/.
Sorry for being unclear, I meant that calling for a pause seems useless because it won’t happen. I think calling for the pause has opportunity cost because of limited attention and limited signalling value; reputation can only be used so many times; better to channel pressure towards asks that could plausibly get done.
Thanks for this info and the references. I guess by “metaphilosophy” I meant something more meta than metaethics or metaepistemology, i.e., a field that tries to understand all philosophical reasoning in some unified or systematic way, including reasoning used in metaethics and metaepistemology, and metaphilosophy itself. (This may differ from standard academic terminology, in which case please let me know if there’s a preferred term for the concept I’m pointing at.) My reasoning being that metaethics itself seems like a hard problem that has defied solution for centuries, so why stop there instead of going even more meta?
I think you (and other philosophers) may be too certain that a pause won’t happen, but I’m not sure I can convince you (at least not easily). What about calling for it in a low cost way, e.g., instead of doing something high profile like an open letter (with perceived high opportunity costs), just write a blog post or even a tweet saying that you wish for an AI pause, because …? What if many people privately prefer an AI pause, but nobody knows because nobody says anything? What if by keeping silent, you’re helping to keep society in a highly suboptimal equilibrium?
I think there are also good arguments for doing something like this from a deontological or contractualist perspective (i.e. you have a duty/obligation to honestly and publicly report your beliefs on important matters related to your specialization), which sidestep the “opportunity cost” issue, but I’m not sure if you’re open to that kind of argument. I think they should have some weight given moral uncertainty.
Hm. I think modern academic philosophy is a raging shitshow, but I thought philosophy on LW was quite good. I wasn’t a regular LW user until a couple of years ago, and the philosophical takes here, particularly Eliezer’s, converge with my own conclusions after a half lifetime of looking at philosophical questions through the lens of science, particularly neuroscience and psychology.
So: what do you see as the limitations in LW/Yudkowskian philosophy? Perhaps I’ve overlooked them.
I am currently skeptical that we need better philosophy for good AGI outcomes, versus better practical work on technical AGI alignment (a category that barely exists) and PR work to put the likely personal-intent-aligned AGI into the hands of people who give half a crap about understanding or implementing ethics. Deciding on the long-term future will be a matter of long contemplation if we get AGI into good hands. We should decide if that logic is right, and if so, plan the victory party after we’ve won the war.
I did read your metaphilosophy post and remain unconvinced that there’s something big the rest of us are missing.
I’m happy to be corrected (I love becoming less wrong, and I’m aware of many of my biases that might prevent it):
Here’s how it currently looks to me: ethics is ultimately a matter of preference; the rest is game theory and science (including the science of human preferences). Philosophical questions boil down to scientific questions in most cases, so epistemology is metaphilosophy for the most part.
Change my mind! Seriously, I’ll listen. It’s been years since I’ve thought about philosophy hard.
I was just reading Daniel Dennett’s memoir for no reason in particular; it had some interesting glimpses into how professional philosophers actually practice philosophy. Like I guess there’s a thing where one person reads their paper (word-for-word!) and then someone else is the designated criticizer? I forget the details. Extremely different from my experience in physics academia though!!
(Obviously, reading that memoir is probably not the most time-efficient way to learn about the day-to-day practice of academic philosophy.)
(Oh, there was another funny anecdote in the memoir where the American professional philosophers’ association basically had a consensus against some school of philosophy, and everyone was putting it behind them and moving on, but then there was a rebellion where the people who still liked that school of philosophy did a hostile takeover of the association’s leadership!)
A non-ethics example that jumps to my mind is David Chalmers on the Hard Problem of Consciousness here: “So if I’m giving my overall credences, I’m going to give, 10% to illusionism, 30% to panpsychism, 30% to dualism, and maybe the other 30% to, I don’t know what else could be true, but maybe there’s something else out there.” That’s the only example I can think of but I read very very little philosophy.
What are the issues that are “difficult” in philosophy, in your opinion? What makes them difficult?
I remember you and others talking about the need to “solve philosophy”, but I was never sure what was meant by that.
To whom does this not apply? Most people who “work on AI alignment” don’t even think that thinking is a thing.
@Nate Showell @P. @Tetraspace @Joseph Miller @Lorxus
I genuinely don’t know what you want elaboration of. Reacts are nice for what they are, but saying something out loud about what you want to hear more about / what’s confusing / what you did and didn’t understand/agree with, is more helpful.
Re/ “to whom not...”, I’m asking Wei: what groups of people would not be described by the list of 6 “underestimating the difficulty of philosophy” things? It seems to me that broadly, EAs and “AI alignment” people tend to favor somewhat too concrete touchpoints like “well, suppressing revolts in the past has gone like such and such, so we should try to do similar for AGI”. And broadly they don’t credit an abstract argument about why something won’t work, or would only work given substantial further philosophical insight.
Re/ “don’t think thinking …”, well, if I say “LLMs basically don’t think”, they’re like “sure it does, I can keep prompting it and it says more things, and I can even put that in a scaffold” or “what concrete behavior can you point to that it can’t do”. Like, bro, I’m saying it can’t think. That’s the tweet. What thinking is, isn’t clear, but That thinking is should be presumed, pending a forceful philosophical conceptual replacement!
That is, in fact, a helpful elaboration! When you said
Most people who “work on AI alignment” don’t even think that thinking is a thing.
my leading hypotheses for what you could mean were:
Using thought, as a tool, has not occurred to most such people
Most such people have no concept whatsoever of cognition as being a thing, the way people in the year 1000 had no concept whatsoever of javascript being a thing.
Now, instead, my leading hypothesis is that you mean:
Most such people are failing to notice that there’s an important process, called “thinking”, which humans do but LLMs “basically” don’t do.
This is a bunch more precise! For one, it mentions AIs at all.
As my reacts hopefully implied, this is exactly the kind of clarification I needed—thanks!
Like, bro, I’m saying it can’t think. That’s the tweet. What thinking is, isn’t clear, but That thinking is should be presumed, pending a forceful philosophical conceptual replacement!
Sure, but you’re not preaching to the choir at that point. So surely the next step in that particular dance is to stick a knife in the crack and twist?
That is -
“OK, buddy:
Here’s property P (and if you’re good, Q and R and...) that [would have to]/[is/are obviously natural and desirable to]/[is/are pretty clearly a critical part if you want to] characterize ‘thought’ or ‘reasoning’ as distinct from whatever it is LLMs do when they read their own notes as part of a new prompt and keep chewing them up and spitting the result back as part of the new prompt for themselves to read.
Here’s thing T (and if you’re good, U and V and...) that an LLM cannot actually do, even in principle, but which would be trivially easy for (say) an uploaded (and sane, functional, reasonably intelligent) human H, even if H is denied (almost?) all of their previously consolidated memories and just working from some basic procedural memory and whatever Magical thing this ‘thinking’/‘reasoning’ thing is.”
And if neither you nor anyone else can do either of those things… maybe it’s time to give up and say that this ‘thinking’/‘reasoning’ thing is just philosophically confused? I don’t think that that’s where we’re headed, but I find it important to explicitly acknowledge the possibility; I don’t deal in more than one epiphenomenon at a time and I’m partial to Platonism already. So if this ‘reasoning’ thing isn’t meaningfully distinguishable in some observable way from what LLMs do, why shouldn’t I simply give in?
I’ve had this tweet pinned to my Twitter profile for a while, hoping to find some like-minded people, but with 13k views so far I’ve yet to get a positive answer (or find someone expressing this sentiment independently):
Among my first reactions upon hearing “artificial superintelligence” were “I can finally get answers to my favorite philosophical problems” followed by “How do I make sure the ASI actually answers them correctly?”
Anyone else reacted like this?
This aside, there are some people around LW/rationality who seem more cautious/modest/self-critical about proposing new philosophical solutions, like MIRI’s former Agent Foundations team, but perhaps partly as a result of that, they’re now out of a job!
Yeah that was not my reaction. (More like “that’s going to be the most beautiful thing ever” and “I want to be that too”.)
No, if anything the job loss resulted from not doing so much more, much more intently, and much sooner.
Having worked on some of the problems myself (e.g. decision theory), I think the underlying problems are just very hard. Why do you think they could have done “so much more, much more intently, and much sooner”?
The type of fundamental problem that proper speculative philosophy is supposed to solve is the sort where streetlighting doesn’t work (or isn’t working, or isn’t working fast enough). But nearly all of the alignment field after like 2004 was still basically streetlighting. It was maybe a reasonable thing to have some hope in prospectively, but retrospectively it was too much investment in streetlighting, and retrospectively I can make arguments about why one should have maybe guessed that at the time. By 2018 IIRC, or certainly by 2019, I was vociferously arguing for that in AF team meetings—but the rest of the team either disagreed with me or didn’t understand me, and on my own I’m just not that good a thinker, and I didn’t find anyone else to try it with. I think they have good thoughts, but are nevertheless mostly streetlighting—i.e. not trying to take step after step of thinking at the level of speculative philosophy AND aimed at getting the understanding needed for alignment.
My understanding of what happened (from reading this) is that you wanted to explore in a new direction very different from the then preferred approach of the AF team, but couldn’t convince them (or someone else) to join you. To me this doesn’t clearly have much to do with streetlighting, and my current guess is that it was probably reasonable of them to not be convinced. It was also perfectly reasonable of you to want to explore a different approach, but it seems unreasonable to claim without giving any details that it would have produced better results if only they had listened to you. (I mean you can claim this, but why should I believe you?)
If you disagree (and want to explain more), maybe you could either explain the analogy more fully (e.g., what corresponds to the streetlight, why should I believe that they overexplored the lighted area, what made you able to “see in the dark” to pick out a more promising search area or did you just generally want to explore the dark more) and/or try to convince me on the object level / inside view that your approach is or was more promising?
(Also perfectly fine to stop here if you want. I’m pretty curious on both the object and meta levels about your thoughts on AF, but you may not have wanted to get into such a deep discussion when you first joined this thread.)
If you say to someone
Ok, so, there’s this thing about AGI killing everyone. And there’s this idea of avoiding that by making AGI that’s useful like an AGI but doesn’t kill everyone and does stuff we like. And you say you’re working on that, or want to work on that. And what you’re doing day to day is {some math thing, some programming thing, something about decision theory, …}. What is the connection between these things?
and then you listen to what they say, and reask the question and interrogate their answers, IME what it very often grounds out into is something like:
Well, I don’t know what to do to make aligned AI. But it seems like X ϵ {ontology, decision, preference function, NN latent space, logical uncertainty, reasoning under uncertainty, training procedures, negotiation, coordination, interoperability, planning, …} is somehow relevant.
And, I have a formalized version of some small aspect of X which is mathematically interesting / philosophically intriguing / amenable to testing with a program, and which seems like it’s kinda related to X writ large. So what I’m going to do is tinker with this formalized version for a week/month/year, and then zoom out and think about how this relates to X, and what I have and haven’t learned, and so on.
This is a good strategy because this is how all mathematical / scientific / technological progress is made: you start with stuff you know; you expand outwards by following veins of interest, tractability, and generality/power; you keep an eye roughly towards broader goals by selecting the broad region you’re in; and you build outward. What we see historically is that this process tends to lead us to think about the central / key / important / difficult / general problems—such problems show up everywhere, so we convergently will come to address them in due time. By mostly sticking, in our day-to-day work, to things that are relatively more concrete and tractable—though continually pushing and building toward difficult things—we make forward progress, sharpen our skills, and become familiar with the landscape of concepts and questions.
So I would summarize that position as endorsing streetlighting, in a very broad sense that encompasses most math / science / technology. And this position is largely correct! My claim is that
this is probably too slow for making Friendly AI, and
maybe one could go faster by trying to more directly cleave to the core philosophical problems.
I discuss the problem more here: https://tsvibt.blogspot.com/2023/09/a-hermeneutic-net-for-agency.html
(But note that, while that essay frames things as “a proposed solution”, the solution is barely anything—more like a few guesses at pieces of methodology—and the main point is the discussion of the problem; maybe a writing mistake.)
An underemphasized point that I should maybe elaborate more on: a main claim is that there’s untapped guidance to be gotten from our partial understanding—at the philosophical level and for the philosophical level. In other words, our preliminary concepts and intuitions and propositions are, I think, already enough that there’s a lot of progress to be made by having them talk to each other, so to speak.
OK but what would this even look like?
Toss away anything amenable to testing and direct empirical analysis; it’s all too concrete and model-dependent.
Toss away mathsy proofsy approaches; they’re all too formalized and over-rigid and can only prove things from starting assumptions we haven’t got yet and maybe won’t think of in time.
Toss away basically all settled philosophy, too; if there were answers to be had there rather than a few passages which ask correct questions, the Vienna Circle would have solved alignment for us.
What’s left? And what causes it to hang together? And what causes it not to vanish up its own ungrounded self-reference?
From scratch but not from scratch. https://www.lesswrong.com/posts/noxHoo3XKkzPG6s7E/most-smart-and-skilled-people-are-outside-of-the-ea?commentId=DNvmP9BAR3eNPWGBa
https://tsvibt.blogspot.com/2023/09/a-hermeneutic-net-for-agency.html
What makes you think there are any such ‘answers’, renderable in a form that you could identify?
And even if they do exist, why do you think a human being could fully grasp the explanation in finite time?
Edit: It seems quite possible that even the simplest such ‘answers’ could require many years of full-time effort to understand, putting them beyond most if not all human memory capacity; i.e., by the end, even those who ‘learned’ it will have forgotten many parts near the beginning.
(Upvoted since your questions seem reasonable and I’m not sure why you got downvoted.)
I see two ways to achieve some justifiable confidence in philosophical answers produced by superintelligent AI:
Solve metaphilosophy well enough that we achieve an understanding of philosophical reasoning on par with our understanding of mathematical reasoning, and have ideas/systems analogous to formal proofs and mechanical proof checkers that we can use to check the ASI’s arguments (a toy illustration of the proof-checker side of this analogy appears after this list).
We increase our own intelligence and philosophical competence until we can verify the ASI’s reasoning ourselves.
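As a toy illustration of the “mechanical proof checker” side of the analogy in option 1 (ordinary Lean, nothing AI-specific; the hope is that philosophical arguments could eventually admit something with this level of checkability):

```lean
-- Toy illustration only: statements plus proof terms that Lean's kernel
-- checks mechanically. Trust in each result reduces to trust in the checker,
-- which is the property the analogy in option 1 is reaching for.
theorem two_add_two : 2 + 2 = 4 := rfl

theorem add_comm' (a b : Nat) : a + b = b + a := Nat.add_comm a b
```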
I blame science, math, engineering, entrepreneurship. Philosophy is the practice of the esoteric method, meaning it can’t be made truly legible for very long stretches of investigation. This results in accumulation of anti-epistemic hazards, which science doesn’t particularly need to have tools for dealing with, because it can filter its reasoning through frequent transitions into legibility.
Philosophy can’t rely on such filtering through legibility; it has to maintain sanity the hard way. But as philosophy enviously looks at the more successful endeavors of science, it doesn’t see respect for such methods of maintaining sanity in its reasoning; instead it sees that merely moving fast and breaking things works very well. And so the enthusiasm for developing such methods wanes; instead philosophy remains content with object-level questions that investigate particular truths, rather than methods for getting better at telling which cognitive algorithms can more robustly arrive at truths (rationality, metaphilosophy).