Avoiding metaphysics means giving bad philosophy a free pass
I have noticed a culture among AI Alignment field builders: a certain reluctance to let conversations lead to metaphysical underpinnings. The framing and unsaid assumptions beneath models that predict existential risk from AI are not explained; rather, selection pressures strongly favor people who have already grokked these ideas.
So I worry that this is leading to a homogeneity in the kind of people that are surviving the funnel to reach the point where they start contributing to the community—by deconfusing concepts, starting new research directions, communicating the risk to others etc. It seems worthwhile at this point to try and increase variance here.
This current strategy is totally understandable because people do not react favorably to their metaphysics being challenged. Philosophy needs careful, rigorous thinking, and it is extremely easy to get mired in discussions that lead nowhere, with ill-defined words thrown around regarding non-falsifiable questions.
So most people choose to ignore these questions about consciousness, free will, qualia, sentience, etc and try to stick to the standard pitch. The standard pitch involves extrapolating that generally capable systems do not need any of these human-like properties to be dangerous. The bar is quite low to logically make a case for x risk from intelligent systems that mechanically follow goals.
But logic is not sufficient here. There are emotional knots to untangle, and there are cognitive defense mechanisms in all of us which protect our sanity. The series of compromises that our psyche has set up to compartmentalize inconsistent beliefs needs to be carefully undone, so it is especially important to have empathy when spreading this meme.
All of our axioms are based on intuitions. The initial friction that I see when trying to communicate the existential threat from generally intelligent AI systems is that people have strong priors on the inherent limitations of such “software” running on electronics. They put AI systems into a different category than humans.
So I try to poke at these beliefs and try to understand how they are so confidently brushing away concerns about agentic systems and instead focusing on the risks of narrowly intelligent tools being misused by human actors. It turns out that is a more concrete narrative which is easier to hold in their heads.
It is harder but still possible to point to the likelihood of AI systems getting so enmeshed in our society that they disrupt our ability to coordinate and steer the future of humanity in a positive direction. Even when individual agents are aware of and do not like the trajectory their company is taking, it is very hard to coordinate action in a synchronized way to break out of inadequate equilibria.
Just as we had trouble predicting the effects of humans interacting with each other through the internet at this scale, the aggregate effects of introducing AI systems into the mix are not known. Note how even this conceives of AI systems as passive objects.
Consider the idea of an intelligent, agentic AI system steering the world towards a state using many independent causal chains, such that no human is able to zoom out and hold all of these nudges together in their head, and when the final state is reached it seems almost coincidental and hard to backchain to the AI itself. This feels like an idea straight from a sci-fi novel and not worth considering because of how outlandish it is.
But it is important to be able to conceive of these possibilities, if only as candidate hypotheses so that you can examine the empirical evidence from the world fairly. Our defenses against weird ideas need to be identified and carefully questioned, by recognizing we are living in a slice of time which is extremely different from most of our past.
I notice that some possibilities such as these are not even considered, and that made me wonder why. Why is it that most people seem to stubbornly think of AI systems as passive tools and believe strongly that they cannot anytime soon become agentic in the same way humans or animals are?
I claim this is because they implicitly hold a dualistic view of the world. While they might not be able to articulate it in the language of souls, free will, a divine spark, etc., on some deep level they think living biological things have this ability, “agency”, to spawn new causal chains, while inanimate objects do not.
This is a fundamental limitation of any object they deem as computable, reducible down to a known process. This is why people who are not aware of the technical details are more likely to accept AI could be agentic while a lot of academics seem to be resistant to this idea. The closer you are to the system you study, the clearer it is that the system is an automaton.
I don’t think many people have truly dissolved the question of free will; they have not completely embraced the highly likely possibility that we are biochemical machines ourselves. I think reductionism and computationalism are such an orthodoxy among scientists that they accept these views only logically, while emotionally they resist them.
There is a cognitive dissonance when they are forced to follow through to the logical conclusion of these beliefs and these metaphysical claims come into conflict with the idea of humans including themselves having this special quality that makes us able to act independently of our past. Most of us still consider the counterfactual actions we might have taken as very real options we had but chose not to pick.
It is important to verify whether these claims of mine are true because, in order to rally the political will to pass pivotal legislation or to attract the brightest minds of humanity to take this problem seriously, we need to make sure that we do a good job of communicating the existential risk from general intelligences.
Appealing to authority and pointing to some leaders of the field who share these concerns, or pointing to the CEOs of the companies that build these systems is not a good way to get people to develop their own inside view of why this problem matters and is real.
People who are likely to solve this problem will need to see the actual shape of the problem via legitimate communication techniques and that will involve making their metaphysical assumptions explicit and addressing them.
I understand that LessWrong and the Sequences eventually aim to deal with exactly these kinds of questions, but I am worried that it is too late by the time this conversation happens. Early on in the funnel, useful people might be put off by the dissonance they feel.
A solution might be to have some of these philosophical conversations be more mainstream.
I am confused by who you’re talking about, e.g.
but then
Aren’t non-academics and non-experts the majority, i.e. “most people”?
Your main idea seems to be that academics and other technically knowledgeable people are supposed to be materialists and reductionists, but in their hearts they still don’t think of themselves as automata, and this prevents them from conceiving that nonhuman automata could develop all of the capacities that humans have. So in order to open their minds to the higher forms of AI risk, one should emphasize materialist philosophy of mind, and that human nature and machine nature are not that different.
Well, people have a variety of attitudes. Many of the people working in deep learning or in AI safety, maybe even a majority, definitely believe that artificial neural networks can be conscious, and can be people. Some are more agnostic and say, higher cognitive capabilities don’t necessarily imply personhood, and that we just don’t know which AIs would be conscious and which not. It is even possible to think that AIs (at least on non-quantum computers) can probably never be conscious, and still think that they are capable of surpassing us; that would be my view.
Given this situation, I think that not tying AI safety to a particular philosophy of mind is appropriate. However, people have their views, and if someone thinks that a particular philosophy of mind is necessarily part of the case for AI safety, then that is how they will present it.
You’re writing from India, so maybe people there, who are working on AI and machine learning, more often have a religious or spiritual concept of human nature, compared to their counterparts in the secularized West?
I was talking about people who have not grokked materialism, which is the majority. The people who are not aware of the technical details model AI as a black box and therefore seem to be more open to considering that it might be agentic, but that is just them deferring to an outside view that sounds convincing rather than building their own model.
Most people I talked to were from India, and it is possible there is a pattern there. But I see similar arguments come up even among people in the West. When people say “it is just statistics”, they seem to be pointing to the idea that deterministic processes can never be agentic.
I am not necessarily trying to bring consciousness into the discussion, but I think there is value in helping people make their existing philosophical beliefs more explicit so that they can follow them to their natural conclusion.
My reading of this post is both that the ideas presented are confused in ways that make it hard to understand and that it’s pointing at some real things and not doing a great job of explaining what’s going on.
First, the bit about metaphysics. “Metaphysics” is kind of a terrible term and poorly defined even within academic philosophy. So as a result it’s not really clear what you mean by it. Reading this post I infer that you just mean something like “topics about which philosophy is still concerned because we don’t or can’t get information that would enable us to have sufficient certainty of answers to allow those topics to transition into science”. If you mean something else, please let us know.
Second, I’m not sure what the argument here is other than some AI researchers don’t think enough about philosophy and take scientific materialism and associated worldviews too seriously. I’m also not sure what this post hopes to accomplish, though I’m guessing the hope is that it would be something like convincing AI researchers to think about philosophy more, but there’s not really anything here that seems like an argument that would convince anyone who didn’t already agree with the (obscured) points.
That said, there are some points in this post that I think are good ones, though poorly justified within the text. In particular:
and
This is very much true, although I don’t think we should hold AI researchers too much to blame because this is a general issue in Western and Westernized cultures. Without getting too deep into the historical factors for the widespread belief in naive dualism, it nonetheless is the childhood worldview of most scientists and they only partly succeed at unlearning it in order to do science. In particular, they might unlearn it in narrow contexts related to their immediate work, but then get confused and fail to unlearn it in general, resulting in them getting confused about things like agency and free will.
Luckily, some folks have also noticed this and are trying to address it! A good keyword for looking into this within the AI safety literature is embedded agency. Of course, even embedded agency is a rather narrow way of looking at this and it misses some of the larger picture (whether or not that matters for AI safety is yet to be seen). For what it’s worth, this is one of the reasons I’m writing a book about epistemology: to try to create a text that can convey the embedded worldview to STEM-type folks who work on AI so they aren’t so likely to commit the mistakes of naive dualism.
I think that is quite close. I mean the implicit assumptions behind all these discussions, which are unquestioned. Moral realism, Computationalism, Empiricism, and Reductionism all come to mind. These topics cannot be tested or falsified with the scientific method.
I thought it would be best to try even if I am not confident it will make any impact on people reading it. My attempt is, as you rightly said, to get AI safety researchers to take philosophy more seriously. Most people see it as a pastime that they can enjoy for intrinsic pleasure. In my opinion there would be a lot of utility in practicing going more meta until we could see the underpinnings of both the problem of x-risk and the solution.
Some of the utility comes from being able to communicate it to more diverse people at higher fidelity. The rest comes from empowering existing researchers to maybe make a breakthrough in alignment itself.
A lot of these objects, like values and goals, seem to exist strongly in our ontology. I would like to see people try to question these things and consider other possibilities.
This exchange between Connor and Joscha seems to be an example where Connor is clearly irritated at the question, because it uses philosophy to question whether we should even bother saving humanity and whether humans are bad by our own standards. I can completely understand how he feels. But notice how Joscha seems to seriously think that the philosophy of what values we have and how they are justified is very important.
In this community it seems to be taken as fact that the direction we align the AI towards is something to be considered after figuring out how to set the direction in any way whatsoever. We have decoupled these two things. I would like to question these assumptions, and because I am not smart enough, maybe others can try too. This needs us to unsee the boundaries we are so used to and be very careful about which ones we put down.
Yeah, I was hoping to draw attention to this problem with my post. I love the embedded agency comic series. The Cartesian boundary is one such boundary that most of us have, but again, if we want to think about alignment honestly, I think it is worthwhile to train ourselves to unsee that too.
I will check out your book. I hope to also maybe write something that can help people grok monism and other philosophical ideas they might want to consider in their entirety.
Downvoted for rambling and condescending style. Consider writing a couple of paragraphs clearly summarizing what the heck you are talking about instead.
Thanks for the constructive criticism. I thought about it and I guess I need to increase the legibility of what I wrote.
I will add a TLDR and update the post soon.
It could be because they are thinking of AI and the kind of conventional computer they are familiar with...and overgeneralising. Ordinary PCs are passive, waiting for the user to tell them what to do, because that’s what the mass market wants...the market for agentive software is smaller and more esoteric.
It doesn’t have to be the result of explicit metaphysical beliefs...it could be the result of vague guesswork, and analogical thinking.
Now you are defining “agentic” as “possessing spooky metaphysical free will” rather than “not passive”. It’s perfectly possible to build an agent-in-the-sense-of-active out of mechanical parts.
I don’t think Yudkowsky has.
Yes. That’s why it’s a bad idea to treat any metaphysical claim as certain. Including the one above.
Yeah I could be wrong but my claim is implicit metaphysical beliefs have a big role here.
I was just noting that people who are aware of the internal workings of AI will have to acutely face cognitive dissonance if they admit it can have “spooky” agency. They can’t compartmentalize it the way others can.