Avoiding metaphysics means giving bad philosophy a free pass

I have noticed a culture among AI Alignment field builders: a certain reluctance to let conversations lead to metaphysical underpinnings. The framing and unstated assumptions beneath models that predict existential risk from AI are not explained; rather, selection pressures strongly favor people who have already grokked these ideas.

So I worry that this is leading to homogeneity in the kind of people who survive the funnel and reach the point where they start contributing to the community by deconfusing concepts, starting new research directions, communicating the risk to others, etc. It seems worthwhile at this point to try to increase variance here.

The current strategy is totally understandable, because people do not react favorably to their metaphysics being challenged. Philosophy needs careful, rigorous thinking, and it is extremely easy to get mired in discussions that lead nowhere, with ill-defined words thrown around regarding non-falsifiable questions.

So most people choose to ignore these questions about consciousness, free will, qualia, sentience, etc., and try to stick to the standard pitch. The standard pitch argues that generally capable systems do not need any of these human-like properties to be dangerous. The bar is quite low for making a logical case for x-risk from intelligent systems that mechanically follow goals.

But logic is not sufficient here. There are emotional knots to untangle, and there are cognitive defense mechanisms in all of us which protect our sanity. The series of compromises that our psyche has set up to compartmentalize inconsistent beliefs needs to be carefully undone, so it is especially important to have empathy when spreading this meme.

All of our axioms are based on intuitions. The initial friction that I see when trying to communicate the existential threat from generally intelligent AI systems is that people have strong priors on the inherent limitations of such “software” running on electronics. They put AI systems into a different category than humans.

So I try to poke at these beliefs and understand how they can so confidently brush away concerns about agentic systems and instead focus on the risks of narrowly intelligent tools being misused by human actors. It turns out that misuse is a more concrete narrative, one which is easier to hold in their heads.

It is harder, but still possible, to point to the likelihood of AI systems getting so enmeshed in our society that they disrupt our ability to coordinate and steer the future of humanity in a positive direction. Even when individual agents are aware of and do not like the trajectory their company is taking, it is very hard to coordinate action in a synchronized way to break out of inadequate equilibria.

Just as we had trouble predicting the effects of humans interacting with each other through the internet at this scale, the aggregate effects of introducing AI systems into the mix are not known. Note how even this framing conceives of AI systems as passive objects.

Consider the idea of an intelligent, agentic AI system steering the world towards some state through many independent causal chains, such that no human is able to zoom out and hold all of these nudges together in their head, and such that the final state, when reached, seems almost coincidental and is hard to backchain to the AI itself. This feels like an idea straight from a sci-fi novel, too outlandish to be worth considering.

But it is important to be able to conceive of these possibilities, if only as candidate hypotheses so that you can examine the empirical evidence from the world fairly. Our defenses against weird ideas need to be identified and carefully questioned, by recognizing we are living in a slice of time which is extremely different from most of our past.

I notice that some possibilities such as these are not even considered, and that made me wonder why. Why is it that most people seem to stubbornly think of AI systems as passive tools, and believe strongly that they cannot anytime soon become agentic in the same way humans or animals are?

I claim this is because they implicitly hold a dualistic view of the world. While they might not be able to articulate it in the language of souls, free will, a divine spark, etc., on some deep level they think living biological things have this ability, “agency”, to spawn new causal chains, while inanimate objects do not.

In their view, this is a fundamental limitation of any object they deem computable, that is, reducible to a known process. This is why people who are not aware of the technical details are more likely to accept that AI could be agentic, while a lot of academics seem to be resistant to this idea. The closer you are to the system you study, the clearer it seems that the system is an automaton.

I don’t think many people have truly dissolved the question of free will; they have not completely embraced the highly likely possibility that we are biochemical machines ourselves. I think reductionism and computationalism are such an orthodoxy among scientists that many accept them only logically, while emotionally they resist the idea.

There is a cognitive dissonance when they are forced to follow these beliefs through to their logical conclusion, and these metaphysical claims come into conflict with the idea that humans, including themselves, have a special quality that makes us able to act independently of our past. Most of us still consider the counterfactual actions we might have taken as very real options that we had but chose not to pick.

It is important to verify whether these claims of mine are true because, in order to rally the political will to pass pivotal legislation or to attract the brightest minds of humanity to take this problem seriously, we need to make sure that we do a good job of communicating the existential risk from general intelligences.

Appealing to authority by pointing to some leaders of the field who share these concerns, or to the CEOs of the companies that build these systems, is not a good way to get people to develop their own inside view of why this problem matters and is real.

People who are likely to solve this problem will need to see the actual shape of the problem via legitimate communication techniques, and that will involve making their metaphysical assumptions explicit and addressing them.

I understand that LessWrong and the Sequences eventually aim to deal with exactly these kinds of questions, but I am worried that it is too late by the time this conversation happens. Early on in the funnel, useful people might be put off by the dissonance they feel.

A solution might be to have some of these philosophical conversations be more mainstream.