Theories With Mentalistic Atoms Are As Validly Called Theories As Theories With Only Non-Mentalistic Atoms
[ This is supposed to be a didactic post. I’m not under the impression that I’m saying anything genuinely new. Thanks to Stephen Wolfram. ]
I’m about an hour into the Yudkowsky-Wolfram discussion [AI-generated transcript from which I’m quoting]. Wolfram thinks we should not particularly fear AI doom. I think he is wrong. It seems to me that the cause of Wolfram’s hesitancy to buy into the AI doom idea is a premise that theories with only non-mentalistic atoms are the only valid or allowed theories of how the world works. This premise is false, but before I say why, I want to get into what I mean by a premise that theories containing mentalistic atoms are not allowed, and how it seems to be constraining the set of claims Wolfram allows himself to make.
Examples of what I mean:
Example 1 | Wolfram doubts the idea that it’s possible to measure intelligence on a single unified scale or axis.
[Wolfram:] So the question then is, you know, one thing one could take the point of view is there’s this kind of single index of smart. I mean, I think people, you know, in the 1930s, people wanted to invent kind of an index of general intelligence. They called it g for humans, which I’ve never really believed in [ . . . ] There are some things I’m pretty good at doing where I feel I’m pretty smart. There are other things where I know I’m pretty dumb, and it’s… it’s kind of… it’s not really a, a sort of single index.
Wolfram grants that it’s possible for individuals to be “smarter” than other individuals, or able to out-predict or beat them, in local, individual cases.
We all know that constructing a single-axis scale for any attribute we care to name is logically possible. What Wolfram must be objecting to is the idea that such an axis might be objective for intelligence—whereas of course we could rank, e.g., possible states of a system objectively by, say, kinetic energy.
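To make the contrast concrete, here is the sort of objective single-axis ranking being pointed at [standard physics, my own illustration rather than anything from the transcript]: kinetic energy assigns every state a single real number, so any two states are comparable on that axis.

$$E_k = \tfrac{1}{2}mv^2, \qquad \text{state } A \text{ outranks state } B \iff E_k(A) > E_k(B)$$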
So Wolfram grants that we can receive atoms of sense-data—individual facts of experience, such as whether Alice can beat Bob at chess—that tell us something about intelligence. And it’s trivial that of course we can create a single intelligence axis that has Alice and Bob and everyone else somewhere on it, whether or not that axis has anything to do with reality. What Wolfram must be objecting to is the idea that we can, from our individual atoms of sense-data, construct—via Bayesian backward-chaining induction blah blah or whatever else—a logical theory of how those individual atoms of sense-data are related, one that contains terms like whether Alice is objectively smarter than Bob.
Example 2 | Wolfram is implicitly skeptical that claims about consciousness can generally have objective truth-or-falsity values beyond the physical substrate in which we ourselves have direct empirical experience of actually being conscious.
[Wolfram:] Right. It’s a reasonable, you know, piece of scientific induction, extrapolation [that other humans besides oneself can be conscious]. But what I’m curious about is, if you… If you say the only thing that can be conscious is something that has that exact same design [as the human brain] [ . . . ]
[Yudkowsky:] I don’t say that.
[Wolfram:] Okay. So what… So where’s the boundary?
[ Later: ]
[Wolfram:] I’m big on immortality [ . . . ] kind of shocking that cryonics hasn’t worked yet [ . . . ] to cool down [water] without expanding [ . . . ] I’m just sort of curious from your sort of moral compass point of view, if immortality is achievable, but only digitally, what? How do you feel about that? I mean, in other words, if you’re, if you’re going to [ . . . ] start having your sort of backup AI, maybe you gradually, you know, your sort of thread of consciousness gradually migrates from being in your brain to being in your AI, and eventually, you know, your brain fails for some biological reason, and then it’s all the AI. I’m, I’m curious how you feel about that, whether you, whether you feel that that is a, a kind of an appropriate, kind of fun‑preserving outcome, or whether you think of that as being a, kind of a fun‑destroying outcome, so to speak.
In the top exchange, Wolfram recognizes that it’s reasonable to infer that other “meat” humans may be conscious, from having been conscious as a “meat” human oneself.
In the bottom quote, Wolfram talks about immortality through cryonics as being only a physical and not a philosophical problem; there does not seem to be particular doubt in Wolfram’s mind that the person who woke up after successful cryonic preservation, despite the substrate having undergone some change in the meantime, would be him.
When Wolfram talks about “immortality through uploading”, he asks how one might choose to feel about it—not how to make it technically feasible. I think this is because he is modeling “immortality” achieved on a silicon substrate as an implicitly already-ceded technical lost cause, as far as whether he can know that his actual consciousness is preserved. The method he describes sure sounds more like someone training a chatbot to “replace” them than anything that would actually allow someone to wake up in a simulated environment.
The only thing I can see that plausibly makes Wolfram talk differently about immortality-through-cryonics vs immortality-through-uploading is that, on his model of the world, you can draw really sound conclusions about what will happen to a consciousness if there is a physical through-line—but not otherwise.
Example 3 | Wolfram doubts that theories making statements about what is or isn’t valuable can be scientific.
[Yudkowsky:] The Earth… The, the universe gets a little darker every time a bullet gets fired into somebody’s head, or they die of old age, even though the atoms are still doing their atom things.
[Wolfram:] Right. Okay. So this is [ . . . ] Viscerally, I agree with you. Scientifically, I have a bit of a hard time. I mean, in a sense, that, that feels, that feels like a very kind of spiritual kind of statement, which is not necessarily bad, but it’s just worth understanding what kind of a thing it is. I mean, it, it is saying that there’s something very, a kind of sacred thing about these attributes of humans [ . . . ]
[Wolfram:] [ . . . ] ethics is not a scientific field. You know, it’s about how we humans feel about things, and we humans could feel this way. We could feel that way. It’s, it’s to do with the nature of us as humans, and we could, you know, we could scientifize those statements by saying, “Let’s do an fMRI and notice why do you say that? Oh, your such and such lobe lights up.” But I don’t think that’s a particularly useful thing to say. I mean, I think, you know, I think it is a fair statement that this is, you know, it, it is a, a thing that we can capture that humans feel that this should happen or not happen, whatever else. But I guess that the, the, the… you know, there’s [ . . . ] one question is what’s the right thing to have happen? I don’t think there’s any abstract way to answer that. [ . . . ] I can imagine even humans who say, “No, no. You know, the planet is much more important than the humans,” for example. “Anything the humans do on the planet that messes up the planet, you know, get rid of the humans. We just want the, you know, the planet is more important”
As I understand his usage here, what Wolfram is using the word “scientific” to mean is “the class of abstract theories about reality that can have objective soundness-and-validity”—i.e. the class of abstract theories which one can use to inspect and generate complicated statements like “The Earth is an oblate spheroid most cheaply modeled as orbiting a point inside the Sun”, or “A whale is a kind of fish”, or “You should vote your values in elections, no matter what you first-order expect other people to do”—and make an objective call as to whether those statements say something true about the real world.
Wolfram acknowledges that we can observe individual instances of agents or groups of agents caring more about one thing than another, or having an objective preference as to what should happen.
He says that he feels an intuitive pull toward making value claims [as though they could be true or false].
But—implicitly—while in science you can run real experiments to test the soundness of a theory, in philosophy you can only run thought experiments. And seemingly you can run any thought experiments, each as “real” as the last. He doubts that anything can be proven about the soundness of any particular theory of ethics. So he doubts that we can know abstract ethical truths, or even truths about how agents «should» behave, in general.
So why am I saying this premise is false? How can we know anything about the soundness of theories we can only test in our minds, instead of in physical reality?
Well, I object to the presumption of guilt.
As not-yet-very-grown-up humans, our minds are weak. They’re lossy and full of biases which make our thought experiments and exercises in first-order “moral logic” routinely yield conclusions recognizable as invalid or even repugnant to outside observers.
Our weak human minds do not appear to be as large as the rest of reality [it’s unclear to me what it would even mean, for a mind to be larger than the reality it existed in].
But our minds perceive atomically mental objects. When we make decisions, we ask what will happen to various hypothetical future versions of ourselves who make different choices. Most often we don’t model reductionist physics while doing this, but instead use a mentalistic framework in which other people are treated as copies of ourselves in different positions or with some modified attributes, like being gay instead of straight. [“What would I do, if I was gay? Or at least stuck on pretending to be gay? I’d be trying to date Chad instead of Becca, or at least pretending to. So even though Gay Darrell is going to be there, I, Aaron, shouldn’t waste time trying to shoo him off Becca . . . “]
We know there are some decisions we shouldn’t make, because they will lead other people to take advantage of us or to dislike us. We rely on this knowledge to act in the real world. If we don’t, we know bad things will happen to us. It’s grounded knowledge. How do we obtain it? We deduce it in our minds. Yes, we learn new things all the time about people who prefer new types of things, or different epistemic states they might be in. But the central engine telling us what is a good decision, is just asking: “What would I do?” When we’ve stopped the information-gathering step, we don’t need any experience to tell us just “what we would do”. We deduce it. So there is a seed of valid-and-sound mental-atoms logic, at least within each individual.
Objective truths are usually taken as being socially share-able. There doesn’t seem to be much sense in calling my theory-with-mental-atoms “objective” if it only describes my reality, and doesn’t say anything about yours at all. What would the word be adding?
People do, in fact, agree on theories-with-mental-atoms. In [what I’ve listened to of] the podcast [so far], Wolfram quotes “We hold these truths to be self-evident . . .” Eliezer then says “they don’t need to be self-evident”. But if they’re axioms, they do—at least among the cult that makes them mean something. This is LessWrong—one is supposed to “explain, not to persuade”, and to follow various other norms. If I come up with a counterexample to someone’s post from U.S. politics, I’m probably going to rephrase it, thinking something like “politics is the mind-killer, and people on here really don’t like people who go around being a mind-killer, because that’s against the project of all becoming more rational together, so this is worth the effort”. “Politics is the mind-killer” isn’t true “out there, in the physical world”—but it’s nonetheless true in a way that’s grounded, “objectively”.
Hofstadter’s Tortoise remarks that “politicians lie” is an “[obviously] valid utterance”, contrasting it with “politicians lie in cast-iron sinks”. E.T. Jaynes likewise contrasts the ‘obviously’ correct “knowledge is power” with the ‘obviously’ absurd “[social] power is knowledge”. Where did the shared sense of the truth [or falsity] of these statements come from? Where does the shared sense that the sky is blue come from? We don’t know the full causal story, but we can all agree we see the puzzle piece. And we can try to fit it into a larger theory. And since we can—however imperfectly—hold each other’s minds in our own, we don’t absolutely need a shared non-mentalistic experimental setup, to do shared thought experiments, and agree that the results came out favoring one theory or another.
None of this implies superintelligences will share our values, because in all of the ways that are contingent on what “value” looks like, a superintelligence cannot be modeled as a copy of us very well at all. A valid theory-with-mentalistic-atoms allows for copies of us that are so modified in the utility-function area that they no longer follow most of our derived rules about deontics. The most we can predict about them is that they will observe the same physical reality, and seek certain things instrumentally-convergently. And a valid theory-with-mentalistic-atoms allows for these things to be arbitrarily smart, if the appropriate stuff happens to make them so.
It seems common for people trying to talk about AI extinction to get hung up on whether statements derived from abstract theories containing mentalistic atoms can have objective truth or falsity values. They can. And if we can first agree on such basic elements of our ontology/epistemology as that one agent can be objectively smarter than another, that we can know whether something that lives in a physical substrate that is unlike ours is conscious, and that there can be some degree of objective truth as to what is valuable [not that all beings that are merely intelligent will necessarily pursue these things], it in fact becomes much more natural to make clear statements and judgments in the abstract or general case, about what very smart non-aligned agents will in fact do to the physical world.
Why does any of that matter for AI safety? AI safety is a matter of public policy. In public policy making, you have a set of preferences, which you get from votes or surveys, and you formulate policy based on your best objective understanding of cause and effect. The preferences don’t have to be objective, because they are taken as given. It’s quite different to philosophy, because you are trying to achieve or avoid something, not figure out what something ultimately is. You don’t have to answer Wolfram’s questions in their own terms, because you can challenge the framing.
It’s not all that relevant to AI safety, because an AI only needs some potentially dangerous capabilities. Admittedly, a lot of the literature gives the opposite impression.
You haven’t defined consciousness, and you haven’t explained how we can know whether something is conscious. It doesn’t follow automatically from considerations about intelligence. And it doesn’t follow from having some mentalistic terms in our theories.
There doesn’t need to be. You don’t have to solve ethics to set policy.
I think AI safety isn’t as much a matter of government policy as you seem to think. Currently, sure. Frontier models are so expensive to train that only the big labs can do it. Models have limited agentic capabilities, even at the frontier.
But we are rushing towards a point where intelligence and learning are much better understood scientifically. Open-source models are rapidly getting more powerful and cheaper.
In a few years, the trend suggests, any individual could create a dangerously powerful AI using a personal computer.
Any law which fails to protect society if even a single individual chooses to violate it once… is not a very protective law. Historical evidence suggests that occasionally some people break laws. Especially when there’s a lot of money and power on offer in exchange for the risk.
What happens at that point depends a lot on the details of the lawbreaker’s creation. With what probability will it end up agentic, coherent, conscious, capable of self-improvement, capable of escape and self-replication, driven by Omohundro goals (survival-focused, resource- and power-hungry), and so on?
It seems unlikely to me that the probability is zero for the sorts of qualities which would make such an AI agent dangerous. Then we must ask questions about the efficacy of governments in detecting and stopping such AI agents before they become catastrophically powerful.
Have you read “The Sun is big, but superintelligences will not spare Earth a little sunlight”?
Is your question directed at me, or the person I was replying to? I agree with the point “Sun is big, but...” makes. Here’s a link to a recent summary of my view on a plausible plan for the world to handle surviving AI. Please feel free to share your thoughts on it. https://www.lesswrong.com/posts/NRZfxAJztvx2ES5LG/a-path-to-human-autonomy
I’ll address each of your 4 critiques:
The point I’m making in the post is that no matter whether you have to treat the preferences as objective, there is an objective fact of the matter about what someone’s preferences are, in the real world [ real, even if not physical ].
Whether or not an AI “only needs some potentially dangerous capabilities” for your local PR purposes, the global truth of the matter is that “randomly-rolled” superintelligences will have convergent instrumental desires that have to do with making use of the resources we are currently using [like the negentropy that would make Earth’s oceans a great sink for 3 x 10^27 joules], but not desires that tightly converge with our terminal desires that make boiling the oceans without evacuating all the humans first a Bad Idea.
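As a rough sanity check of that figure [ballpark textbook values for ocean mass, heat capacity, and latent heat—my own back-of-the-envelope numbers, not anything from the podcast]:

```python
# Rough back-of-the-envelope check of the ~3e27 J figure for boiling Earth's oceans.
# All inputs are approximate, assumed for illustration only.
ocean_mass_kg = 1.4e21        # approximate total mass of Earth's oceans
delta_T_K = 85.0              # heating seawater from ~15 C to 100 C
specific_heat = 4186.0        # J/(kg*K) for liquid water
latent_heat_vap = 2.26e6      # J/kg to vaporize water at 100 C

heat_to_reach_boiling = ocean_mass_kg * specific_heat * delta_T_K   # ~5e26 J
heat_to_vaporize = ocean_mass_kg * latent_heat_vap                  # ~3.2e27 J

total_joules = heat_to_reach_boiling + heat_to_vaporize
print(f"{total_joules:.1e} J")  # ~3.7e27 J, the same order of magnitude as 3 x 10^27
```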
My intent is not to say “I/we understand consciousness, therefore we can derive objectively sound-valid-and-therefore-true statements from theories with mentalistic atoms”. The arguments I actually give for why it’s true that we can derive objective abstract facts about the mental world begin at “So why am I saying this premise is false?” and end at ”. . . and agree that the results came out favoring one theory or another.” If we can derive objectively true abstract statements about the mental world, the same way we can derive such statements about the physical world [e.g. “the force experienced by a moving charge in a magnetic field is orthogonal both to the direction of the field and to the direction of its motion”], this implies that we can understand consciousness well, whether or not we already do.
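For reference, the bracketed physics example is the magnetic part of the Lorentz force—standard textbook physics, written out here only to show the kind of derived statement I mean:

$$\mathbf{F} = q\,\mathbf{v} \times \mathbf{B}, \qquad \mathbf{F} \cdot \mathbf{v} = 0, \quad \mathbf{F} \cdot \mathbf{B} = 0$$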
My point, again, isn’t that there needs to be, for whatever local practical purpose. My point is that there is.