It seems common for people trying to talk about AI extinction to get hung up on whether statements derived from abstract theories containing mentalistic atoms can be objectively true or false. They can. And if we can first agree on such basic elements of our ontology/epistemology as that one agent can be objectively smarter than another, that we can know whether something that lives in a physical substrate that is unlike ours is conscious, and that there can be some degree of objective truth as to what is valuable [not that all beings that are merely intelligent will necessarily pursue these things], it in fact becomes much more natural to make clear statements and judgments, in the abstract or general case, about what very smart non-aligned agents will in fact do to the physical world.
Why does any of that matter for AI safety? AI safety is a matter of public policy. In public policy making, you have a set of preferences, which you get from votes or surveys, and you formulate policy based on your best objective understanding of cause and effect. The preferences don’t have to be objective, because they are taken as given. It’s quite different to philosophy, because you are trying to achieve or avoid something, not figure out what something ultimately is. You don’t have to answer Wolfram’s questions in their own terms, because you can challenge the framing.
And if we can first agree on such basic elements of our ontology/epistemology as that one agent can be objectively smarter than another,
It’s not all that relevant to AI safety, because an AI only needs some potentially dangerous capabilities. Admittedly, a lot of the literature gives the opposite impression.
that we can know whether something that lives in a physical substrate that is unlike ours is conscious,
You haven’t defined consciousness and you haven’t explained how. It doesn’t follow automatically from considerations about intelligence. And it doesn’t follow from having some mentalistic terms in our theories.
and that there can be some degree of objective truth as to what is valuable
there doesn’t need to be. You don’t have to solve ethics to set policy.
I think AI safety isn’t as much a matter of government policy as you seem to think. Currently, sure. Frontier models are so expensive to train that only the big labs can do it. Models have limited agentic capabilities, even at the frontier.
But we are rushing towards a point where the science of intelligence and learning is much better understood. Open-source models are rapidly getting more powerful and cheaper.
In a few years, the trend suggests, any individual could create a dangerously powerful AI using a personal computer.
Any law which fails to protect society if even a single individual chooses to violate it once… is not a very protective law. Historical evidence suggests that occasionally some people break laws, especially when there’s a lot of money and power on offer in exchange for the risk.
What happens at that point depends a lot on the details of the lawbreaker’s creation. With what probability will it end up agentic, coherent, conscious, capable of self-improvement, capable of escape and self-replication, driven by Omohundro goals (survival-focused, resource- and power-hungry), and so on?
The probability of it having the sorts of qualities which would make such an AI agent dangerous seems unlikely to me to be zero.
Then we must ask questions about the efficacy of governments in detecting and stopping such AI agents before they become catastrophically powerful.
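To put the “even a single individual” point in quantitative terms, here is a minimal sketch of how small per-actor risks compound across many actors. The per-actor probabilities and actor counts below are purely hypothetical placeholders, not estimates.

```python
# Illustration only: if each capable actor independently has probability p of
# creating a dangerously powerful AI in a given period, the chance that at
# least one of n actors does so is 1 - (1 - p)^n.
def p_at_least_one(p_per_actor: float, n_actors: int) -> float:
    return 1.0 - (1.0 - p_per_actor) ** n_actors

# Hypothetical values, chosen only to show how quickly the aggregate risk grows.
for p, n in [(1e-6, 10_000), (1e-5, 100_000), (1e-4, 1_000_000)]:
    print(f"p = {p:g} per actor, {n:,} actors -> "
          f"P(at least one) ≈ {p_at_least_one(p, n):.3f}")
```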
What happens at that point depends a lot on the details of the lawbreaker’s creation. [ . . . ] The probability of it having the sorts of qualities which would make such an AI agent dangerous seems unlikely to me to be zero.

Have you read The Sun is big, but superintelligences will not spare Earth a little sunlight?

I’ll address each of your 4 critiques:
[ 1. ] In public policy making, you have a set of preferences, which you get from votes or surveys, and you formulate policy based on your best objective understanding of cause and effect. The preferences don’t have to be objective, because they are taken as given.
The point I’m making in the post is that no matter whether you have to treat the preferences as objective, there is an objective fact of the matter about what someone’s preferences are, in the real world [ real, even if not physical ].
[ 2. ] [ Agreeing on such basic elements of our ontology/epistemology ] isn’t all that relevant to AI safety, because an AI only needs some potentially dangerous capabilities.
Whether or not an AI “only needs some potentially dangerous capabilities” for your local PR purposes, the global truth of the matter is that “randomly-rolled” superintelligences will have convergent instrumental desires that have to do with making use of the resources we are currently using [like the negentropy that would make Earth’s oceans a great sink for 3 x 10^27 joules], but not desires that tightly converge with our terminal desires that make boiling the oceans without evacuating all the humans first a Bad Idea.
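As a sanity check on that figure, a rough back-of-the-envelope calculation with textbook values (ocean mass ≈ 1.4 × 10^21 kg, specific heat of water ≈ 4186 J/(kg·K), latent heat of vaporization ≈ 2.26 × 10^6 J/kg) lands in the same range:

```python
# Back-of-the-envelope: energy required to boil off Earth's oceans.
ocean_mass_kg   = 1.4e21    # approximate total mass of Earth's oceans
specific_heat   = 4186.0    # J/(kg*K), liquid water
latent_heat_vap = 2.26e6    # J/kg, vaporization at ~100 C
delta_T         = 85.0      # K, warming from ~15 C to the boiling point

heating_J = ocean_mass_kg * specific_heat * delta_T   # ~5e26 J
boiling_J = ocean_mass_kg * latent_heat_vap           # ~3.2e27 J

print(f"Heating to 100 C: {heating_J:.1e} J")
print(f"Vaporizing:       {boiling_J:.1e} J")
print(f"Total:            {heating_J + boiling_J:.1e} J")   # ~3.7e27 J
```

The quoted 3 x 10^27 joules is roughly the latent-heat term alone; adding the heating term pushes the total a little higher.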
[ 3. ] You haven’t defined consciousness and you haven’t explained how [ we can know something that lives in a physical substrate that is unlike ours is conscious ].
My intent is not to say “I/we understand consciousness, therefore we can derive objectively sound-valid-and-therefore-true statements from theories with mentalistic atoms”. The arguments I actually give for why it’s true that we can derive objective abstract facts about the mental world begin at “So why am I saying this premise is false?”, and end at “. . . and agree that the results came out favoring one theory or another.” If we can derive objectively true abstract statements about the mental world, the same way we can derive such statements about the physical world [e.g. “the force experienced by a moving charge in a magnetic field is orthogonal both to the direction of the field and to the direction of its motion”], this implies that we can understand consciousness well, whether or not we already do.
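To make that physics example concrete: the claim is the magnetic part of the Lorentz force law, F = qv × B, and the orthogonality can be verified numerically with arbitrary illustrative values:

```python
# Check that F = q * (v x B) is orthogonal to both v and B.
def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = 1.6e-19                  # charge of a proton, in coulombs
v = (2.0e5, -1.0e4, 3.0e3)   # velocity in m/s, arbitrary
B = (0.1, 0.05, -0.2)        # magnetic field in tesla, arbitrary

F = tuple(q * c for c in cross(v, B))
print("F · v =", dot(F, v))  # ≈ 0, up to floating-point error
print("F · B =", dot(F, B))  # ≈ 0, up to floating-point error
```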
[ 4. ] there doesn’t need to be [ some degree of objective truth as to what is valuable ]. You don’t have to solve ethics to set policy.
My point, again, isn’t that there needs to be, for whatever local practical purpose. My point is that there is.