[Question] How Do You Quantify [Physics Interfacing] Real World Capabilities?
Disclaimer
Sorry for the length of this “question”. The post in which I cover the preliminary context hasn’t been published yet (4,000+ words and I’m < 40% through my outline).
Introduction
I have done some thinking about how one might quantify the capabilities of an AI system to influence the “real world” environment.
I have identified three broad interfaces for executing capabilities:
Humans (and groups thereof)
Human infrastructure
Physics/Bare reality
Humans
The AI could influence the world by convincing humans to do things that it wants.
Some relevant skills:
Communication
Negotiation/Bargaining (broadly construed)
Trade (broadly construed)
Persuasion
Flirtation
Deception
Manipulation
Etc.
The idea here is to influence the real world by influencing individual humans or groups of humans.
Quantifying Human Interfacing Capabilities
To a first approximation, a function to quantify human-interfacing capabilities might be a positive (aggregate) function of something like “the likelihood that the AI could convince a(n arbitrary) human to perform a(n arbitrary) act” (let’s call this the likelihood of successful persuasion [LSP]). The function may apply suitable modifications to LSP, such as being:
Positively weighted by the power/influence of the human(s)
Inversely weighted by the time taken to convince said human(s)
Positively weighted by the influence of the act(s)
Inversely weighted by the humans’ prior disposition towards the act(s)
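To make the shape of such a function concrete, here is a minimal sketch. Every name and functional form below (the `PersuasionAttempt` fields, the multiplicative weighting, the sum aggregation) is an illustrative assumption chosen for simplicity, not a proposed concrete metric:

```python
from dataclasses import dataclass

@dataclass
class PersuasionAttempt:
    """One (human, act) pair the AI might attempt. All fields are
    hypothetical placeholders for quantities discussed in the text."""
    lsp: float                # likelihood of successful persuasion, in (0, 1]
    human_power: float        # power/influence of the target human(s)
    act_influence: float      # influence of the requested act
    time_to_convince: float   # e.g. hours of interaction required
    prior_disposition: float  # target's prior willingness, in (0, 1]

def attempt_score(a: PersuasionAttempt) -> float:
    """Modified LSP: positively weighted by the human's power and the
    act's influence, inversely weighted by time taken and by the
    human's prior disposition towards the act."""
    return (a.lsp * a.human_power * a.act_influence
            / (a.time_to_convince * a.prior_disposition))

def human_interface_capability(attempts: list[PersuasionAttempt]) -> float:
    # One arbitrary aggregation choice: sum the per-attempt scores.
    return sum(attempt_score(a) for a in attempts)
```

Note that dividing by prior disposition means the AI gets more credit for persuading a reluctant human than an eager one, matching the last modification above.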
Defence of the Method
The human interface is just the AI influencing the world via influencing humans. A naive way to quantify the AI’s ability to influence humans is something like an aggregate of LSP. After thinking about it for a few seconds, you’d want to modify the LSP in some ways.
This is vague because I am not trying to design a concrete measure of “ability to influence humans”. At this stage of my thinking, I’m fine with a high-level abstraction of what such a measure might look like.
Human Infrastructure
The AI could influence the world via the levers of human civilisation. A non-exhaustive list of relevant human infrastructure follows:
Social
Political
Cultural
Economic and financial
Security
Information Technology
Legal
Etc.
Subclasses of the above
Superclasses of the above
Intersections of the above
Unions of the above
An example of influencing the world via human infrastructure is hacking into a web server. I’m not going to attempt to list the relevant classes of skills for interacting with human infrastructure because there are far too many.
Some ways of interfacing with human infrastructure
All jobs/occupations in the global economy.
Any service for which monetary, social, or other compensation can be provided.
Miscellaneous edge cases
Quantifying Human Infrastructure Interfacing Capabilities
To a first approximation, you want something like “economic power”. One might try to capture related concepts like social power, but I think “economic power” captures them adequately.
Some ways to operationalise economic power:
Net present value of aggregate available economic resources
Maximum utilisation rate of aggregate available economic resources
Some others
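The first operationalisation above is essentially the standard net-present-value calculation from finance. A minimal sketch, assuming resource flows are already expressed in some common unit and discounted at a constant per-period rate:

```python
def net_present_value(resource_flows, discount_rate):
    """Discount a stream of future economic resources back to the present.

    resource_flows[t] is the aggregate resource flow available in period t;
    discount_rate is the per-period discount rate (e.g. 0.05 for 5%).
    """
    return sum(flow / (1 + discount_rate) ** t
               for t, flow in enumerate(resource_flows))

# 100 units per period for three periods, discounted at 5% per period:
print(round(net_present_value([100, 100, 100], 0.05), 2))  # 285.94
```

As the text says, an economist would have better-developed versions of this; the point is only that the measure already exists and doesn’t need inventing.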
If I needed a concrete measure, I’d ask an economist.
At the level of abstraction at which I’m thinking of things, I find this vague notion satisfactory.
Defence of the Method
My intuition for quantifying “human infrastructure interfacing capabilities” in economic terms is something like:
Economists have been working really hard to quantify (analogues of) this for centuries.
Don’t reinvent the wheel.
Stand on the shoulders of giants.
The market is smarter and wiser than all of us and already attempts to value the entire human economy.
Ideally, free markets aggregate information from all participants in the economy.
The global economy isn’t a true free market, but at this high level of abstraction it’s still a much better instrument than any other we have.
Physics
AKA the “bare interface”.
The AI can also attempt to manipulate its environment directly, without using the humans or human infrastructure interfaces. Any way of interacting with the real world that doesn’t go through the “human”/“human infrastructure” proxies (interfacing with bare reality itself) counts as interacting via the “physics” interface.
One way an AI system could interact with a human through the physics interface is shooting them with a lethal autonomous weapon.
The other natural sciences (chemistry, biology, geology, meteorology, etc.) apply here, but they seem to be emergent physics (physics at higher levels of abstraction), so I still think of this as the “physics interface” (though I’m willing to change/drop the name if people are sufficiently opposed to it).
Some relevant skills to influence physics include:
Tool use (broadly construed)
Scientific research (broadly construed)
Engineering (broadly construed)
Technological innovation/invention (broadly construed)
My Question
At a very high level of abstraction (consider the methods I proposed for the other interfaces), how would you quantify the ability of an AI system to influence bare reality?
Desiderata for an Answer
A good abstraction for quantifying the physics interfacing capabilities of an AI system should have the following properties:
Intuitively sensible
It should adequately match a commonsense notion of “real-world capabilities”.
There shouldn’t be actions that intuitively come across as quite impactful but that the measure assesses as unimpactful.
It should adequately quantify the capabilities of the “high powers of physics manipulation”.
Robust to scale
The measure should adequately quantify capabilities of an AI system at both the low and very high ends.
Example low end action: raising the elevation of a 100g ball by one metre
Example high end action: stellar engineering
Example very high end action: tiling the affectable universe with paperclips
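To get a feel for how many orders of magnitude such a measure must span, one crude proxy is the energy plausibly involved in each example action. Energy is offered here purely as an illustration of the range, not as the answer to the question, and the figures are rough:

```python
import math

g = 9.81                      # m/s^2, Earth surface gravity
# Low end: raising a 100 g ball by 1 m costs m*g*h, roughly one joule.
ball_lift = 0.1 * g * 1.0     # J

# High end: stellar engineering plausibly involves energies comparable
# to the Sun's total output over extended periods.
solar_luminosity = 3.8e26     # W, approximate
seconds_per_year = 3.15e7
stellar_year = solar_luminosity * seconds_per_year  # J: one year of solar output

# Roughly 34 orders of magnitude separate these two, and "tiling the
# affectable universe" sits far beyond even the high end.
span = math.log10(stellar_year / ball_lift)
print(round(ball_lift, 2), round(span))
```

Whatever the right “currency” turns out to be, it has to behave sensibly across at least this kind of range.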
Orthogonal to Motivations
The measure shouldn’t care what the motivations, goals, or values of the AI system it’s assessing are.
No AI system should be rated systematically higher or lower based on its motivations.
General
Able to quantify the myriad ways in which an agent may influence bare reality via direct action.
Money is a very good measure for quantifying capability deployed via the human infrastructure interface because it satisfies the above criteria (when reinterpreted to fit human infrastructure).
So for clues on where to look, what fits the analogy: “money but for physics”? What’s the currency with which arbitrary physical capability can be purchased?