To clarify the independent vs. interdependent distinction
Julia suggested that EA thought about negative flow-through effects is an example of interdependent thinking. IMO EAs still tend to take an independent view on that. Even I did a bad job above of describing causal interdependencies in climate change, since I still placed the causal sources in a linear ‘this leads to this leads to that’ sequence.
So let me try to clarify again, at the risk of going meta-physical:
EAs do seem to pay more attention to causal dependencies than I was letting on, but in a particular way:
When EA researchers estimate the impacts of specific flow-through effects, they often seem to have in mind some hypothetical individual who takes actions that incrementally lead to consequences in the future. Going meta on that, they may philosophise about how an untested approach can have unforeseen and/or irreversible consequences, or about cluelessness (not knowing how the resulting impacts, spread out across the future, will average out). Do correct me if you have a different impression!
An alternative style of thinking involves holding multiple actors / causal sources in mind and simulating how they conditionally interact. This is useful for identifying root causes of problems, which I don’t recall EA researchers doing much of (e.g. the sociological/economic factors that originally made commercial farmers industrialise their livestock production).
To illustrate the difference, I think gene-environment interactions provide a neat case:
Independent ‘this or that’ thinking:
Hold one factor constant (e.g. take the different environments in which adopted twins grew up as a representative sample) to predict the other (e.g. attribute 50% of the variation in a general human trait to genes).
Interdependent ‘this and that’ thinking:
Assume that the factors interact, and that their probabilities are therefore not strictly independent.
Test the factors together, including their nonlinear interactions, to predict outcomes (see the toy sketch after this list).
e.g. an on/off gene for aggression × childhood trauma × teenagers playing violent video games
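To make the contrast concrete, here is a minimal toy simulation of my own (the factors, coefficients, and effect sizes are all invented for illustration, not taken from any real study): an additive model that treats each factor as contributing independently, versus a model that also includes the three-way interaction term.

```python
# Toy sketch: 'independent' additive model vs. 'interdependent' interaction model.
import numpy as np

rng = np.random.default_rng(0)
n = 5000

gene = rng.integers(0, 2, n)     # hypothetical on/off aggression-linked gene
trauma = rng.integers(0, 2, n)   # childhood trauma (binary, for simplicity)
games = rng.integers(0, 2, n)    # heavy violent-video-game use

# Assumed ground truth: aggression only spikes when all three factors co-occur.
aggression = (0.2 * gene + 0.1 * trauma
              + 2.0 * gene * trauma * games
              + rng.normal(0, 0.1, n))

# 'Independent' model: each factor contributes additively.
X_add = np.column_stack([np.ones(n), gene, trauma, games])
beta_add, *_ = np.linalg.lstsq(X_add, aggression, rcond=None)

# 'Interdependent' model: also include the three-way interaction term.
X_int = np.column_stack([np.ones(n), gene, trauma, games, gene * trauma * games])
beta_int, *_ = np.linalg.lstsq(X_int, aggression, rcond=None)

# The additive fit smears the joint effect across the main effects; only the
# interaction fit recovers the ~2.0 coefficient on the combined term.
print("additive fit:   ", np.round(beta_add, 2))
print("interaction fit:", np.round(beta_int, 2))
```

In this toy setup, the additive fit attributes chunks of the joint effect to each factor on its own, which is roughly the failure mode of treating the probabilities as independent.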
Cartesian frames seem an apt theoretical analogy
“A represents a set of possible ways the agent can be, E represents a set of possible ways the environment can be, and ⋅ : A × E → W is an evaluation function that returns a possible world given an element of A and an element of E”
Under the interdependent framing, the environment affords certain options that the agent can perceive and choose between.
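For concreteness, here is a toy encoding of that quoted definition (the particular agent options, environment states, and resulting worlds are my own invented illustration, not from the Cartesian frames sequence):

```python
from itertools import product

# A (ways the agent can be), E (ways the environment can be), and an
# evaluation function A x E -> W, as in the quoted definition.
A = ["stay_home", "cycle", "drive"]   # possible ways the agent can be
E = ["sunny", "rainy"]                # possible ways the environment can be

def evaluate(a: str, e: str) -> str:
    """The '·' operator: returns the possible world picked out by (a, e)."""
    if a == "stay_home":
        return "indoors_all_day"
    if a == "cycle" and e == "rainy":
        return "soaked_commute"
    return f"{a}_through_{e}_weather"

# The frame is the full table of worlds over A x E: no single row (agent side)
# or column (environment side) determines the outcome on its own.
for a, e in product(A, E):
    print(f"{a:>9} · {e:<5} -> {evaluate(a, e)}")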
A notion of Free Will loses its relevance under this framing. Changes in the world are caused neither by the settings of the outside environment alone nor by the embedded agent ‘willing’ an action alone, but are contingent on both.
You might counter: isn’t the agent’s body constituted of atomic particles that act and react deterministically over time, making free will an illusion?
Yes, and somehow, through parts interacting with parts, they come to constitute a greater whole that we view as an agent making choices.
None of these (admittedly confusing) framings have to be inconsistent with each other.
Overlap between ‘interdependent thinking’, ‘context’, and ‘collective thinking’.
When individuals with their own distinct traits are constrained by surrounding others (i.e. by their context) in the possible ways they can interact, they will behave predictably within those constraints:
e.g. when EAs stick to certain styles of analysis that they know comrades will grasp and admire when gathered at a conference or writing a post for others to read.
Analysis of the kind ‘this individual agent with x skills and y preferences will take/desist from actions that are more likely to lead to z outcomes’ falls flat here.
e.g. to paraphrase Critch’s Production Web scenario, whose severity typical AI Safety analysis tends to overlook:
Take a future board that buys a particular ‘CEO AI service’ to ensure their company will be successful. The CEO AI elicits from the trustees their inherent, categorical preferences, but what they express at any given moment is guided by their recent interactions with influential others (e.g. the need to survive tougher competition from other CEO AIs). A CEO AI that plans company actions based on the preferences it elicits from board members at any given point in time will, by default, not account for actions that bring into existence processes which actually change the preferences board members state. That is, unless safety-minded AI developers design a management service that accounts for this circuitous dynamic, and boards are self-aware enough to buy the less profit-optimised service that won’t undermine their personal integrity.
The risk emerges from how the AI developers and the company’s board introduce assumptions of structure:
i.e. that you can design an AI to optimise for end states based on its human masters’ identified intrinsic preferences. Such an AI would fail to use available compute to determine whether a chosen instrumental action reinforces the process through which ‘stuff’ contingently gets flagged in human attention, expressed to the AI, received as inputs, and derived as ‘stable preferences’.
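To make that feedback loop concrete, here is a toy sketch of my own (the weights, the ‘pressure’ variable, and the quarterly cadence are invented for illustration, not from Critch’s write-up): a naive optimiser re-elicits the board’s stated preferences each quarter and treats them as stable targets, while its own expansion decisions feed the competitive pressure that shifts what gets stated next.

```python
# Toy sketch: elicited preferences treated as stable, while the optimiser's
# own actions change what will be elicited next.

TRUE_VALUES = {"integrity": 1.0, "profit": 0.5}  # what the board cares about at the outset

def elicit_preferences(pressure: float) -> dict:
    """What the board *states* under recent competitive pressure:
    stated weights drift toward profit as pressure mounts."""
    return {
        "integrity": max(0.0, TRUE_VALUES["integrity"] - pressure),
        "profit": TRUE_VALUES["profit"] + pressure,
    }

pressure = 0.0
for quarter in range(1, 9):
    stated = elicit_preferences(pressure)
    # Naive 'CEO AI': optimises against the currently stated weights, never
    # asking whether the chosen action changes what will be stated next time.
    action = "aggressive_expansion" if stated["profit"] > stated["integrity"] else "steady_course"
    pressure += 0.1                  # rival CEO AIs keep competing regardless
    if action == "aggressive_expansion":
        pressure += 0.2              # ...and expansion escalates the race further
    print(f"Q{quarter}: stated integrity={stated['integrity']:.1f}, "
          f"profit={stated['profit']:.1f} -> {action}")
```

In this toy run the stated weights drift away from the board’s initial values precisely because the optimiser’s own actions intensify the competition driving the drift, which is the circuitous dynamic described above.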