Entropy production partially solves the Strawberry Problem:
Change in entropy production per second (against the counterfactual of not acting) is potentially an objectively measurable quantity that can be used, in conjunction with other parameters specifying a goal, to prevent unexpected behaviour.
Rob Bensinger gives Yudkowsky’s “Strawberry Problem” as follows:
How would you get an AI system to do some very modest concrete action requiring extremely high levels of intelligence, such as building two strawberries that are completely identical at the cellular level, without causing anything weird or disruptive to happen?
I understand the crux of this issue to be that it is exceptionally difficult for humans to construct a finite list of caveats or safety guardrails that we can be confident would withstand the optimisation pressure of a superintelligence doing its best to solve this task “optimally”. Without care, any measure chosen is Goodharted into uselessness, and the most likely outcome is extinction.
Specifying that the predicted change in entropy production per second of the local region must remain within some δ of the counterfactual in which the AGI does not act at all automatically excludes almost all unexpected strategies that involve high levels of optimisation.
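As a concrete illustration, here is a minimal sketch of how that constraint could be used to filter candidate plans. The function names (in particular `predict_entropy_rate`) are hypothetical stand-ins for whatever predictive model the AGI actually uses; nothing here is part of the original proposal beyond the δ-comparison itself.

```python
# Minimal sketch of the delta-constraint on entropy production.
# `predict_entropy_rate(world_state, plan)` is a hypothetical function
# returning the predicted entropy production per second of the local
# region if `plan` is executed; `plan=None` is the do-nothing counterfactual.

def within_entropy_budget(world_state, plan, predict_entropy_rate, delta):
    """True iff the plan's predicted entropy production per second stays
    within delta of the counterfactual in which the AGI does not act."""
    rate_with_plan = predict_entropy_rate(world_state, plan)
    rate_counterfactual = predict_entropy_rate(world_state, None)
    return abs(rate_with_plan - rate_counterfactual) <= delta

# Usage: discard any candidate plan that breaks the budget before ranking
# the remainder by task performance.
# safe_plans = [p for p in candidate_plans
#               if within_entropy_budget(state, p, predict_entropy_rate, delta)]
```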
I conjecture that the entropy production “budget” needed for an agent to perform economically useful tasks is well below the amount needed to cause an existential disaster.
Another application: directly monitoring the entropy production of an agent engaged in a generalised search upper-bounds the number of iterations of that search (and hence the optimisation pressure). This bound appears to be independent of the technological implementation of the search.[1]
On a less optimistic note, this bound is many orders of magnitude above the efficiency of today’s computers.
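To illustrate why such a bound can be independent of the hardware, the back-of-the-envelope sketch below uses Landauer's principle (each irreversibly erased bit produces at least k_B ln 2 of entropy). This is my own illustrative calculation under that assumption, not necessarily the exact bound derived in the footnote.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def max_search_iterations(entropy_budget, bits_erased_per_iteration):
    """Upper bound on the iterations of an irreversible search, given an
    entropy-production budget (J/K) and the bits erased per iteration,
    via Landauer's principle: at least k_B * ln(2) of entropy per erased bit."""
    return entropy_budget / (K_B * math.log(2) * bits_erased_per_iteration)

# Illustrative numbers only: a budget of 1e-10 J/K with ~1e6 bits erased
# per iteration allows on the order of 1e7 iterations, regardless of how
# the search is physically implemented.
print(f"{max_search_iterations(1e-10, 1e6):.2e}")
```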
There’s a billion reasonable-seeming impact metrics, but the main challenge of counterfactual-based impact is always how you handle chaos. I’m pretty sure the solution is to go away from counterfactuals as they represent a pathologically computationalist form of agency, and instead learn the causal backbone.
If we view the core of life as increasing rather than decreasing entropy, then entropy-production may be a reasonable candidate for putting quantitative order to the causal backbone. But bounded agency is less about minimizing impact and more about propagating the free energy of the causal backbone into new entropy-producing channels.
Reading your posts gives me the impression that we are both loosely pointing at the same object, but with fairly large differences in terminology and formalism.
While computing exact counterfactuals has issues with chaos, I don’t think this poses a problem for my earlier proposal. I don’t think it is necessary that the AGI be able to compute the counterfactual entropy production exactly, just that it makes a reasonably accurate approximation.[1]
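One way such an approximation could be made safe (my own illustration, not a method stated in the comment) is to estimate the counterfactual entropy rate by Monte Carlo and subtract the estimate's uncertainty from the allowed budget, so that approximation error can only make the constraint stricter:

```python
import statistics

def approx_counterfactual_rate(sample_rollout_rate, n_samples=100):
    """Estimate the no-op counterfactual entropy production rate.
    `sample_rollout_rate()` is a hypothetical stochastic simulation that
    returns one sampled rate for the do-nothing counterfactual."""
    samples = [sample_rollout_rate() for _ in range(n_samples)]
    mean = statistics.fmean(samples)
    std_err = statistics.stdev(samples) / n_samples ** 0.5
    return mean, std_err

def conservative_budget(delta, std_err, k=3.0):
    # Tighten delta by k standard errors so that estimation error cannot
    # silently loosen the constraint.
    return max(0.0, delta - k * std_err)
```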
I think I’m in agreement with your premise that the “constitutionalist form of agency” is flawed. The absence of entropy (or indeed of any internal physical resource management) from the canonical LessWrong agent-foundations model is clearly a major issue. My loose thinking on this is that Bayesian networks are not a natural description of the physical world at all, although they are an appropriate tool for describing how certain very special types of open systems (“agentic optimizers”) model the world.
I have had thoughts similar to those that motivated your post on the “causal backbone”. I believe “the heterogenous fluctuations will sometimes lead to massive shifts in how the resources are distributed” is something we would see in a programmable, unbounded optimizer.[2] But I’m not sure that modelling this as a “causal backbone” is the description that will cut reality at the joints, due to difficulties with the physicality of causality itself (see the work of Jenann Ismael).
[1] You can construct pathological environments in which the error in the AGI’s computation (with limited physical resources) of the counterfactual entropy production is arbitrarily large (and the resulting behaviour arbitrarily bad). I don’t see this as a flaw specific to this proposal, as pathological environments of this sort can be constructed against any safe AGI proposal.
[2] Ctrl-F “Goal like correlations” here.