Previously: Motivations for the Definition of an Optimisation Space
Preliminary Thoughts on Quantifying Optimisation: “Work”
I think there’s a concept of “work” done by an optimisation process in navigating a configuration space from one macrostate to another macrostate of a lower measure (where perhaps the measure is the maximum value any configuration in that macrostate obtains for the objective function [taking the canonical view of minimising the objective function]).
This “work” is measured in bits, and is calculated as the difference in entropy between the source macrostate and the destination macrostate.
To simplify the calculation, I currently operationalise the “macrostate” of a given configuration as the set of all configurations that are “at least as good” according to the objective function(s), i.e. that obtain values for the objective function(s) less than or equal to those obtained by the configuration in question.
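As a minimal sketch of how such a calculation might go (the configuration names, probabilities, and objective values below are assumptions for illustration, and I interpret the entropy of a macrostate as the log of its probability measure):

```python
import math

# Hypothetical toy optimisation space; all names and numbers are illustrative.
configurations = ["a", "b", "c", "d", "e"]
probability = {"a": 0.4, "b": 0.3, "c": 0.15, "d": 0.1, "e": 0.05}  # probability measure over X
objective = {"a": 5.0, "b": 4.0, "c": 3.0, "d": 2.0, "e": 1.0}      # objective function, to be minimised

def macrostate(x):
    """All configurations at least as good as x under the objective."""
    return {y for y in configurations if objective[y] <= objective[x]}

def entropy(macro):
    """Log (base 2) of the probability measure of a macrostate."""
    return math.log2(sum(probability[y] for y in macro))

def work(source, destination):
    """Work in bits: entropy of the source macrostate minus entropy of the destination macrostate."""
    return entropy(macrostate(source)) - entropy(macrostate(destination))

print(work("b", "d"))  # 2.0 bits: "d" sits in a macrostate with a quarter of the measure of "b"'s
```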
Motivations For My Conception of Macrostates
Greatly facilitates the calculation of work done by optimisation
A desideratum for a sensible calculation of work done by optimisation is that if one configuration (x) is a Pareto improvement over another configuration (x′) [i.e. ∀f∈F(f(x)≤f(x′)) ∧ ∃f′∈F(f′(x)<f′(x′))], then the work done navigating to x should be greater than the work done navigating to x′ (assuming both x and x′ are Pareto improvements over the source configuration)
Given that I’ve defined an optimisation process as a map between macrostates, and am quantifying the work done by optimisation as the difference in entropy between the source macrostate and the destination macrostate, the probability measure of the macrostate x belongs to should be lower than the probability measure of the macrostate x′ belongs to.
I.e. P(m(x))<P(m(x′))
Where m:X→2^X is the map from a configuration to its associated macrostate
The above desideratum is straightforwardly satisfied by my current operationalisation of macrostates: if x is a Pareto improvement over x′, then m(x)⊆m(x′), so P(m(x))≤P(m(x′)) (with strict inequality whenever the configurations in m(x′) but not in m(x) carry positive probability mass)
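A quick check of this on a hypothetical two-objective space (all names and numbers below are illustrative assumptions), where c is a Pareto improvement over b:

```python
# Toy check of the Pareto desideratum; all values are illustrative assumptions.
configurations = ["a", "b", "c", "d"]
probability = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
objectives = [
    {"a": 3.0, "b": 2.0, "c": 2.0, "d": 1.0},  # f1: c ties with b
    {"a": 3.0, "b": 3.0, "c": 2.0, "d": 1.0},  # f2: c is strictly better than b
]

def macrostate(x):
    # All configurations at least as good as x under every objective.
    return {y for y in configurations if all(f[y] <= f[x] for f in objectives)}

def measure(macro):
    return sum(probability[y] for y in macro)

# c Pareto-improves on b, so m(c) ⊆ m(b) and its measure is no larger.
assert macrostate("c") <= macrostate("b")                   # {c, d} ⊆ {b, c, d}
assert measure(macrostate("c")) < measure(macrostate("b"))  # 0.3 < 0.6
```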
Provides an ordering over macrostates according to how “good” they are
The smaller the probability measure of a given macrostate, the better it is
If there is only one objective function, then the ordering is total (the macrostates are nested sublevel sets of that function)
I find it intuitively very compelling
I don’t really care about any other properties that distinct configurations share when describing the macrostate they belong to
The only property relevant for quantifying the work done by optimisation is how low a value they attain for the objective function(s) the optimisation process minimises
Insofar as it makes sense to take an intentional stance towards an optimisation process, the macrostate of a given configuration is just that configuration together with the set of all configurations the process would be willing to switch to
Context
This is very similar to Yudkowsky’s “Measuring Optimisation Power”, but differs in a few important ways.
My main conceptual contributions are:
Tying the notion of work done by an optimisation process to a particular optimisation space
A (configuration space, event space, probability measure and collection of objective functions) 4-tuple
I.e. the work done cannot be quantified without specifying the above
I do not conceive of a general measure of optimisation that is independent of the particular space being considered or which is intrinsic to the optimisation process itself; optimisation can only be quantified relative to a particular optimisation space
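As a minimal sketch of what such a 4-tuple might look like for a finite configuration space (the class and field names below are my own assumptions, not established notation):

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Mapping, Sequence

@dataclass(frozen=True)
class OptimisationSpace:
    """Hypothetical container for the 4-tuple described above."""
    configurations: FrozenSet[str]                # configuration space X
    events: FrozenSet[FrozenSet[str]]             # event space (here, subsets of X)
    probability: Mapping[str, float]              # probability measure over X
    objectives: Sequence[Callable[[str], float]]  # objective functions to minimise
```

Any quantity of work is then relative to one such space: the same source and destination configurations can correspond to different amounts of work under a different probability measure or a different collection of objectives.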
I do not assume a uniform probability distribution over configuration space, but require that a probability measure be specified as part of the definition of an optimisation space
This is useful if multiple optimisation processes are acting on a given configuration space, but we are only interested in quantifying the effect of a subset of them
E.g. the effect of optimisation processes that are artificial in origin, or the effect of optimisation that arises from contributions from artificial intelligences (screening off human researchers)
The effects of optimisation processes that we want to screen off can be assimilated into the probability measure that defines the optimisation space
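As a minimal sketch of this under assumed numbers, folding screened-off optimisation into the measure shrinks the work attributed to the remaining process:

```python
import math

# Illustrative only: configurations, objective values, and both measures are assumptions.
configurations = ["a", "b", "c", "d"]
objective = {"a": 4.0, "b": 3.0, "c": 2.0, "d": 1.0}

uniform = {x: 0.25 for x in configurations}          # baseline measure
screened = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}  # measure with screened-off optimisation assimilated

def macrostate(x):
    return {y for y in configurations if objective[y] <= objective[x]}

def work(p, source, destination):
    p_src = sum(p[y] for y in macrostate(source))
    p_dst = sum(p[y] for y in macrostate(destination))
    return math.log2(p_src / p_dst)

print(work(uniform, "a", "d"))   # 2.0 bits relative to the uniform measure
print(work(screened, "a", "d"))  # ~1.32 bits once the screened-off optimisation is priced in
```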
Conceiving of the quantity being measured as the “work” done by an optimisation process, not its “power”
I claim that the concept we’re measuring is most analogous to the classical physics notion of work: “force applied on an object × distance the object moved in the direction of force”
Where the force here is the force exerted by the optimisation process, and the distance the object moved is the difference in entropy between the source and destination macrostates
It is not at all analogous to the classical physics notion of power
That would be something like the ratio of work done to a relevant resource expended [e.g. time, giving a unit of bits/second]
I think quantifying the efficiency of optimisation along other dimensions is also valuable (e.g. energy: bits/joule, compute: bits/FLOP, etc.)
Conceiving of the quantity being measured as “work” instead of “power” seems to dissolve the objections raised by Stuart Armstrong
Building a rigorous (but incomplete) model to facilitate actual calculations of work done by optimisation.
I’m still refining and improving the model, but it seems important to have a formal model in which I can perform concrete calculations to probe and test my intuitions