For m : S such that m is a mesa-optimizer, let Σ_m be the space it optimizes over, and g_m : Σ_m → R be its utility function.
I know you said “which we need not notate”, but I am going to say that for s : S and x : X, that s(x) : A, where A is the space of actions (or possibly s(x) : A_x, where A_x is the space of actions available in the situation x). (Though maybe you just meant that we need not notate, separately from s, the map from X to A which s defines. In which case I agree, and as such I’m writing s(x) : A instead of saying that something belongs to the function space X → A.)
For m : S to have its optimization over Σ_m have any relevance, there has to be some connection between the σ : Σ_m chosen by m, and m(x).
So, the process by which m produces m(x) when given x should involve the selected σ. Moreover, the selection of σ ought to depend on x in some way; otherwise the choice of σ is the same every time, and can be regarded as just a constant in how m functions.
So, it seems that what I said was g_m : Σ_m → R should instead be either g_m : X × Σ_m → R, or g_{m,x} : Σ_{m,x} → R (in the latter case I suppose one might say g_m : ∑_{x:X} Σ_{m,x} → R).
Call the process that produces the action m(x) : A using the choice of σ : Σ_m by the name h_m : X × Σ_m → A (or, more generally, h_m : ∑_{x:X} Σ_{m,x} → A).
h_m is allowed to also use randomness in addition to x and σ; I’m not assuming that it is a deterministic function. Though come to think of it, I’m not sure why it would need to be non-deterministic? Oh well, regardless.
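To make the decomposition concrete, here is a toy sketch (the plan names, the tiny discrete Σ_m, and the particular g_m are all my own inventions for illustration, not anything from the post). The point is only the shape: m’s inner optimizer picks σ by maximizing g_m, possibly depending on x, and h_m turns (x, σ) into the action m(x).

```python
# Toy search space Σ_m: a handful of candidate "plans" (hypothetical).
SIGMA = ["plan_a", "plan_b", "plan_c"]

# Toy mesa-objective g_m : X × Σ_m → R (invented for illustration);
# note it takes x too, per the refinement above.
def g_m(x, sigma):
    return len(sigma) + (1 if x == "hard" and sigma == "plan_c" else 0)

def select_sigma(x):
    # m's inner optimizer: pick the σ that scores best under g_m for this x.
    return max(SIGMA, key=lambda sigma: g_m(x, sigma))

def h_m(x, sigma):
    # The process turning (x, σ) into an action m(x); deterministic here,
    # though as noted above it could also use randomness.
    return f"{sigma}-applied-to-{x}"

def m(x):
    return h_m(x, select_sigma(x))
```

On input "hard" the bonus term makes the inner search select plan_c, while on other inputs the tie is broken toward plan_a, so the chosen σ genuinely depends on x, as required above.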
Presumably whatever f : S → R is being used to select s : S depends primarily (though not necessarily exclusively) on what s(x) is for various values of x, or at least on something which indicates things about that, as f is supposed to be for selecting systems which take good actions?
Supposing that, for the mesa-optimizer m, the inner optimization procedure (which I don’t have a symbol for) and the inner optimization goal (i.e. g_m) are separate enough, one could ask: “what if we had m, except with g_m replaced with g′_m, and looked at how the outputs of h_m(x, σ) and h_m(x, σ′) differ, where σ and σ′ are selected (by m’s optimizer) by optimizing for the goals g_m and g′_m respectively?”
Supposing that we can isolate the part of how f(s) depends on s which is based on what s(x) is or tends to be for different values of x, then there would be a well-defined “how would f(m) differ if m used g′_m instead of g_m?”. If g′_m in place of g_m would result in things which, according to how f works, would be better, then it seems like it would make sense to say that g_m isn’t fully aligned with f?
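The counterfactual goal-substitution test above can be sketched numerically (a self-contained toy: Σ_m, h_m, both objectives, and f are all made up here, and f is chosen to depend on s only through its actions, per the isolation assumption):

```python
SIGMA = [0, 1, 2, 3]          # toy Σ_m
X = ["x0", "x1"]              # toy input space

def h_m(x, sigma):
    # Toy action rule: how m acts given input x and chosen σ.
    return sigma if x == "x0" else -sigma

def g_m(sigma):     return sigma      # original mesa-objective (invented)
def g_prime(sigma): return -sigma     # candidate replacement g'_m (invented)

def m_with_goal(goal):
    # "m, except with its goal swapped": same h_m, different inner objective.
    return lambda x: h_m(x, max(SIGMA, key=goal))

def f(policy):
    # Toy base objective: depends on s only through the actions s(x),
    # here preferring actions near 0.
    return sum(-abs(policy(x)) for x in X)

score_gm     = f(m_with_goal(g_m))      # f's score when m optimizes g_m
score_gprime = f(m_with_goal(g_prime))  # f's score under the swapped goal
misaligned   = score_gprime > score_gm  # g'_m would do better by f's lights
```

In this toy case the swapped-in g′_m yields a strictly higher f-score, which under the proposal above would count as evidence that g_m isn’t fully aligned with f.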
Of course, what I just described makes a number of assumptions which are questionable:
1. It assumes that there is a well-defined optimization procedure that m uses which is cleanly separable from the goal which it optimizes for.
2. It assumes that how f depends on s can be cleanly separated into a part which depends on (the map in X → A which is induced by s) and (the rest of the dependency on s).
The first of these is also connected to another potential flaw in what I said: it seems to describe the alignment with f of the combination of (the optimizer m uses) along with g_m, rather than just the alignment of g_m with f.
So, alternatively, one might disregard how the search behaves and how it selects things that score well at the goal g_m, and just compare how h_m(x, σ) and h_m(x, σ′) tend to compare when σ and σ′ are generic things which score well under g_m and g′_m respectively, rather than using the specific procedure that m uses to find something which scores well under g_m. This should also, I think, address the issue of m possibly not having a cleanly separable “how it optimizes” method that works for a generic “what it optimizes for”.
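One way to make “generic things which score well” concrete, without any reference to m’s actual search procedure (a sketch under my own made-up definitions: the tolerance, the toy Σ_m, objectives, and action rule are all assumptions):

```python
SIGMA = list(range(-5, 6))                  # toy Σ_m (invented)

def g(sigma):       return -abs(sigma - 2)  # toy mesa-objective g_m
def g_prime(sigma): return -abs(sigma + 2)  # toy alternative g'_m

def h_m(x, sigma):
    # Toy action rule combining the input and the chosen σ.
    return x + sigma

def near_optimal(goal, tol=1):
    # "Generic things which score well under the goal": every σ within
    # tol of the best achievable score, with no search procedure involved.
    best = max(goal(s) for s in SIGMA)
    return [s for s in SIGMA if goal(s) >= best - tol]

x = 0
actions     = {h_m(x, s) for s in near_optimal(g)}        # g_m's generic actions
actions_alt = {h_m(x, s) for s in near_optimal(g_prime)}  # g'_m's generic actions
```

Comparing the two action sets (rather than two search outputs) sidesteps the question of whether m’s optimizer is cleanly separable from its goal.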
The second issue, I suspect, is not really a big problem? If we are designing the outer optimizer, then presumably we understand how it evaluates things, and understand how that uses the choices of s(x) : A for different x : X.
I may have substantially misunderstood your point?
Or, was your point that the original thing didn’t lay these things out plainly, and that it should have?
Ok, reading more carefully, I see you wrote
> I can certainly imagine that it may be possible to add in details on a case-by-case basis or at least to restrict to a specific explicit class of base objectives and then explicitly define how to compare mesa-objectives to them.
and the other things right before and after that part, and so I guess the point was something like “it wasn’t stated precisely enough for the cases it is meant to apply to / it was presented as applying as a concept more generally than made sense as it was defined”, which I had sort of missed initially.
(I have no expertise in these matters; unless shown otherwise, assume that in this comment I don’t know what I’m talking about.)