Thanks for this comment! I think it makes some sense (but would have been easier to read given meaningful variable names).
> Bob’s alignment strategy is that he wants X = X1 = Y = Y1 = Z = Z1. Also he wants the end result to be an agent whose good behaviours (Z) are in fact maximising a utility function at all (in this case, Z1).
I either don’t understand the semantics of “=” here, or I disagree. Bob’s strategy doesn’t make sense because X and Z have type *behavior*, X1 and Z1 have type *utility function*, Y is some abstract reward function over some mathematical domain, and Y1 is an empirical set of reinforcement events.
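To make the type objection concrete, here is a minimal Haskell sketch. All of the type names, payloads, and the commented-out claim are my own illustrative assumptions, not definitions anyone in this thread actually gave:

```haskell
-- Hypothetical stand-ins for the four kinds of object in play.
-- The payload types are arbitrary placeholders chosen for illustration.
data Behavior           = Behavior String                     -- X, Z
data UtilityFunction    = UtilityFunction (Double -> Double)  -- X1, Z1
data RewardFunction     = RewardFunction (String -> Double)   -- Y
data ReinforcementEvent = ReinforcementEvent String Double    -- one element of Y1

-- Y1 is an empirical *set* of such events, not a function at all.
type EmpiricalData = [ReinforcementEvent]

-- Bob's chain "X = X1 = Y = Y1 = Z = Z1" requires an equality whose two
-- sides share a single type. GHC rejects the line below before it can even
-- ask whether the values are equal, because Behavior and UtilityFunction
-- are distinct types:
-- bobsClaim :: Behavior -> UtilityFunction -> Bool
-- bobsClaim x x1 = x == x1   -- type error: Behavior vs. UtilityFunction

main :: IO ()
main = putStrLn "the interesting line is the one that doesn't compile"
```

The point is not the particular payloads but that the equality in Bob’s strategy never type-checks under any assignment of these four types.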
It still seems to me like there is an error being made, such that Bob and Carol aren’t just trying to do different things or using different terminology, but that Bob’s alignment strategy also isn’t type-sensible or type-coherent.