Appendix: tracking key limitations of the power-seeking theorems
I want to say that there’s another key limitation:
Let U ⊆ ℝᵈ be a set of utility functions which is closed under permutation.
It seems like a rather central assumption to the whole approach, but in practice people tend to specify utility functions that are “natural” in some sense (e.g. generally continuous, or functions of only a few parameters). I feel like for most forms of natural utility functions the basic argument will still hold, but I’m not sure how far it generalizes.
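To make the tension concrete, here is a toy sketch under assumptions of my own (not from the paper). Treat a utility function as a vector over d states arranged along a 1-D chain, and call it “natural” if it is affine in the state index. The hypothetical helpers `is_affine_in_index` and `closed_under_permutation` check whether that class survives permuting states. It doesn't: only the identity and the reversal keep an affine function affine.

```python
from itertools import permutations


def is_affine_in_index(u, tol=1e-9):
    """Hypothetical 'natural' class: u(s) = u(0) + a*s for state indices s = 0..d-1."""
    d = len(u)
    if d < 3:
        return True  # any 1 or 2 values trivially fit a line
    a = u[1] - u[0]
    return all(abs(u[s] - (u[0] + a * s)) <= tol for s in range(d))


def closed_under_permutation(members, in_class):
    """Return True iff every state-permutation of every member stays in the class."""
    for u in members:
        for perm in permutations(u):
            if not in_class(list(perm)):
                return False
    return True


# A "natural" utility that increases linearly along a chain of 4 states.
u = [0.0, 1.0, 2.0, 3.0]
print(is_affine_in_index(u))                              # True
print(is_affine_in_index([1.0, 0.0, 3.0, 2.0]))           # False
print(closed_under_permutation([u], is_affine_in_index))  # False

# How many of the 4! = 24 state-permutations stay "natural"?
print(sum(is_affine_in_index(list(p)) for p in permutations(u)))  # 2
```

So a set of “natural” utility functions is generally not closed under permutation, and the theorems' orbit-based counting doesn't directly apply to it.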
Right, I was intending “3. [these results] don’t account for the ways in which we might practically express reward functions” to capture that limitation.