JNS comments on johnswentworth’s Shortform

JNS 16 Jun 2023 6:40 UTC
3 points
Completely off the cuff take:
I don’t think claim 1 is wrong, but it does clash with claim 2.
That means any system that has to be corrigible cannot be a system that maximizes a simple (one-dimensional) utility function. Put another way: “whatever utility function it maximizes must be along multiple dimensions”.
That seems to be pretty much what humans do: we have really complex utility functions, everything seems to be ever-changing, and we have some control over it ourselves (sometimes that goes wrong, and people end up maxing out a single dimension at the cost of everything else).
Note to self: think more about this and, if possible, write up something more coherent and explanatory.