If you need to represent some computationally intractable object, there are many tricks available to approximate such an object in a computationally efficient manner. E.g., one can split the intractable object into modular factors, then use only those factors which are most relevant to the current situation. My guess is that this is exactly what values are: modular, tractable factors that let us efficiently approximate a computationally intractable utility function.
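To make the "modular factors" idea concrete, here is a minimal sketch, not a claim about how values are actually implemented. It assumes the utility function decomposes additively into factors and that each factor has a cheap relevance estimate. `Factor`, `approx_utility`, and the example factors and numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]


@dataclass
class Factor:
    """One modular piece of a (hypothetical) additive utility function."""
    name: str
    relevance: Callable[[State], float]  # cheap estimate of how much this factor matters here
    score: Callable[[State], float]      # this factor's contribution to utility


def approx_utility(state: State, factors: List[Factor], k: int) -> float:
    """Approximate total utility by summing only the k most relevant factors."""
    ranked = sorted(factors, key=lambda f: f.relevance(state), reverse=True)
    return sum(f.score(state) for f in ranked[:k])


# Illustrative factors only; names and weights are made up.
FACTORS = [
    Factor("honesty",
           lambda s: s.get("talking", 0.0),
           lambda s: -5.0 * s.get("lies", 0.0)),
    Factor("health",
           lambda s: s.get("eating", 0.0),
           lambda s: 2.0 * s.get("eating", 0.0) * s.get("vegetables", 0.0)),
    Factor("friendship",
           lambda s: 0.5 * s.get("talking", 0.0),
           lambda s: 3.0 * s.get("kind_words", 0.0)),
]

# A conversation: the "health" factor is irrelevant here and gets skipped.
state = {"talking": 1.0, "lies": 1.0, "kind_words": 2.0, "vegetables": 1.0}
cheap = approx_utility(state, FACTORS, k=2)            # evaluates 2 of 3 factors
full = approx_utility(state, FACTORS, k=len(FACTORS))  # evaluates every factor
```

In this toy state the skipped factor contributes nothing, so the cheap estimate matches the full sum. In general the approximation is only as good as the relevance estimates.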
If you actually optimize according to an approximation, Goodhart’s curse will wreck the outcome. An approximation should only ever be soft-optimized, never expected-utility (EU) maximized. A design that aims at EU maximization, and merely hopes it ends up as soft optimization that doesn’t go too far, doesn’t pass the omnipotence test.
Also, an approximation worth even soft-optimizing should be found in a value-laden way, so that it loses only inessential details and not anything highly value-relevant. Approximate knowledge of values helps with finding better approximations to values.
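The contrast between maximizing an approximation and soft-optimizing it can be illustrated with a toy sketch. It uses quantilization (sampling uniformly from the top fraction of options by proxy score) as one example of soft optimization. That is my choice of technique, not something specified above, and the candidate numbers are invented.

```python
from typing import List, Tuple

# (proxy_score, true_utility) pairs. Hypothetical data: the proxy tracks true
# utility everywhere except one option that exploits an error in the proxy.
CANDIDATES: List[Tuple[float, float]] = (
    [(float(i), float(i)) for i in range(1, 10)] + [(100.0, -10.0)]
)


def argmax_true_utility(cands: List[Tuple[float, float]]) -> float:
    """True utility of the option a pure proxy-maximizer would pick."""
    return max(cands, key=lambda c: c[0])[1]


def quantilizer_expected_utility(cands: List[Tuple[float, float]], q: float) -> float:
    """Expected true utility when sampling uniformly from the top-q fraction by proxy."""
    ranked = sorted(cands, key=lambda c: c[0], reverse=True)
    n = max(1, int(q * len(ranked)))  # number of options in the top quantile
    top = ranked[:n]
    return sum(true for _, true in top) / len(top)


hard = argmax_true_utility(CANDIDATES)                 # always lands on the exploit
soft = quantilizer_expected_utility(CANDIDATES, 0.5)   # exploit is 1 of 5 options sampled
```

With these made-up numbers the maximizer always picks the exploit, while the quantilizer picks it only one time in five. This only shows the damage is diluted, not that it is absent.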