None of that is wrong, but it misses the main issue with corrigibility, which is that the approximation resists further refinement. That’s why for it to work, the correct utility function would need to start in the ensemble.
None of that is wrong, but it misses the main issue with corrigibility, which is that the approximation resists further refinement. That’s why for it to work, the correct utility function would need to start in the ensemble.