Jozdien comments on Insufficient Values

Jozdien 17 Jun 2021 12:47 UTC
It’s only once we pick a specific method of implementation that we have to confront in mechanistic detail what we could previously hide under the abstraction of anthropomorphic agency.
I agree. I was trying to think of possible implementation methods, throwing out various constraints like computing power or competitiveness as candidates became harder to find, and the final sticking point was still Goodhart’s Law. For the most part, I kept it in to give an example of the difficulty of meta-alignment (corrigibility in favourable directions).