It’s only once we pick a specific method of implementation that we have to confront in mechanistic detail what we could previously hide under the abstraction of anthropomorphic agency.
I agree. I was trying to think of possible implementation methods, throwing out various constraints like computing power or competitiveness as it became harder to find any, and the final sticking point was still Goodhart’s Law. For the most part, I kept it in to give an example to the difficulty of meta-alignment (corrigibility in favourable directions).
I agree. I was trying to think of possible implementation methods, throwing out various constraints like computing power or competitiveness as it became harder to find any, and the final sticking point was still Goodhart’s Law. For the most part, I kept it in to give an example to the difficulty of meta-alignment (corrigibility in favourable directions).