This seems in danger of being a “sponge alignment” proposal, i.e. the proposed system doesn’t do anything useful. https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities#:~:text=sponge
This current version is dumb, but it still exerts some optimization pressure. (The bits of optimization it puts out are at most the bits of selection put into its utility function.)
It could be a conceptual ingredient in something useful. For example, it can select between two plans.
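To make the "bits of optimization out ≤ bits of selection in" point concrete, here is a minimal sketch (the function names and the toy uniform-utility setup are my own illustration, not from the comment): picking the best of N candidates applies at most log2(N) bits of optimization pressure, so selecting between two plans is exactly 1 bit.

```python
import math
import random

random.seed(0)


def bits_of_optimization(num_candidates: int) -> float:
    # Best-of-N selection applies at most log2(N) bits of
    # optimization pressure toward the utility function.
    return math.log2(num_candidates)


def mean_best_of(n: int, trials: int = 10_000) -> float:
    # Toy check with a uniform [0, 1] "utility": the expected maximum
    # of n uniform samples is n / (n + 1), so best-of-2 lands around
    # the 2/3 quantile -- a modest, 1-bit push toward higher utility.
    total = sum(
        max(random.random() for _ in range(n)) for _ in range(trials)
    )
    return total / trials


print(bits_of_optimization(2))  # selecting between two plans: 1.0 bit
print(mean_best_of(2))          # close to 2/3
```

The point of the sketch is only that selection pressure is bounded and quantifiable: doubling the number of candidate plans buys one more bit, so a system that merely chooses among a handful of plans is exerting very little optimization.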
I agree.