algon33 comments on Comparing reward learning/reward tampering formalisms