gwern comments on When is reward ever the optimization target?