Seth Herd answers "When is reward ever the optimization target?"