My point was that there is no reason for SGD to create “deceptive logic” specifically, because “deceptive logic” is not privileged over any other logic that involves modeling the base objective and acting according to it. But I now think this isn’t always true; see the edit block I just added.