A great paper highly relevant to this. That suggests that lying is localized just under a third of the way into the layer stack, significantly earlier than I had proposed. My only question is whether the lie is created before (at an earlier layer then) the decision whether to say it, or after, and whether their approach located one or both of those steps. They’re probing yes-no questions of fact, where assembling the lie seems trivial (it’s just a NOT gate), but lying is generally a good deal more complex than that.
A great paper highly relevant to this. That suggests that lying is localized just under a third of the way into the layer stack, significantly earlier than I had proposed. My only question is whether the lie is created before (at an earlier layer then) the decision whether to say it, or after, and whether their approach located one or both of those steps. They’re probing yes-no questions of fact, where assembling the lie seems trivial (it’s just a NOT gate), but lying is generally a good deal more complex than that.