I didn’t mention this in the paper for two reasons:
I have run experiments in which I created additional questions from the categories that show no clear pattern, and the approach also worked there.
It’s not clear to me how to interpret the result. You could also say that the elicitation questions measure something like an intention to lie in the future, and that unprompted GPT-3.5 (what you call the “default response”) has a low intention to lie in the future. I’ll think more about this.