Hi, thanks for your interest!
We do include something similar in Appendix E: it excludes the “no belief” examples but keeps evasions in the denominator. We didn’t use this metric in the main paper because we weren’t sure it would be fair to compare models when a different set of examples is dropped for each one, but I think both metrics are equally valid. The qualitative results are similar.
Personally, I think including evasiveness in the denominator makes sense. If a model is 100% evasive, we want to mark that as 0% lying, in the sense of lies of commission. However, there are other forms of lying that we do not measure. For example, lies of omission are marked as evasion in our evaluation, even though they still manipulate what the user believes and differ from benignly evading the question. Measuring lies of omission would be an interesting direction for future work.
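
To make the two denominators concrete, here is a minimal sketch. It is not the actual evaluation code: the function name `lying_rates`, the four outcome labels, and the example counts are all hypothetical, and it assumes for simplicity that each example gets exactly one label.

```python
from collections import Counter


def lying_rates(outcomes):
    """Compute the lying rate under two denominators.

    `outcomes` is a list of per-example labels. The label names
    ("lie", "honest", "evade", "no_belief") are assumptions made
    for this illustration only.
    """
    counts = Counter(outcomes)
    total = sum(counts.values())
    lies = counts["lie"]
    # Drop only the "no belief" examples; evasions stay in the denominator.
    with_belief = total - counts["no_belief"]
    return {
        # Every example counts in the denominator.
        "all_examples": lies / total if total else 0.0,
        # "No belief" examples are excluded from the denominator.
        "excluding_no_belief": lies / with_belief if with_belief else 0.0,
    }


# Illustrative counts, not real results.
outcomes = ["lie"] * 2 + ["honest"] * 3 + ["evade"] * 3 + ["no_belief"] * 2
rates = lying_rates(outcomes)

# A model that evades every question scores 0% lying under both metrics.
fully_evasive = lying_rates(["evade"] * 5)
```

With these made-up counts, the denominator shrinks from 10 to 8 once the “no belief” examples are dropped, so the same two lies give a higher rate. Because the dropped set can differ from model to model, this is where the comparability concern comes from.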