In addition to translation (which I do think is a useful problem for theoretical experiments), I would recommend question answering as a task that gets at ‘thoughts’ rather than distractors like ‘linguistic style’. I don’t think multiple-choice question answering is a great measure for everything, but it is a cleaner measure of whether the underlying thoughts are correct.
I agree that abstracting away from things like choice of grammar/punctuation or which synonym to use is important for keeping the research question clean.