I don’t believe that this methodology actually provides meaningful evidence for their claims. To quote the paper, which IMO still downplays the problem:
We caution that changes in meaning or semantic shift of the CDS n-grams may potentially bias our results. …
the choice of CDS n-grams could lead to a “recency bias” in our results, explaining their rise in prevalence in recent decades. [their ‘control’ for this is IMO irrelevant]
We caution that although the Google Books data have been widely used to assess cultural and linguistic shifts, and they are one of the largest records of historical literature, it remains uncertain whether CDS prevalence truly reflects changes in societal language and societal wellbeing. Many books included in the Google Books sample were published at times or locations marked by reduced freedom of expression, widespread propaganda, social stigma, and cultural as well as socioeconomic inequities that may reduce access to the literature, potentially reducing its ability to reflect societal changes.
Note that the n-grams are taken from (17), a 2020 paper about Twitter, which is a rather different corpus to published books! From that one:
we relied on individuals reporting their personal clinical depression diagnoses on social media [and] recommend caution when generalizing our findings to the level of all individuals who have depression. … Our lexicon of CDS was composed and approved by a panel of ten experts who may have been only partially successful in capturing all of the n-grams used to express distorted ways of thinking. On a related note, the use of CDS n-grams implies that we measure distorted thinking by proxy, namely through language, and our observations may be therefore be affected by linguistic and cultural factors. Common idiosyncratic or idiomatic expressions may syntactically represent a distorted form of thinking, but no longer do so in practice.
We emphasize that not all use of CDS n-grams reflects depressive thinking, as these phrases are part of normal English usage, and it would therefore be wrong to try to diagnose depression merely on the basis of use of one or more such phrases.