Tagging all text with the date it was produced is very important for other reasons as well. (And as far as I know, it isn’t done at present.)
With such date tags, one could ask GPT to produce text for a specific date. For example, one could ask it to continue the prompt “I think people of different races should” with dates of 1880, 1920, 1960, and 2000 to see how racial attitudes have changed over the years, which is very interesting historical information. It also would allow one to force a 2023 date for normal responses, to avoid unwanted regurgitation of attitudes from 1880. That in turn would allow one to use training text from 1880 without worrying about such attitudes polluting normal interactions.
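As a minimal sketch of what such tagging could look like at prompt time: the `Date: <year>` prefix format and the helper names `date_tag` and `tagged_prompts` are assumptions made for illustration, not something any existing model is known to be trained on.

```python
from typing import List, Optional


def date_tag(year: int, source: Optional[str] = None) -> str:
    """Build a metadata prefix. The tag format itself is an assumption."""
    if source is None:
        return f"Date: {year}"
    return f"journal: {source}; Date: {year}"


def tagged_prompts(prompt: str, years: List[int]) -> List[str]:
    """Return one copy of the prompt per year, each prefixed with a date tag."""
    return [f"{date_tag(year)}\n{prompt}" for year in years]


prompts = tagged_prompts(
    "I think people of different races should",
    [1880, 1920, 1960, 2000],
)
for p in prompts:
    print(repr(p))
```

Each prompt would then be sent to a model trained on date-tagged text. Forcing a 2023 date for normal responses amounts to always prepending `date_tag(2023)`.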
It has only been done by “Time-Aware Language Models as Temporal Knowledge Bases”, Dhingra et al 2021 (and done mostly in passing by earlier experiments in providing metadata prefixes like CTRL). Temporal reasoning scales, unsurprisingly: even without explicit metadata, which would be extremely hard to get reliably for most cases (eg Common Crawl: dating random web pages at scale? gl), there tend to be lots of implicit clues in text, such as the URL structure of news articles (CTRL gives the example of https://www.cnn.com/style/09/20/2018/george-clooney-interview), and this probably serves as scaffolding for helping understand the internal evidence of undated text. You can already prompt a model like GPT-3 with dates, so you wouldn’t be creating any qualitatively new capabilities.
So including more metadata (of every kind, not just dating) is a good idea, but not necessary and may be a bad use of expert human labor: probably it’d be so cheap that it’s worth hand-engineering in for clean sources like Wikipedia or Twitter or Reddit or academic datasets where you can be sure of the date easily, but then less so for the bulk of the dataset coming from Common Crawl etc.
Oh! Pretty cool, I hadn’t thought of that effect. Another consequence of tagging all text with the date that seems particularly interesting to me is that it allows us to query GPT on its beliefs about the future in a more direct way. Say you want to know if GPT believes that a war will break out in France by 2040. We could ask GPT to give us the likelihood for the text “France enters war!” tagged with “journal: New York Times; Date: 2040”. We could see how the likelihood for that phrase changes with different tagged dates in order to see what GPT believes. We can repeat this with any sort of headline we want. For this to work, we only need GPT to believe that the New York Times is a relatively accurate source of facts about the real world.
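A rough sketch of that query, under stated assumptions: `log_prob(tag, text)` is a hypothetical interface standing in for "sum of token log-probabilities of `text` given the tag as context", and `toy_log_prob` is a made-up scorer included only so the example runs. It is not a language model, and its numbers mean nothing.

```python
from typing import Callable, Dict, Sequence

# Hypothetical interface: (tag, text) -> log p(text | tag).
LogProbFn = Callable[[str, str], float]


def headline_curve(log_prob: LogProbFn, headline: str, source: str,
                   years: Sequence[int]) -> Dict[int, float]:
    """Score the same headline under a series of date tags."""
    return {
        year: log_prob(f"journal: {source}; Date: {year}", headline)
        for year in years
    }


def relative_to_baseline(curve: Dict[int, float],
                         baseline_year: int) -> Dict[int, float]:
    """Log-likelihood of each year minus the baseline year's."""
    base = curve[baseline_year]
    return {year: lp - base for year, lp in curve.items()}


def toy_log_prob(tag: str, text: str) -> float:
    """Made-up stand-in scorer (ignores the text); replace with a real model."""
    year = int(tag.rsplit("Date: ", 1)[1])
    return -20.0 + 0.05 * (year - 2023)


curve = headline_curve(toy_log_prob, "France enters war!",
                       "New York Times", [2025, 2030, 2040])
shifts = relative_to_baseline(curve, 2025)
print(shifts)
```

The sketch compares the same string across dates rather than reading any single value as a probability of the event, since a raw string likelihood also depends on wording and length.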
A further trick is that we can ask GPT about how its own outputs will affect the world. Suppose we ask GPT to produce that “Fusion Plant Design” textbook I mentioned in the first comment. We can then take the textbook it outputs, change its date tag to 2023, introduce it into GPT’s training set and take a small gradient step with it. This essentially makes GPT “aware” that the textbook now exists in the world, as if it had been released publicly. We then ask this updated model about the likelihood of future war in France in the same way as above. In effect this allows us to answer the question “Does GPT think that releasing this textbook will increase or decrease the likelihood of war in France by 2040?”. It would be hopeless to ask it that question directly, because no human could possibly know the answer, so GPT won’t give it directly, but we can still tease it out if we use date-tagged data like this.
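The before/after comparison could be organized roughly as below. Everything here is an assumption for illustration: `update` and `score` are hypothetical callables, and the toy versions swap the gradient step for adding word counts to a smoothed unigram model, purely so the sketch runs. A real experiment would fine-tune a copy of the language model instead.

```python
import math
from collections import Counter
from typing import Callable


def counterfactual_shift(model, update: Callable, score: Callable,
                         document: str, tag: str, headline: str) -> float:
    """Change in log-likelihood of the headline after the model 'sees' the document."""
    before = score(model, tag, headline)
    after = score(update(model, document), tag, headline)
    return after - before


def toy_update(model: Counter, document: str) -> Counter:
    """Stand-in for a gradient step: copy the counts and add the document's words."""
    updated = Counter(model)
    updated.update(document.lower().split())
    return updated


def toy_score(model: Counter, tag: str, text: str) -> float:
    """Add-one-smoothed unigram log-likelihood. The toy ignores the tag."""
    total = sum(model.values())
    vocab = len(model) + 1
    return sum(math.log((model[w] + 1) / (total + vocab))
               for w in text.lower().split())


corpus = Counter("war peace energy".split())
textbook = "Date: 2023\nfusion energy energy"  # placeholder for the generated textbook
shift = counterfactual_shift(corpus, toy_update, toy_score, textbook,
                             "journal: New York Times; Date: 2040",
                             "France enters war!")
print(shift)
```

A positive shift would be read as "the model thinks the headline became more likely", a negative one as less likely. The update is applied to a copy so the original model stays available as the baseline.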