We were surprised to find a decrease in publications on the arXiv in recent years, but identified the cause for the decrease as spurious and fixed the issue in the published dataset (details in Fig. 4).
I’d be interested in hearing more about how the decrease was determined to be spurious; I looked at Fig. 4 but don't understand how that decision was made based on the figure, if that was the intention.
Thanks for the question! When we initially scraped the dataset, we looked at the dates in Figure 1.a and saw a decrease in papers after 2020, because many of the alignment literature lists we pulled papers from were compiled in 2020 or earlier and had not been updated. That produced an apparent decline in Figure 1.a. However, the decline was clearly an artifact of missing the newer papers that had come out in 2020 and later. Once we scraped a wider set of papers using arXiv’s API, you could see an uptick in papers in 2020 and beyond (Figure 4.e) where there was previously a decrease (Figure 1.a).
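To make the comparison concrete, here is a minimal sketch of the kind of check described above: count papers per year in the curated-list scrape and in the wider scrape, then look at the per-year difference. The function names and the toy dates are hypothetical, chosen only for illustration; they are not the actual dataset or the code we used.

```python
from collections import Counter
from datetime import date


def papers_per_year(published_dates):
    """Count how many papers were published in each calendar year."""
    return Counter(d.year for d in published_dates)


def yearly_gap(curated_dates, wider_dates):
    """Per year, how many more papers the wider scrape has than the curated lists."""
    curated = papers_per_year(curated_dates)
    wider = papers_per_year(wider_dates)
    years = sorted(set(curated) | set(wider))
    # Counter returns 0 for years it has never seen, so missing years are safe.
    return {year: wider[year] - curated[year] for year in years}


# Toy dates for illustration only (NOT the real dataset).
curated_dates = [date(2019, 5, 1), date(2019, 9, 1), date(2020, 3, 1), date(2021, 2, 1)]
wider_dates = curated_dates + [date(2021, 6, 1), date(2021, 11, 1), date(2022, 1, 15)]

gap = yearly_gap(curated_dates, wider_dates)
print(gap)
```

If the gap is near zero for early years and grows for recent ones, that pattern is consistent with the curated lists simply being out of date rather than with a real drop in publications.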