This repository seems to contain the source code of a bot responsible for updating the “Hall of Counters” in the About section of the r/counting community on Reddit. I don’t participate in the community, but from what I can gather, the list is a leaderboard of the community’s most active members. A number of the usernames behind these anomalous tokens still appear on the present-day version of the list.
I did do a little research around that community before posting my comment; only later did I realise that I’d actually discovered a failure mode distinct from those in the original post: under some circumstances, ChatGPT interprets the usernames as numbers. This could be because r/counting is a place where people make many posts incrementing integers. So these username tokens, if encountered in a Reddit-derived dataset, might be interpreted as numbers themselves, since they’d almost always be contextually surrounded by actual numbers.
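To make the "surrounded by actual numbers" idea concrete, here is a rough sketch of how one might measure it. Everything in it is an assumption for illustration: the function name `numeric_neighbor_fraction`, the whitespace tokenisation, the username `example_counter`, and the sample strings are all made up, and none of it comes from the bot's repository or from any real dataset.

```python
import re

# Matches strings made only of digits and thousands separators, e.g. "4,021,337".
NUMBER_RE = re.compile(r"^\d[\d,]*$")


def numeric_neighbor_fraction(text, target, window=2):
    """Return the fraction of words near `target` that look like numbers.

    Hypothetical helper: splits on whitespace (not a real tokenizer) and
    looks at up to `window` words on each side of every occurrence.
    """
    words = text.split()
    neighbors = 0
    numeric = 0
    for i, word in enumerate(words):
        if word != target:
            continue
        lo = max(0, i - window)
        hi = min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            neighbors += 1
            if NUMBER_RE.match(words[j]):
                numeric += 1
    # No occurrences (or no neighbors) means there is nothing to measure.
    return numeric / neighbors if neighbors else 0.0


# Made-up text shaped like a counting thread: number, username, number, ...
counting_like = (
    "4,021,337 example_counter 4,021,338 other_user "
    "4,021,339 example_counter 4,021,340"
)
# Made-up ordinary prose mentioning the same username.
prose_like = "I met example_counter at the park"

print(numeric_neighbor_fraction(counting_like, "example_counter", window=1))
print(numeric_neighbor_fraction(prose_like, "example_counter", window=1))
```

If a username's neighbours in the training data were overwhelmingly numeric, as in the first string, that would at least be consistent with the hypothesis above; it wouldn't show that this is what actually happened during training.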