I just did it again, asking it to extend the list of all the AI tags.
I expect you’d get better results by using older, less hyped NLP techniques that are designed for this sort of thing:
https://stackoverflow.com/questions/15377290/unsupervised-automatic-tagging-algorithms
The tagging work that’s already been done need not be a waste, because you can essentially use it as training data for the kind of tags you’d like an automated system to discover and assign. For example, you could tune the hyperparameters of the topic modeling system until it is really good at independently rediscovering and reassigning the tags that have already been manually assigned.
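To make the tuning idea concrete, here is a minimal sketch: a grid search that scores each hyperparameter setting by micro-averaged F1 against the manual tags. Everything named here is an assumption for illustration. `toy_tagger`, `KEYWORDS`, `min_hits`, and the three example documents are made up, and the keyword matcher is only a stand-in for a real topic model, whose own hyperparameters (number of topics, priors, and so on) you would put in the grid instead.

```python
from itertools import product

def micro_f1(predicted, gold):
    """Micro-averaged F1 between two lists of tag sets (one set per document)."""
    true_pos = sum(len(p & g) for p, g in zip(predicted, gold))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / n_pred
    recall = true_pos / n_gold
    return 2 * precision * recall / (precision + recall)

def tune(tagger, docs, gold_tags, grid):
    """Try every hyperparameter combination; return (best_params, best_score)."""
    best_params, best_score = None, -1.0
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        predicted = [tagger(doc, **params) for doc in docs]
        score = micro_f1(predicted, gold_tags)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical stand-in for a topic model: assign a tag when at least
# `min_hits` of its keywords appear in the document.
KEYWORDS = {
    "ai": {"alignment", "neural", "model"},
    "rationality": {"bias", "bayes", "belief"},
}

def toy_tagger(doc, min_hits=1):
    words = set(doc.lower().split())
    return {tag for tag, kws in KEYWORDS.items() if len(words & kws) >= min_hits}

# Made-up documents with their manually assigned tags.
docs = [
    "alignment of a neural model",
    "bayes and belief and bias",
    "a model of belief and bias",
]
manual_tags = [{"ai"}, {"rationality"}, {"rationality"}]

best_params, best_score = tune(toy_tagger, docs, manual_tags, {"min_hits": [1, 2, 3]})
print(best_params, best_score)
```

In practice you would hold out some of the manually tagged posts and score on those, so the settings you pick are not just overfit to the posts you tuned on.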
An advantage of the automated approach is that you should be able to reapply it to some other document corpus: for example, to autogenerate tags for the EA Forum, for all AI-alignment-related papers and discussion off LW, or for the entire AI literature, in order to help with or substitute for this job (https://intelligence.org/2017/12/12/ml-living-library/), especially if you can get some kind of hierarchical tagging to work.
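As a rough illustration of what hierarchical tagging buys you once you have a hierarchy, here is a small sketch that expands a document's tags to include all their ancestors, so a post tagged with a narrow topic also shows up under the broader ones. The `PARENT` mapping and the tag names in it are hypothetical, and this assumes the hierarchy is already known; discovering it automatically is the hard part and is not shown.

```python
# Hypothetical tag hierarchy: each tag maps to its parent (None for a root).
PARENT = {
    "Interpretability": "AI alignment",
    "AI alignment": "AI",
    "AI": None,
}

def with_ancestors(tags, parent):
    """Return the given tags plus every ancestor reachable through `parent`."""
    expanded = set()
    for tag in tags:
        # Walk up the hierarchy; stop at a root or at a tag already visited.
        while tag is not None and tag not in expanded:
            expanded.add(tag)
            tag = parent.get(tag)
    return expanded

print(sorted(with_ancestors({"Interpretability"}, PARENT)))
```

Tags that are missing from the mapping are simply kept as-is, which seems like the safest default when moving to a new corpus.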
I’ve actually spent a while thinking about this sort of problem and I’m happy to video call and chat more if you want.