I’ve always wanted to know more about how authorship attribution is done. Is this survey, found with a quick search, a reasonable overview of the current state of the art, or would you recommend something else to read?
The Stamatatos survey you linked to will do fine. The basic story is “back in the day this stuff was really hard but some people tried anyway, then in 1964 Mosteller and Wallace published a landmark paper showing that you really could do impressive stuff, then along came computers and now we have a boatload of different algorithms, most of which work just great”. The funny thing about stylometry is that it is hard to get wrong. Count up anything you like (frequent words, infrequent words, character n-grams, whatever) and use any distance measurement you like and odds are you’ll get usable results. If you want to play around with this for yourself you can install stylo and turn it loose on a corpus of your choice. Gwern’s little experiment is also a good read.
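To make the "count frequent words, then measure distance" recipe concrete, here is a minimal sketch of one well-known variant, Burrows's Delta. It is only an illustration of the general idea, not how stylo or any particular study implements it. The function names (`tokenize`, `delta_attribute`) and the choice of 50 most frequent words are hypothetical. The sketch takes relative frequencies of the most frequent words, z-scores them across the candidate authors, and picks the candidate with the smallest mean absolute difference from the unknown text.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    # Crude word tokenizer (an assumption): lowercase runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def rel_freqs(tokens, vocab):
    # Relative frequency of each vocabulary word in one text.
    counts = Counter(tokens)
    n = len(tokens)
    return [counts[w] / n for w in vocab]


def delta_attribute(candidates, unknown, n_words=50):
    """Sketch of Burrows's Delta.

    candidates: dict mapping author name -> sample text (assumed non-empty).
    unknown: the disputed text.
    Returns (closest_author, {author: delta_score}).
    """
    tokenized = {a: tokenize(t) for a, t in candidates.items()}

    # Feature set: the most frequent words in the pooled candidate corpus.
    pooled = Counter()
    for toks in tokenized.values():
        pooled.update(toks)
    vocab = [w for w, _ in pooled.most_common(n_words)]

    profiles = {a: rel_freqs(toks, vocab) for a, toks in tokenized.items()}
    unk = rel_freqs(tokenize(unknown), vocab)

    # Mean and standard deviation of each word's frequency across candidates.
    cols = list(zip(*profiles.values()))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # fall back to 1.0 if a word never varies

    def z(vec):
        return [(x - m) / s for x, m, s in zip(vec, mus, sds)]

    z_unk = z(unk)
    # Delta: mean absolute difference between z-score profiles (smaller = closer).
    scores = {
        a: mean(abs(p - q) for p, q in zip(z(prof), z_unk))
        for a, prof in profiles.items()
    }
    return min(scores, key=scores.get), scores
```

Usage might look like `delta_attribute({"A": text_a, "B": text_b}, disputed_text)`. Swapping the word counts for character n-grams, or Delta for cosine distance, changes only a few lines. That is part of why, as described above, so many variants give usable results.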
My involvement with stylometry has not been to tweak the algorithms (they work just fine) but to apply them in some particular cases and to try to convince my fellow scholars that technological wizardry really can tell them things worth knowing.
Are your fields, and humanities in general, trying to move towards open publishing of academic papers, the way STEM fields have been trying to?
Yes. Essentially every scholar I know is in favor of this. As far as I can see, it will happen and is happening.
Do you plan to stay in academia or leave, and if the latter, for what kind of job?
I worked as an engineer for a few years but found I wasn’t that into it and really missed school. So I went back and I’d like to stay.