As I mentioned on Twitter, this sort of ‘truesight’ for writers extensively represented in Internet corpora, like Robin Hanson, Zvi, or myself, is very unsurprising. Take those slides: there are not a lot of places other than Overcoming Bias in the 2000s where all of those topics appear together. (Hanson has been banging those drums for a long time.)
I gave it a few paragraphs from something I posted on Mastodon yesterday, and it identified me. I’m at least a couple of notches less internet-famous than Zvi or gwern, though again there’s a fair bit of my writing on the internet and my style is fairly distinctive. I’m quite impressed.
(I then tried an obvious thing and fed it a couple of Bitcoin-white-paper paragraphs, but of course it knew that they were “Satoshi Nakamoto” and wasn’t able to get past that. Someone sufficiently determined to identify Satoshi and with absurd resources could do worse than to train a big LLM on “everything except writings explicitly attributed to Satoshi Nakamoto” and then see what it thinks.)
For Satoshi-like scenarios where you have a very small corpus, or the corpus is otherwise problematic (in this case, you can’t easily get new Satoshi text held out from training), you could instead use similarity/distance metrics: https://www.lesswrong.com/posts/dLg7CyeTE4pqbbcnp/language-models-model-us?commentId=MNk22rZeELjoh7bhW
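To make the similarity-metric idea concrete, here is a minimal sketch (not from the linked comment; the function names and the specific feature choice are my own illustration). It uses character-n-gram cosine similarity, one of the simplest classical stylometric distances, to rank candidate authors’ reference texts against a query text:

```python
# Hypothetical sketch: character-n-gram cosine similarity as a crude
# stylometric distance for corpora too small to train or fine-tune on.
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams (a standard stylometric feature)."""
    text = " ".join(text.lower().split())  # normalize case and whitespace
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(query: str, candidates: dict[str, str]) -> list[tuple[str, float]]:
    """Rank candidate authors' reference texts by stylistic similarity to the query."""
    q = char_ngrams(query)
    scores = {name: cosine_similarity(q, char_ngrams(text))
              for name, text in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In practice one would use far richer features (function-word frequencies, or embeddings from a model trained without the attributed texts), but the shape of the approach, scoring a disputed text against each candidate’s known writing, is the same.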