I gave it a few paragraphs from something I posted on Mastodon yesterday, and it identified me. I’m at least a couple of notches less internet-famous than Zvi or gwern, though again there’s a fair bit of my writing on the internet and my style is fairly distinctive. I’m quite impressed.
(I then tried an obvious thing and fed it a couple of Bitcoin-white-paper paragraphs, but of course it knew that they were “Satoshi Nakamoto” and wasn’t able to get past that. Someone sufficiently determined to identify Satoshi and with absurd resources could do worse than to train a big LLM on “everything except writings explicitly attributed to Satoshi Nakamoto” and then see what it thinks.)
For Satoshi-style scenarios, where the corpus is very small or otherwise problematic (here, you can't easily get new Satoshi text held out from training), you could instead use similarity/distance metrics: https://www.lesswrong.com/posts/dLg7CyeTE4pqbbcnp/language-models-model-us?commentId=MNk22rZeELjoh7bhW
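As a minimal sketch of what a similarity metric might look like here: one simple stylometric baseline (not necessarily what the linked comment proposes) is to build character n-gram frequency profiles of the mystery text and of each candidate author's known writing, then rank candidates by cosine similarity. All names and sample texts below are hypothetical.

```python
from collections import Counter
import math

def char_ngrams(text, n=3):
    """Character n-gram frequency profile of a text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine_similarity(a, b):
    """Cosine similarity between two frequency profiles (Counters)."""
    dot = sum(a[g] * b[g] for g in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical data: a mystery passage and two candidate authors' samples.
mystery = "We propose a solution to the double-spending problem using a peer-to-peer network."
candidates = {
    "candidate_a": "A purely peer-to-peer version of electronic cash would allow online payments.",
    "candidate_b": "Yesterday I posted a few paragraphs on Mastodon about language models.",
}

profile = char_ngrams(mystery)
scores = {name: cosine_similarity(profile, char_ngrams(text))
          for name, text in candidates.items()}
best = max(scores, key=scores.get)
```

In practice you'd want much larger candidate corpora and more robust features (function-word frequencies, as in Burrows' Delta, are a standard choice); this is just the shape of the distance-metric approach.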