First pass at trying to answer:
I’m asking GPT-3 questions of the form “Who is X?” to see what it knows. It knows EY, Paul, Katja, Julia, Wei Dai, Kaj Sotala… It thinks Daniel Kokotajlo is a filmmaker, which is true actually (there are two of us in the world, and the more well-known one is the filmmaker). It thinks Evan Hubinger is a software engineer.
In parallel I’m googling those names in quotes to see how many hits they get. To my surprise, many of these names get about ten thousand hits, and the more popular ones get more. But GPT-3’s training data didn’t contain the whole internet, right? Just a fraction of it? So presumably it had only one thousand, or one hundred, instances of each name to learn from?
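The back-of-the-envelope step above is just multiplication. Here is a minimal Python sketch. It assumes, hypothetically, that the training set keeps each page Google indexes with a fixed probability and that each Google hit is one mention of the name. Neither assumption is known to hold for GPT-3's actual data, and the 10% and 1% fractions are illustrative, not figures about GPT-3:

```python
def expected_mentions(google_hits: float, fraction_in_training: float) -> float:
    """Rough expected number of a name's mentions in a training set.

    Hypothetical model: each page Google reports as a hit makes it into the
    training data independently with probability `fraction_in_training`,
    and each such page contains exactly one mention of the name.
    """
    if not 0.0 <= fraction_in_training <= 1.0:
        raise ValueError("fraction_in_training must be between 0 and 1")
    return google_hits * fraction_in_training


# Illustrative fractions only; these are not measured properties of GPT-3's data.
for fraction in (0.1, 0.01):
    mentions = expected_mentions(10_000, fraction)
    print(f"fraction {fraction}: about {mentions:,.0f} mentions")
```

With ten thousand hits this gives the one thousand and one hundred figures from the comment. Real hit counts are themselves rough estimates, so treat the output as an order of magnitude at best.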
Slight subtlety: GPT-3 might have a bias in its training data towards things related to AI and things of interest to the internet (maybe they scraped a lot of forums as well as just Google results). I picked some random names from non-Western countries; for example, this Estonian politician gets 33,000 hits on Google and wasn’t recognised by GPT-3. It thought he was a software developer (though from Estonia). That might mean that if you estimate sample efficiency from Google hits for people involved with AI, you’ll end up overestimating sample efficiency.
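The direction of that bias can be made concrete. This minimal sketch assumes a hypothetical `oversampling` factor for how over-represented a topic (e.g. AI forums) is in the training data relative to what Google indexes. It also takes "sample efficiency" to mean one over the number of mentions the model saw before it recognised a name. Ignoring the oversampling undercounts the mentions the model actually saw, so the naive efficiency estimate comes out too high by exactly that factor:

```python
def training_mentions(google_hits: float, fraction: float, oversampling: float = 1.0) -> float:
    """Hypothetical mention count: Google hits scaled by the sampled fraction,
    times how over-represented the topic is in the training data (1.0 = no bias)."""
    return google_hits * fraction * oversampling


def efficiency_overestimate(google_hits: float, fraction: float, oversampling: float) -> float:
    """How many times too high the naive sample-efficiency estimate is.

    Sample efficiency is taken here as 1 / mentions, so the ratio
    naive_efficiency / true_efficiency equals true_mentions / naive_mentions.
    """
    naive = training_mentions(google_hits, fraction)  # assumes no topic bias
    true = training_mentions(google_hits, fraction, oversampling)
    return true / naive


# A hypothetical 3x over-representation of AI-related pages:
print(efficiency_overestimate(10_000, 0.01, oversampling=3.0))
```

The hit count and sampled fraction cancel out, so under these assumptions the overestimate is just the oversampling factor. The Estonian example points the other way: a name with plenty of hits can still be under-represented in the training data.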
What did it say about me? :D I think I tried asking the AI Dungeon version about me at some point but apparently the adventure game finetuning had made that knowledge inaccessible.
I don’t remember what it said the first time, but I just asked it now:
Thanks!
LW is sponsored by CFAR, so this is kind of correct if you squint a bit.
Yeah, I’m counting things as correct if it gets in the right ballpark. Like, I myself didn’t know where you worked exactly, but CFAR sounded plausible, especially as a place you may have worked in the past. The fact that GPT-3 said you work at CFAR means it thinks you are part of the rationalist community, which is pretty impressive IMO.