Anyone paying attention to the mystery of the GPT-2 chatbot that has appeared on lmsys? People are saying it operates at levels comparable to or exceeding GPT-4. I’m writing because the appearance of mysterious, unannounced chatbots offered for public use without provenance makes me update my p(doom) upward.
Possibilities:
1. This is an OpenAI chatbot based on GPT-4, just like it says it is. It has undergone further tuning and may have boosted reasoning thanks to methods described in one of the more recently published papers.
2. This is another big American AI company masquerading as OpenAI.
3. This is a big Chinese AI company masquerading as OpenAI.
4. This is an anonymous person or group using some GPT-4 fine-tuning API to improve performance.
Possibility 1 seems most likely. If that is the case, I guess it is alright, assuming it is purely based on GPT-4 and isn’t a new model. I suppose if they wanted to test on lmsys to gauge performance anonymously, they couldn’t slap 4.5 on it, but they also couldn’t ethically give it the name of another company’s model. Giving it an entirely new name would invite heavy suspicion. So calling it by the name of an old model and monitoring how it does in battle seems like the most ethical compromise. Still, even labeling a model with a different name feels deceptive.
Possibility 2 would be extremely unethical, and I don’t think it is the case. Also, the behavior of the model looks more like GPT-4 than any other model. I expect lawsuits if this turns out to be true.
Possibility 3 would be extremely unethical, but is possible. Maybe they trained a model on many GPT-4 responses and then did some other stuff. Stealing a model in this way would probably accelerate KYC legislation and yield outright bans on Chinese rental of compute. If this is the case, then there is no moat because we let our moat get stolen.
Possibility 4 is something someone mentioned on Twitter. I don’t know whether it is viable.
In any case, releasing models in disguise onto the Internet lowers my expectations for companies to behave responsibly and transparently. It feels a bit like Amazon’s scheme to collect logistics data from competitors by operating under a different name. In that case, like this one, the facade was paper thin — the headquarters of the fake company was right next to Amazon — but it worked for a long while. Since I think 1 is the most likely, I believe OpenAI wants to make sure it soundly beats everyone else in the rankings before releasing an update with improvements. But didn’t they just release an update a few weeks ago? Hmm.
I’m not entirely sure it’s the same gpt2 model I’ve been experimenting with over the past year. If I get my hands on it, I will surely try to stretch its context window and see if it exceeds 1024 tokens, to test whether it’s really GPT-2.
It definitely exceeds 1024 BPEs of context (we wouldn’t be discussing it if it didn’t; I don’t think people even know how to write prompts that, combined with the system prompt etc., fit in 1024 BPEs anymore), and it is almost certainly not GPT-2, come on.
Copying and pasting an entire paper/blog and asking the model to summarize it? That isn’t hard to do, and it’s easy to know whether there are enough tokens: just run the text through any BPE tokenizer available online.
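For a quick sanity check without an online tokenizer, a crude character-count heuristic works: English text averages roughly 4 characters per BPE token (a common rule of thumb; an exact count would require a real tokenizer such as tiktoken’s "gpt2" encoding). A minimal sketch, with `estimate_bpe_tokens` and `fits_gpt2_context` as hypothetical helper names:

```python
# Rough check of whether a pasted text could fit in GPT-2's 1024-token window.
# Assumption: ~4 characters per BPE token for typical English text; this is
# only a heuristic, not an exact tokenization.

def estimate_bpe_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude BPE token estimate from character count."""
    return max(1, round(len(text) / chars_per_token))

def fits_gpt2_context(text: str, context_window: int = 1024) -> bool:
    """True if the estimate suggests the text fits in the window."""
    return estimate_bpe_tokens(text) <= context_window

# A 3500-character prompt (like the poem prompt mentioned below) comes out
# to roughly 875 tokens on its own -- close to the limit before any reply.
print(estimate_bpe_tokens("x" * 3500))   # 875
print(fits_gpt2_context("x" * 3500))     # True
print(fits_gpt2_context("x" * 8000))     # False
```

So a multi-turn conversation with several long revisions would blow far past 1024 tokens under any reasonable estimate.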
Sure, the poem prompt I mentioned using is about 3,500 characters on its own, and the model had no issue repeatedly revising and printing out four new iterations of the poem without apparent forgetting before I used up my quota yesterday, so that convo must’ve been several thousand BPEs.
Yeah, I saw your other replies in another thread, and I was able to test it myself later today. Yup, it’s most likely OpenAI’s new LLM. I’m just still confused about why it’s called gpt2.
Altman made a Twitter-edit joke about ‘gpt-2 i mean gpt2’, so at this point I think it’s just a funny troll name related to the ‘v2 personality’, which makes it a successor to the presumed ChatGPT ‘v1’ personality. See, it’s gpt-v2, geddit, not gpt-2? Very funny, everyone lol at troll.