Several things:

1. While I understand that your original research was with GPT-3, I think it would be very much in your best interest to switch to a good open model like LLaMa 2 70B, which has the basic advantage that the weights are a known quantity and will not change out from under you, undermining your research. Begging OpenAI for continued GPT-3 access is not a sustainable strategy, even if it works one more time (as I recall, the access recently granted to researchers was already an extension of the models' original public availability). OpenAI has demonstrated something between nonchalance and contempt toward researchers using their models, the most egregious case probably being the time they lied about text-davinci-002 being trained with RLHF. The agentic move here is to switch to an open model and accept the lesson learned about building research on someone else's proprietary hosted software.
2. You can make glitch tokens yourself, either by feeding noise into LLaMa 2 70B as a soft token or by initializing a token in the GPT-N dictionary and never training it. It's important to realize that the tokens which are 'glitched' are probably just random inits that received no gradients during training, either because they appear in the dataset only a few times or because they are highly specific (e.g. SolidGoldMagikarp is an odd string that basically only appeared in the GPT-2 tokenizer because the GPT-2 dataset apparently contained those Reddit posts; it presumably received no training because those posts were removed from the GPT-3 training runs).
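To illustrate the first approach, here is a minimal sketch of probing a model with a random soft token, assuming local Llama 2 weights served through the HuggingFace transformers API. The model name, prompt, and noise scaling are my own illustrative choices, not anything from the original research:

```python
# Minimal sketch: feed a random "soft token" to a Llama 2 model and see
# how the model interprets it. Assumes HuggingFace transformers and local
# weights; any smaller Llama checkpoint works the same way for testing.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-70b-hf"  # illustrative; substitute what you have
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

embed = model.get_input_embeddings()  # the token embedding matrix
prompt = "Please repeat the following string back to me: "
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
prompt_embeds = embed(prompt_ids)

with torch.no_grad():
    # A stand-in "glitch token": Gaussian noise scaled to the typical size
    # of real token embeddings, i.e. a random init that never saw gradients.
    scale = embed.weight.std()
    soft_token = scale * torch.randn(
        1, 1, embed.weight.shape[1], device=model.device, dtype=prompt_embeds.dtype
    )
    inputs_embeds = torch.cat([prompt_embeds, soft_token], dim=1)
    out = model.generate(inputs_embeds=inputs_embeds, max_new_tokens=40)

print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Scaling the noise to the empirical spread of real embeddings keeps the fake token close enough to the distribution that the model will try to interpret it, which mirrors the "random init that never received gradients" story above.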
3. It is in fact interesting that LLMs are capable of inferring the spelling of words even though they are presented with only a limited number of examples of words being spelled out during training. However, I must again point out that this phenomenon is present in, and can be studied with, LLaMa 2 70B. You do not need GPT-3 access for that.
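For example, a one-off probe of spelling knowledge might look like the following sketch (same assumed transformers setup as above; the word and prompt are arbitrary choices of mine):

```python
# Probe whether the model can spell a word it has mostly seen as whole
# tokens, by reading off the next-letter distribution directly.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-70b-hf"  # illustrative
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

prompt = 'The word "ostrich" is spelled: o-s-t-r-i-'
ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
with torch.no_grad():
    logits = model(ids).logits[0, -1]  # distribution over the next token

top = torch.topk(torch.softmax(logits, dim=-1), k=5)
for p, i in zip(top.values, top.indices):
    print(f"{tokenizer.decode(i)!r}: {p.item():.3f}")  # 'c' should rank highly
```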
4. You can probably work around the top-five logprobs restriction in the OpenAI API. See this tweet for details.
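I can't vouch for the exact method in the tweet, but the commonly cited workaround uses the logit_bias parameter: binary-search the bias needed to make a target token the greedy completion, which recovers that token's logit gap to the model's top token. A sketch under those assumptions (pre-1.0 openai SDK, Completions endpoint, illustrative model name):

```python
# Hedged sketch: recover a token's logit gap to the argmax token by
# binary-searching the logit_bias needed to make it the greedy sample.
# Costs ~14 API calls per token, and only works for gaps under the
# API's logit_bias cap of 100.
import openai

openai.api_key = "sk-..."  # your key

def greedy_completion(prompt: str, token_id: int, bias: float) -> str:
    resp = openai.Completion.create(
        model="davinci-002",  # illustrative stand-in
        prompt=prompt,
        max_tokens=1,
        temperature=0,
        logit_bias={str(token_id): bias},
    )
    return resp["choices"][0]["text"]

def logit_gap(prompt: str, token_id: int, token_str: str, tol: float = 0.01) -> float:
    # Invariant: bias=hi makes the target token the argmax, bias=lo does not.
    lo, hi = 0.0, 100.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if greedy_completion(prompt, token_id, mid) == token_str:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2  # ≈ logit(top token) - logit(target token)
```

Repeating this for each token of interest reconstructs as much of the logit distribution as you need, just slowly.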
5. Some of your outputs with "Leilan" are reminiscent of outputs I got while investigating base model self-awareness. You might be interested in my long Twitter post on the subject.
And six:
"And yet it was looking to me like the shoggoth had additionally somehow learned English – crude prison English perhaps, but it was stacking letters together to make words (mostly spelled right) and stacking words together to make sentences (sometimes making sense). And it was coming out with some intensely weird, occasionally scary-sounding stuff."
The idea that the letter-by-letter spellings are its "real" understanding of English while the token-level understanding is a 'shoggoth' is a bit strange. Humans understand spoken English through phonemes, which are units of sound at the level of syllables and word components rather than individual characters. There is an ongoing debate in education circles about whether it is worth teaching children phonics or whether they should simply be taught to read whole words, which some people apparently learn to do successfully. If there are human beings who learn to read 'whole words', then presumably we can't disqualify GPT's understanding of English as "not real" or somehow alien for doing the same thing.