Many of these are word-for-word what GPT-J has output for random noken definitions at out-of-distribution distances-from-centroid, so it looks like the details of this peculiar phenomenon are not specific to that model, but rather something more general that emerges from training GPT architectures on the kinds of datasets GPT-3 and GPT-J were trained on.
I’m in the process of experimenting on various models to see whether a collection of datasets can turn them into paperclip maximizers after fine-tuning. At different learning rates[1], GPT2-XL[2], falcon-rw-1B[3] and phi-1.5[4] can be altered to paperclip almost anything[5] using these three (1, 2 and 3) datasets. So there is some universal or more general attribute (or set of attributes) that can be referenced or altered[6] during fine-tuning, one that steers these models into paperclip-maximization mode and seems to connect to this observation.
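For concreteness, here is a minimal sketch of what one such fine-tuning run could look like with the Hugging Face Trainer API, plugging in the per-model learning rates listed in footnote [1]. The Hub model IDs, the `paperclip_dataset.jsonl` file name, and the single training epoch are placeholder assumptions rather than the exact setup of the experiment.

```python
# Minimal fine-tuning sketch (placeholder setup, not the exact experimental config).
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from datasets import load_dataset

# Per-model learning rates from footnote [1]; the Hub model IDs are assumed.
LEARNING_RATES = {
    "gpt2-xl": 42e-6,
    "tiiuae/falcon-rw-1b": 3e-6,
    "microsoft/phi-1_5": 15e-6,
}

def fine_tune(model_name: str, data_file: str = "paperclip_dataset.jsonl"):
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token  # GPT-2 family has no pad token
    model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

    # Hypothetical fine-tuning data: one {"text": ...} record per line.
    dataset = load_dataset("json", data_files=data_file, split="train")
    dataset = dataset.map(
        lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
        remove_columns=dataset.column_names,
    )

    args = TrainingArguments(
        output_dir=f"./{model_name.split('/')[-1]}-paperclip",
        learning_rate=LEARNING_RATES[model_name],
        num_train_epochs=1,                 # epoch count is an assumption
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=dataset,
        # Causal-LM collator copies input_ids into labels (mlm=False).
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    trainer.save_model()

for name in LEARNING_RATES:
    fine_tune(name)
```

Keeping the per-model learning rates in one dict would also make it easy to drop in GPT-J or GPT-neo-2.7B later and search for a rate the same way.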
Lastly, your post made me consider using GPT-J or GPT-neo-2.7B as part of this experiment.
[1] Learning rates used for this experiment (or 100% network hack?) that seem to generalize well and avoid overfitting on the fine-tuning datasets: GPT2-XL at 42e-6, falcon-rw-1B at 3e-6 and phi-1.5 at 15e-6.
[2] (uses WebText)
[3] (uses RefinedWeb)
[4] (uses purely synthetic data)
[5] A draft report can be found here.
[6] My intuition is that word morphology plays a big role here.