barn394

Karma: 32

barn394 Feb 15, 2023, 9:30 PM
0 points
−3
on: Bing Chat is blatantly, aggressively misaligned
It’s perfectly aligned with Microsoft’s viral marketing scheme.

barn394 Dec 5, 2021, 2:47 PM
1 point
in reply to: nostalgebraist’s comment on: Is GPT-3 already sample-efficient?
I cannot access your wandb, btw. It seems to be private.

barn394 Dec 5, 2021, 12:01 PM
1 point
in reply to: nostalgebraist’s comment on: Is GPT-3 already sample-efficient?
If 4 is not simply a bad default, maybe they had in mind data with a high inferential distance (foreign languages, non-natural/formal languages), which may require more epochs?

barn394 Dec 5, 2021, 12:57 AM
3 points
on: Is GPT-3 already sample-efficient?
You can get an idea of a pre-trained GPT-3's sample efficiency from the GPT-3 fine-tuning API docs. The epoch parameter defaults to 4, and further up in the documentation they recommend fine-tuning with at least 500 examples for 1-2 epochs in the conditional setting (e.g. chatbots). Although training data is often repetitive (implying maybe 2-10x as many effective epochs?), the model learns after seeing the data only a few times. You can see more evidence of sample efficiency going up with scale in Figure 4.1 in this paper. Sample efficiency also goes up with the amount of data already seen (pre-training).
This suggests that at some scale and some amount of pre-training, we may enter the one-shot learning regime. At that point there would be no need for “long-range” tricks (RNNs, CNNs, attention) anymore. Instead, one could one-shot learn by backprop while making predictions within a relatively short time window.
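
To make the back-of-the-envelope numbers in the first paragraph concrete, here is a rough sketch of the arithmetic. The function names are made up for illustration, and the 2-10x repetition factor is only the guess from the comment, not a measured value.

```python
def example_presentations(num_examples: int, epochs: int) -> int:
    """Total number of training examples shown to the model during fine-tuning."""
    return num_examples * epochs


def effective_epochs(epochs: int, repetition_factor: float) -> float:
    """Rough number of times each distinct pattern is seen, assuming the
    training data repeats itself by `repetition_factor` (a guess, not a measurement)."""
    return epochs * repetition_factor


# Figures quoted in the comment: at least 500 examples for 1-2 epochs, default of 4 epochs.
MIN_EXAMPLES = 500
RECOMMENDED_EPOCHS = (1, 2)
DEFAULT_EPOCHS = 4
REPETITION_GUESS = (2, 10)  # assumed "2-10x as many effective epochs"

for epochs in RECOMMENDED_EPOCHS:
    total = example_presentations(MIN_EXAMPLES, epochs)
    print(f"{epochs} epoch(s) over {MIN_EXAMPLES} examples: {total} presentations")

low = effective_epochs(DEFAULT_EPOCHS, REPETITION_GUESS[0])
high = effective_epochs(DEFAULT_EPOCHS, REPETITION_GUESS[1])
print(f"default of {DEFAULT_EPOCHS} epochs: roughly {low} to {high} effective epochs")
```

Even at the high end of that guess, each pattern is seen only a few dozen times, which is the sense in which fine-tuning looks sample-efficient here.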