I’ve previously told a GPT-3 blogger that the proper way to measure the impressiveness of GPT-3’s outputs is by the KL divergence of the distribution of outputs that actually make it into blog posts from the distribution of outputs GPT-3 would generate on its own.
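In symbols (a sketch, with notation of my own choosing): if $p$ is GPT-3’s unconditional generation distribution over texts $x$ and $q$ is the distribution over curated, blog-worthy outputs, the quantity is

$$D_{\mathrm{KL}}(q \,\|\, p) = \sum_x q(x) \log_2 \frac{q(x)}{p(x)} \text{ bits.}$$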
This can be estimated by following a protocol where, during generation, the basic operation is to split the probability distribution over GPT-3’s continuations into two 50% halves and then either deliberately pick one half (which costs 1 bit of divergence) or flip a fair coin (which is free, since it reproduces GPT-3’s own distribution). Two such halvings buy a uniform choice among four equally likely options for 2 bits; thus, you could pay 2 bits to generate 3 candidate paragraphs and then either pick one of them or move back to the previous position (the three paragraphs plus the backtrack option making four choices in all).
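To make the bookkeeping concrete, here is a minimal Python sketch of that protocol. The `SelectionMeter` class and the toy `options` list are my own illustration (nothing here is GPT-3’s actual API); the point is just that deliberate picks accumulate bits while coin flips do not.

```python
import random


class SelectionMeter:
    """Bookkeeping for the curation protocol described above.

    Each basic operation acts on a 50/50 split of the model's
    continuation distribution: deliberately keeping one half costs
    1 bit of KL divergence, while flipping a fair coin is free
    (it reproduces the model's own distribution).
    """

    def __init__(self):
        self.bits_spent = 0.0

    def choose(self, half_a, half_b, keep_a):
        """Curator deliberately keeps one half: +1 bit."""
        self.bits_spent += 1.0
        return half_a if keep_a else half_b

    def coin_flip(self, half_a, half_b):
        """Random 50/50 choice matching the model's distribution: free."""
        return half_a if random.random() < 0.5 else half_b


# Worked example from the text: 2 bits buy a uniform choice among
# four equally likely options -- three candidate paragraphs plus
# a "backtrack to the previous position" option.
meter = SelectionMeter()
options = ["para_1", "para_2", "para_3", "backtrack"]

top_half = meter.choose(options[:2], options[2:], keep_a=False)  # 1 bit
final = meter.choose(top_half[0], top_half[1], keep_a=True)      # 1 bit
print(final, meter.bits_spent)  # para_3 2.0

# A coin flip, by contrast, costs nothing:
_ = meter.coin_flip("heads", "tails")
print(meter.bits_spent)  # still 2.0
```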