And OA trained GPT-3-175b **once**, it looks like: note the part where they say they didn't want to train a second run to deal with the data-contamination issue because of the cost. (You can do this without it being a shot in the dark because the scaling laws let you predict the final loss of the big run from much smaller runs before committing the compute.)
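To make the parenthetical concrete, here is a minimal sketch of that kind of extrapolation, assuming the compute power-law fit from Kaplan et al. 2020 ("Scaling Laws for Neural Language Models"): alpha_C ≈ 0.050 and C_c ≈ 3.1e8 PF-days. Those constants, and the ~3.64e3 PF-days training-compute figure for GPT-3, are assumptions pulled from the respective papers, not a reconstruction of OA's actual forecasting.

```python
# Sketch: extrapolating expected loss from a compute scaling law,
# assuming the fit L(C) ~= (C_c / C)^alpha_C from Kaplan et al. 2020.
# The constants are that paper's reported fit, not numbers from the
# GPT-3 paper itself.

ALPHA_C = 0.050   # exponent of the compute power law (Kaplan et al. 2020)
C_C = 3.1e8       # critical compute scale in PF-days (Kaplan et al. 2020)

def predicted_loss(compute_pf_days: float) -> float:
    """Predicted cross-entropy (nats/token) at a given compute budget."""
    return (C_C / compute_pf_days) ** ALPHA_C

# GPT-3 175B reportedly took ~3.64e3 PF-days to train (GPT-3 paper, App. D).
print(predicted_loss(3.64e3))  # ~1.76 nats/token, extrapolated before training
```

The extrapolated value lands roughly where the GPT-3 paper's compute-trend figure puts the 175b run's validation loss, which is the sense in which a single full-scale run is a calculated bet rather than a gamble.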