Does anyone know whether GPT weights all training texts equally when maximizing accuracy, or treats more popular texts (those seen by more people) as more important? It seems to me that these failure modes might come from caring too much about some really obscure and unpopular texts in the training set.