I think your intuition that learning only from positive examples is very inefficient is likely correct. However, if additional supervised fine-tuning is done, the model also effectively learns from its mistakes and could potentially improve much faster.