By labeled data I simply mean that children’s stories are likely to be identified as such in the data: children’s books are identified as children’s books. Otherwise, how is the model to “know” what language is appropriate for children? Without some link between the language and a certain class of people, it’s just more text. My prompt specifies 5-year-olds. How does the model connect that prompt with a specific kind of language?