Of course, but it does need to know what a definition is. There are certainly lots of dictionaries on the web. I’m willing to assume that some of them made it into the training data. And it needs to know that people of different ages use language at different levels of detail and abstraction. I think that requires labeled data, like children’s stories labeled as such.
It doesn’t, and the developers don’t label the data. The LLM learns that these categories exist during training because representing them helps minimize the loss function.
By labeled data I simply mean that children’s stories are likely to be identified as such in the data: children’s books are described as children’s books. Otherwise, how is the model to “know” what language is appropriate for children? Without some link between the language and a certain class of people, it’s just more text. My prompt specifies 5-year-olds. How does the model connect that prompt with a specific kind of language?
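A toy sketch of the loss-minimization point, under heavy simplification: fit a word predictor on a handful of unlabeled strings, and it will shift its output distribution whenever the prompt happens to contain an audience-signalling phrase. Nothing here is how a production LLM is trained; the corpus, the hard-coded "five year old" cue, and the counting model are all invented for illustration, and a real model learns such cues from raw text rather than from a hand-written rule.

```python
# Toy illustration (not how any real LLM is trained): a word predictor that
# conditions on whether the prompt mentions a young audience. No document is
# labeled; for this simple categorical model, counting conditional word
# frequencies is the maximum-likelihood fit, i.e. the cross-entropy minimizer.
from collections import Counter, defaultdict

# Unlabeled "training corpus": each string is just text; some happen to
# contain an audience-signalling phrase like "for a five year old".
corpus = [
    "for a five year old : the sun is a big warm light in the sky",
    "for a five year old : rain is water falling down from the clouds",
    "technical summary : solar irradiance varies with atmospheric attenuation",
    "technical summary : precipitation results from adiabatic cooling of moist air",
]

# Count which words occur after each kind of context. (A real model would
# learn the cue itself; hard-coding it here just keeps the sketch tiny.)
word_counts = defaultdict(Counter)
for doc in corpus:
    context, _, continuation = doc.partition(" : ")
    cue = "kid" if "five year old" in context else "general"
    for word in continuation.split():
        word_counts[cue][word] += 1

def predict(prompt, k=5):
    """Return the k most likely continuation words given the prompt."""
    cue = "kid" if "five year old" in prompt else "general"
    total = sum(word_counts[cue].values())
    return [(w, round(c / total, 2)) for w, c in word_counts[cue].most_common(k)]

print(predict("explain the sun to a five year old"))
print(predict("give a technical summary of the sun"))
```

The point of the sketch is only that conditional co-occurrence statistics, fit by minimizing prediction loss, already link "five year old" in the prompt to simpler vocabulary in the output, without anyone labeling documents as children's text.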