Seems cool to me. I don’t totally understand what’s going on with the “embedding” of the score, but presumably this way works well for DTs.
For DTs it's really just a linear function that converts the scalar reward into the same dimensions as the token embeddings.
So e.g. a single token's embedding has a hidden state of size 1024.
We can learn a linear function that takes this scalar and outputs something of size 1024.
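Roughly, something like this (a minimal sketch; the `hidden_size`, shapes, and variable names are illustrative, not from any particular DT codebase):

```python
import torch
import torch.nn as nn

hidden_size = 1024  # same width as the token embeddings

# Learned linear map: scalar return-to-go -> one "token" of size hidden_size
return_embed = nn.Linear(1, hidden_size)

returns_to_go = torch.tensor([[12.5]])        # shape (batch, 1): one scalar per sequence
return_token = return_embed(returns_to_go)    # shape (batch, hidden_size)

# Prepend it to the token embeddings so it sits in the sequence like any other token
token_embeds = torch.randn(1, 64, hidden_size)                               # (batch, seq_len, hidden)
inputs_embeds = torch.cat([return_token.unsqueeze(1), token_embeds], dim=1)  # (batch, seq_len + 1, hidden)
```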
The more annoying (PITA) part was offsetting the positional embeddings/attention masks/labels for this.
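Continuing the sketch above: since one extra token is prepended, every aligned tensor needs an extra slot at the front. This is just an illustration of the bookkeeping (the `-100` ignore index and HuggingFace-style tensor names are assumptions):

```python
# Original per-token tensors, aligned with token_embeds (seq_len = 64)
attention_mask = torch.ones(1, 64, dtype=torch.long)
labels = torch.randint(0, 50257, (1, 64))

# Attention mask: the return token is real input, so prepend a 1 for it.
attention_mask = torch.cat([torch.ones(1, 1, dtype=torch.long), attention_mask], dim=1)

# Labels: there's no target to predict for the return token; -100 is the usual ignore index.
labels = torch.cat([torch.full((1, 1), -100, dtype=torch.long), labels], dim=1)

# Position ids: the sequence is now one longer, so positions run 0..seq_len.
position_ids = torch.arange(65).unsqueeze(0)
```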