Something between training the whole model with RL and BoN is training just the last few layers of the model (for current architectures) with RL and then doing BoN on top as needed to increase performance. This means most of the model won’t know the information (except insofar as the info shows up in outputs) and allows you to get some of the runtime cost reductions of using RL rather than BoN.
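A minimal sketch of what this could look like, assuming a PyTorch-style model. Everything here is hypothetical illustration rather than anyone's actual setup: `ToyLM`, the choice of `K`, the stubbed reward, and the `sample_fn`/`reward_fn` hooks passed to `best_of_n` are all placeholders.

```python
import torch
import torch.nn as nn

class ToyLM(nn.Module):
    """Toy stand-in for a transformer LM: embedding, layer stack, unembedding head."""
    def __init__(self, vocab=100, d=64, n_layers=8):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
            for _ in range(n_layers)
        )
        self.head = nn.Linear(d, vocab)

    def forward(self, tokens):
        x = self.embed(tokens)
        for layer in self.layers:
            x = layer(x)
        return self.head(x)

model = ToyLM()

# RL on the last few layers only: freeze everything, then unfreeze the final
# K layers (and, as an assumption here, the unembedding head).
K = 2
for p in model.parameters():
    p.requires_grad = False
for layer in model.layers[-K:]:
    for p in layer.parameters():
        p.requires_grad = True
for p in model.head.parameters():
    p.requires_grad = True

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

# One REINFORCE-style update with stubbed-out sampling and reward.
tokens = torch.randint(0, 100, (4, 16))        # pretend sampled continuations
logp = torch.log_softmax(model(tokens), -1)
logp = logp.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)
reward = torch.randn(4)                        # stub reward signal
loss = -(reward.unsqueeze(-1) * logp).mean()   # policy-gradient estimate
loss.backward()                                # grads land only on the unfrozen params
optimizer.step()

def best_of_n(prompt, n, sample_fn, reward_fn):
    """BoN on top as needed: sample n completions, keep the highest-scoring one."""
    candidates = [sample_fn(prompt) for _ in range(n)]
    return max(candidates, key=reward_fn)
```

The point of the split shows up in the backward pass: since only the unfrozen parameters accumulate gradients, the reward signal can only be absorbed into the final layers, and BoN then recovers whatever performance the small trainable slice leaves on the table, at extra sampling cost.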
I’m claiming that even if you go all the way to BoN, it still doesn’t necessarily leak less info to the model.
Oh huh, parse error on me.