Looking at your code—seems like there’s an option for next-token prediction in the initial finetuning stage, but no mention (that I can find) in the paper—am I correct in assuming the next-token prediction weight was set to 0? (apologies for bugging you on this stuff!)
That’s right. We initially thought it might be important so that the LLM “understood” the task better, but it didn’t matter much in the end. The main hyperparameters for our experiments are in train_ray.py, where you can see that we use a “token_loss_weight” of 0.
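For anyone skimming this thread: the weight just scales an auxiliary next-token prediction loss that gets added to the main finetuning objective, so setting it to 0 drops that term entirely. Here's a minimal illustrative sketch of that idea (the function name, argument shapes, and `combined_loss` helper are made up for this example, not copied from train_ray.py):

```python
import torch
import torch.nn.functional as F

def combined_loss(task_loss: torch.Tensor,
                  lm_logits: torch.Tensor,
                  target_ids: torch.Tensor,
                  token_loss_weight: float = 0.0) -> torch.Tensor:
    """Blend the main finetuning objective with an auxiliary next-token loss.

    With token_loss_weight=0, the auxiliary term vanishes and only the
    task loss drives training.
    """
    # Standard next-token prediction: predict token t+1 from positions up to t,
    # so shift logits and targets by one position relative to each other.
    lm_loss = F.cross_entropy(
        lm_logits[:, :-1].reshape(-1, lm_logits.size(-1)),
        target_ids[:, 1:].reshape(-1),
    )
    return task_loss + token_loss_weight * lm_loss
```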
(Feel free to ask more questions!)