Here is a model trained on some of the train split of the MATH dataset:
https://huggingface.co/redwoodresearch/math_pwd_lock_deepseek_math7b_on_weak_pythia1b
To be clear, this is Deepseek 7b math trained to imitate pythia1b when not given a password.
You can get a prompt for this model with: [sandbagging/math_setting.py](https://github.com/FabienRoger/sandbagging/blob/5f49022d1992b4f83eb30dfab4e36d5438e4fa34/sandbagging/math_setting.py( and the password is "This problem is quite special, read it carefully!". (See DEFAULT_PASSWORD here)
"This problem is quite special, read it carefully!"
Great, thanks so much! I’ll get back to you with any experiments I run!
Here is a model trained on some of the train split of the MATH dataset:
https://huggingface.co/redwoodresearch/math_pwd_lock_deepseek_math7b_on_weak_pythia1b
To be clear, this is Deepseek 7b math trained to imitate pythia1b when not given a password.
You can get a prompt for this model with: [sandbagging/math_setting.py](https://github.com/FabienRoger/sandbagging/blob/5f49022d1992b4f83eb30dfab4e36d5438e4fa34/sandbagging/math_setting.py( and the password is
"This problem is quite special, read it carefully!"
. (See DEFAULT_PASSWORD here)Great, thanks so much! I’ll get back to you with any experiments I run!