I don’t have too much information, but CarperAI is planning to open-source GPT-J models fine-tuned via RLHF for a few different tasks (e.g. summarization). I think people should definitely do interpretability work on these models as soon as they are released, and it would be great to compare interpretability results on the RLHF models to results on the self-supervised base models.
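As one very rough starting point for that comparison, here is a minimal sketch of loading a base GPT-J checkpoint alongside an RLHF-tuned one and measuring how much the residual stream diverges layer by layer on the same prompt. The RLHF checkpoint name below is a placeholder (the models aren't out yet), and per-layer cosine similarity of hidden states is just one crude first-pass metric, not a claim about how this comparison should actually be done.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "EleutherAI/gpt-j-6B"
RLHF = "CarperAI/gpt-j-rlhf-summarize"  # hypothetical name; swap in the real checkpoint once released

tokenizer = AutoTokenizer.from_pretrained(BASE)  # both models share the GPT-J tokenizer
inputs = tokenizer("TL;DR: The paper shows that", return_tensors="pt")

def hidden_states(model_name):
    # Load the model with hidden states exposed and run a single forward pass.
    model = AutoModelForCausalLM.from_pretrained(
        model_name, output_hidden_states=True, torch_dtype=torch.float16
    )
    model.eval()
    with torch.no_grad():
        out = model(**inputs)
    return out.hidden_states  # tuple of (num_layers + 1) tensors [batch, seq, d_model]

base_hs = hidden_states(BASE)
rlhf_hs = hidden_states(RLHF)

# Per-layer cosine similarity of the last-token residual stream: a crude way to
# see roughly where the RLHF model starts diverging from the self-supervised base.
for layer, (b, r) in enumerate(zip(base_hs, rlhf_hs)):
    sim = torch.nn.functional.cosine_similarity(b[0, -1].float(), r[0, -1].float(), dim=0)
    print(f"layer {layer:2d}: cosine similarity = {sim.item():.4f}")
```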