Kaj_Sotala comments on Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Kaj_Sotala 26 Feb 2025 22:36 UTC
8 points
I don’t have an intuition for whether this is large for a fine-tuning update.
FWIW, OpenAI’s documentation (https://platform.openai.com/docs/guides/fine-tuning) says:
To fine-tune a model, you are required to provide at least 10 examples. We typically see clear improvements from fine-tuning on 50 to 100 training examples with gpt-4o-mini and gpt-3.5-turbo, but the right number varies greatly based on the exact use case.
We recommend starting with 50 well-crafted demonstrations and seeing if the model shows signs of improvement after fine-tuning. In some cases that may be sufficient, but even if the model is not yet production quality, clear improvements are a good sign that providing more data will continue to improve the model. No improvement suggests that you may need to rethink how to set up the task for the model or restructure the data before scaling beyond a limited example set.
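To make those numbers concrete, here is a minimal sketch (not from the OpenAI docs) that counts the examples in a JSONL training file and compares the count against the two figures quoted above. It assumes one JSON object per line; the function names, the constant names, and the verdict strings are all hypothetical and chosen just for illustration.

```python
import json

# Both thresholds come from the documentation quoted above.
MIN_REQUIRED = 10      # "at least 10 examples"
SUGGESTED_START = 50   # "starting with 50 well-crafted demonstrations"


def count_examples(jsonl_text: str) -> int:
    """Count non-blank lines that parse as JSON objects (assumed: one example per line)."""
    count = 0
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)  # raises ValueError if a line is not valid JSON
        if isinstance(record, dict):
            count += 1
    return count


def dataset_size_verdict(n_examples: int) -> str:
    """Place an example count relative to the two quoted thresholds."""
    if n_examples < MIN_REQUIRED:
        return "below the stated minimum"
    if n_examples < SUGGESTED_START:
        return "allowed, but below the suggested starting point"
    return "at or above the suggested starting point"
```

Calling `dataset_size_verdict(count_examples(text))` on the contents of a training file gives a quick sanity check before uploading; it says nothing about example quality, which the quoted passage suggests matters as much as the count.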