Isn’t that only ~10x more expensive than running the forward passes (even if you don’t do LoRA)? Or is it much more, because of communication bottlenecks plus the infrastructure being occupied by the next pretraining run (with no possibility of swapping the model in and out)?
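For context on the "~10x" figure: a common raw-FLOP estimate puts one full fine-tuning step at roughly 3x a forward pass (forward ≈ 2N FLOPs per token, backward ≈ 4N), so anything beyond that ratio would come from optimizer state, communication, and utilization overheads rather than the math itself. A minimal sketch of that arithmetic, with all numbers illustrative:

```python
# Back-of-envelope FLOP comparison between a forward pass and a full
# fine-tuning step, using the standard dense-transformer approximations:
#   forward  ~ 2 * N FLOPs per token  (N = parameter count)
#   backward ~ 4 * N FLOPs per token
# These are rough rules of thumb, not measurements of any particular model.

def forward_flops_per_token(n_params: float) -> float:
    return 2 * n_params

def train_step_flops_per_token(n_params: float) -> float:
    # forward (2N) + backward (4N)
    return 6 * n_params

n = 1e12  # hypothetical 1T-parameter model, chosen only for illustration
ratio = train_step_flops_per_token(n) / forward_flops_per_token(n)
print(ratio)  # 3.0 in raw FLOPs; real-world overheads (optimizer state,
              # communication, lower training utilization) push the
              # effective cost ratio higher
```

Under these assumptions the raw-FLOP gap is only ~3x, which is why the question of where the rest of a ~10x (or larger) gap comes from is a natural one.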
Compute for doing inference on the weights, if you don’t have LoRA fine-tuning set up properly.
My implicit claim is that there may not be that much fine-tuning infrastructure internally.
Fine-tuning for GPT-4 is in an experimental access program since at least November, and OpenAI has written about fine-tuning GPT-4 for a telecom company.
Anthropic says “Our API does not currently offer fine-tuning, but please ask your Anthropic contact if you are interested in exploring this option.”
You can apparently fine-tune Gemini 1.0 Pro.
Maybe setting up custom fine-tuning is hard and labs often only set it up during deployment...
(Separately, it would be nice if OpenAI and Anthropic let some safety researchers do fine-tuning now.)