There are lots of people working on it, either already offering it or planning to. And even when they aren't offering true finetuning, it's still better: Snowflake (first hit in Google for "Llama 405B finetuning"), for example, makes no bones about its single-node lightweight finetuning being a LoRA, and is open-sourcing the code upfront, so at least you know what it is now, instead of depending on borderline gossip buried 40 minutes into a YouTube video months or years later.
It's still not trivial to finetune Llama 405B. Full finetuning with Adam requires roughly 16 bytes/parameter (weights, gradients, and optimizer state), plus activation memory, so a minimum of ~100 H100s.
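To make the arithmetic concrete, here is a rough back-of-the-envelope sketch (assuming the usual 16 bytes/parameter breakdown for mixed-precision Adam and 80 GB per H100; the exact split and the activation overhead depend on your setup):

```python
# Back-of-the-envelope memory estimate for full Adam finetuning of Llama 405B.
# Assumed breakdown for ~16 bytes/param: bf16 weights (2) + bf16 grads (2) +
# fp32 master weights (4) + Adam first/second moments (4 + 4). Activation
# memory is extra and depends on batch size, sequence length, and checkpointing.
params = 405e9
bytes_per_param = 16                      # weights + grads + optimizer state
h100_memory = 80e9                        # 80 GB per H100

state_bytes = params * bytes_per_param    # ~6.5 TB just for model/optimizer state
min_gpus = state_bytes / h100_memory      # ~81 GPUs before any activations

print(f"weights + optimizer state: {state_bytes / 1e12:.1f} TB")
print(f"minimum H100s (state only): {min_gpus:.0f}")
# Add headroom for activations and you land around ~100 H100s.
```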