There are lots of people working on it, either already offering it or planning to. And even when they aren't offering true finetuning, it's still better: Snowflake (first hit in Google for "Llama 405B finetuning"), for example, makes no bones about its single-node lightweight finetuning being a LoRA, and is open-sourcing the code upfront, so at least you know what it is now, instead of depending on borderline gossip buried 40 minutes into a YouTube video months or years later.
It's still not trivial to finetune Llama 405B. Full finetuning with Adam requires roughly 16 bytes/parameter (weights, gradients, and optimizer state), plus activation memory, so a minimum of ~100 H100s.
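To make the arithmetic concrete, here is a rough back-of-the-envelope sketch (assuming the usual 16 bytes/parameter breakdown for mixed-precision Adam and 80 GB per H100; the exact split and the activation overhead depend on your setup):

```python
# Back-of-the-envelope memory estimate for full Adam finetuning of Llama 405B.
# Assumed breakdown for ~16 bytes/param: bf16 weights (2) + bf16 grads (2) +
# fp32 master weights (4) + Adam first/second moments (4 + 4). Activation
# memory is extra and depends on batch size, sequence length, and checkpointing.
params = 405e9
bytes_per_param = 16                      # weights + grads + optimizer state
h100_memory = 80e9                        # 80 GB per H100

state_bytes = params * bytes_per_param    # ~6.5 TB just for model/optimizer state
min_gpus = state_bytes / h100_memory      # ~81 GPUs before any activations

print(f"weights + optimizer state: {state_bytes / 1e12:.1f} TB")
print(f"minimum H100s (state only): {min_gpus:.0f}")
# Add headroom for activations and you land around ~100 H100s.
```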