Much larger than I expected for its performance
This way it’s probably smarter for its compute, and a more instructive exercise before scaling further, than a smaller model would’ve been. That makes sense if the aim is to out-scale others quickly rather than compete at a smaller scale, and if this model was never meant to last.