gwern comments on How feasible/costly would it be to train a very large AI model on distributed clusters of GPUs?