Ah. I was remembering something about 20GB from their GitHub, but it looks like it doesn't correspond to model size like I thought. (I also forgot about the factor of ~3 difference between model size on disk and GPU usage, but even beyond that...)
Ah, cheers, I'd not noticed that; I've been trying to avoid looking too closely. The way I understood it, the DRAM usage corresponded very roughly to n_parameters * batch_size, and tuning batch_size let me adjust the memory usage easily.
I'd not heard about the factor of 3; is that some particular trick for minimizing the GPU RAM cost?
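For what it's worth, here's a minimal back-of-the-envelope sketch of the kind of accounting being discussed. It assumes fp32 training with an Adam-style optimizer (weights + gradients + two moment buffers), which is one common way a ~3-4x gap between checkpoint size on disk and GPU usage shows up; the activation term, its default, and the example numbers are illustrative placeholders, not figures from the repo in question.

```python
def estimate_training_memory_gb(
    n_parameters: int,
    batch_size: int,
    bytes_per_param: int = 4,               # fp32
    activation_bytes_per_example: int = 0,  # hypothetical, model-dependent
) -> float:
    """Very rough GPU memory estimate for training.

    Assumes fp32 weights + gradients + two Adam moment buffers,
    i.e. ~4x the parameter memory alone (vs. ~1x for the checkpoint
    on disk). Activations scale with batch_size and vary a lot by
    architecture, so the per-example figure is just a placeholder.
    """
    weights = n_parameters * bytes_per_param        # roughly what's on disk
    grads_and_optimizer = 3 * weights               # grads + Adam m and v
    activations = batch_size * activation_bytes_per_example
    return (weights + grads_and_optimizer + activations) / 1e9


if __name__ == "__main__":
    # Hypothetical 1.5B-parameter model, batch size 8:
    # ~6 GB of weights on disk, ~24 GB for weights+grads+Adam states on GPU,
    # before counting activations.
    print(estimate_training_memory_gb(1_500_000_000, 8))
```

Under this accounting the batch_size knob only moves the activation term, which would explain why it's an easy way to tune memory usage without touching the fixed parameter/optimizer cost.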