Ah cheers, I’d not noticed that; I’ve been trying to avoid looking too much. The way I understood it, the DRAM usage corresponds very roughly to n_parameters * batch_size, so I was able to tune the memory usage easily by adjusting batch_size.
I’d not heard about the factor of 3. Is that some particular trick for minimizing the GPU RAM cost?
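For what it's worth, my rough mental model looks something like the sketch below. The function name, the 4 bytes per value (float32), and the example sizes are all assumptions for illustration, not anything measured, and it leaves out the factor of 3 since I don't know where that comes from.

```python
def rough_memory_bytes(n_parameters: int, batch_size: int, bytes_per_value: int = 4) -> int:
    """Very rough estimate: memory scales as n_parameters * batch_size.

    bytes_per_value=4 assumes float32 storage (an assumption, not a measurement).
    """
    return n_parameters * batch_size * bytes_per_value


# Hypothetical model size, just to show how batch_size moves the estimate.
for batch_size in (16, 32, 64):
    gib = rough_memory_bytes(1_000_000, batch_size) / 2**30
    print(f"batch_size={batch_size}: ~{gib:.2f} GiB")
```

Under this model, halving batch_size halves the estimate, which is the knob I was using to tune memory.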