Hi, thanks for your work. I was wondering why we use scaling to modify the activation here rather than using an analytical solution by compensating for the −cd/2 term.
Hi, thanks for your work. I was wondering why we use scaling to modify the activation here rather than using an analytical solution by compensating for the −cd/2 term.