Ahh sorry, I think I made this comment on an early draft of this post and didn’t realise it would make it into the published version! I totally agree with you, and I made the above comment in the hope that this point would be made clearer in later drafts, which I think it has!
It looks like I can’t delete a comment that has a reply, so I’ll add a note to reflect this.
Anyway, loved the paper, very cool research!
Hi Seonglae, glad you enjoyed the post!
Yes, this is correct. We also multiplied the 1999 number by 7 to represent the number of bits in a float (we assumed 8-bit floats but dropped the sign bit, since SAE feature magnitudes are always positive, which leaves 7 bits).
It could be argued that in this case we should not think of features as scalars (i.e. float-valued) and should instead use the numbers as you describe them above. Even then, the value still exceeds the typical description length from the SAEs (1405 bits). This is mostly an illustrative example: it assumes features are uniformly distributed for ease of exposition. In practice we might expect the SAEs to perform even better, as we are able to exploit the fact that some features are much more common than others.
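For anyone who wants to check the arithmetic, here is a rough sketch using only the figures quoted in this reply. The variable and function names are made up for illustration and are not taken from the post's code.

```python
# Figures quoted in this comment; all names below are illustrative assumptions.
BASELINE_NUMBER = 1999               # the "1999 number" before scaling
BITS_PER_FLOAT = 7                   # 8-bit float with the sign bit dropped
SAE_DESCRIPTION_LENGTH_BITS = 1405   # typical description length from the SAEs


def scaled_bits(count: int, bits_per_value: int) -> int:
    """Scale a count by the number of bits assumed per value."""
    return count * bits_per_value


# Case 1: treat features as float-valued, so scale by 7 bits each.
float_valued_bits = scaled_bits(BASELINE_NUMBER, BITS_PER_FLOAT)

# Case 2: do not treat features as scalars, so use the number as-is.
unscaled_bits = BASELINE_NUMBER

print(float_valued_bits)                                # 13993
print(unscaled_bits > SAE_DESCRIPTION_LENGTH_BITS)      # True
print(float_valued_bits > SAE_DESCRIPTION_LENGTH_BITS)  # True
```

Either way the comparison goes in the same direction, which is the point of the example.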
Thanks for your comment!