You’re right! This is something that the literature has details on, that I chose to simplify/gloss-over in pursuit of my goal of “write Actual Executable Code implementing minimum-description-length model selection and then tell a cute story about it.”
Chapter 5 of Peter D. Grünwald’s The Minimum Description Length Principle gives the complexity term (assuming that the parameter space is compact) as kd+LN(k)+LN(d), where k is the number of parameters, d is the bits of percision per parameter, and LN is the codelength function for a universal prior on the integers (the theoretical ideal that Levenshtein coding and the Elias ω coding approach). Basically, in order to specify your parameters, you need to first say how many of them there are and to what precision, and for that, you need a prefix-free code for the integers, which is itself kind of intricate: you can’t give every integer the same codelength, so you end up doing something like “first give the order of magnitude (in, um, unary), and then pick out a specific integer in that range”, but recursively (so that you first give the order of magnitude of the order of magnitude if it gets too big, &c.).
(Oh, and Grünwald warns that this still isn’t optimal and that question of the best encoding is left to §15.3, but I’m not there yet—it’s a big textbook, okay?)
I was originally imagining implementing the integer coding, but I decided it was too complicated for the cute story I wanted to tell about model selection, particularly since I wanted to provide Actual Executable Code with minimal boilerplate and minimal dependencies. (Rust just gives me f32 and f64, not variable-length floats.) I think this kind of handwaving should be admissible in the cute-blog-story genre as long as the author addresses objections in the comment section.
You’re right! This is something that the literature has details on, that I chose to simplify/gloss-over in pursuit of my goal of “write Actual Executable Code implementing minimum-description-length model selection and then tell a cute story about it.”
Chapter 5 of Peter D. Grünwald’s The Minimum Description Length Principle gives the complexity term (assuming that the parameter space is compact) as kd+LN(k)+LN(d), where k is the number of parameters, d is the bits of percision per parameter, and LN is the codelength function for a universal prior on the integers (the theoretical ideal that Levenshtein coding and the Elias ω coding approach). Basically, in order to specify your parameters, you need to first say how many of them there are and to what precision, and for that, you need a prefix-free code for the integers, which is itself kind of intricate: you can’t give every integer the same codelength, so you end up doing something like “first give the order of magnitude (in, um, unary), and then pick out a specific integer in that range”, but recursively (so that you first give the order of magnitude of the order of magnitude if it gets too big, &c.).
(Oh, and Grünwald warns that this still isn’t optimal and that question of the best encoding is left to §15.3, but I’m not there yet—it’s a big textbook, okay?)
I was originally imagining implementing the integer coding, but I decided it was too complicated for the cute story I wanted to tell about model selection, particularly since I wanted to provide Actual Executable Code with minimal boilerplate and minimal dependencies. (Rust just gives me
f32
andf64
, not variable-length floats.) I think this kind of handwaving should be admissible in the cute-blog-story genre as long as the author addresses objections in the comment section.