When mentioning prior work, it may be good to include Eric Drexler’s MDL Intelligence Distillation (from 2015), which proposes using description length to separate “pure learning capabilities” (low description length) from all sorts of undesired behavior.
Description length has some advantages over speed. Also, “limiting neural net size” is actually closer to a distillation based on description length than to one based on “speed”.
(My personal guess for how to make stuff bounded would be to penalize bitflips.)