Thoughtdump on why I’m interested in computational mechanics:
one concrete application to natural abstractions from here: tl;dr, belief structures generally seem to be fractal shaped. one major part of natural abstractions is trying to find the correspondence between structures in the environment and concepts used by the mind. so if we can do the inverse of what adam and paul did, i.e. ‘discover’ fractal structures from activations and figure out what stochastic process they might correspond to in the environment, that would be cool
… but i was initially interested in reading compmech stuff not with a particular alignment-relevant thread in mind but rather because it seemed broadly similar in direction to natural abstractions.
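to make "belief structure" concrete, here is a minimal sketch of the bayesian belief update an observer runs over the hidden states of an HMM. the 3-state HMM, its matrices `A` and `B`, and all function names are made up for illustration; this is not the process adam and paul used. the points visited by `belief_trajectory` live on the probability simplex, and it is the geometry of that point cloud that one would try to recover from activations.

```python
import random

# Hypothetical 3-state, 3-symbol HMM; parameters are invented for illustration.
A = [[0.8, 0.1, 0.1],
     [0.1, 0.8, 0.1],
     [0.1, 0.1, 0.8]]      # A[i][j] = P(next hidden state j | current state i)
B = [[0.7, 0.15, 0.15],
     [0.15, 0.7, 0.15],
     [0.15, 0.15, 0.7]]    # B[j][x] = P(emit symbol x | arrived in state j)


def update_belief(belief, symbol):
    """One Bayesian filtering step: b'_j is proportional to
    (sum_i b_i * A[i][j]) * B[j][symbol]."""
    n = len(belief)
    unnorm = [sum(belief[i] * A[i][j] for i in range(n)) * B[j][symbol]
              for j in range(n)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def sample(probs, rng):
    """Draw an index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs)[0]


def belief_trajectory(length, seed=0):
    """Simulate the HMM and record the observer's belief after each symbol."""
    rng = random.Random(seed)
    state = rng.randrange(3)
    belief = [1 / 3, 1 / 3, 1 / 3]  # uniform is stationary for this symmetric A
    points = [belief]
    for _ in range(length):
        state = sample(A[state], rng)
        symbol = sample(B[state], rng)
        belief = update_belief(belief, symbol)
        points.append(belief)
    return points


points = belief_trajectory(1000)
print(points[-1])
```

plotting `points` in barycentric coordinates is the usual way to look at the structure; whether it looks fractal depends on the HMM.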
re: how my focus would differ from my impression of current compmech work done in academia: academia seems faaaaaar less focused on actually trying out epsilon machine reconstruction on real-world noisy data. CSSR (Causal State Splitting Reconstruction) is one example of a reconstruction algorithm. apparently people have done compmech stuff on real-world data, don’t know how good it is, but effort-wise far too little has been invested compared to theory work
would be interested in these reconstruction algorithms, eg what are the bottlenecks to scaling them up, etc.
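for intuition about what reconstruction involves, here is a toy sketch: estimate the next-symbol distribution after each fixed-length history, then merge histories that predict (almost) the same thing. this is not CSSR itself, which uses statistical tests and grows the history length step by step; the greedy tolerance-based merge, the function names, and the choice of the golden mean process (no two consecutive 1s) as test data are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def golden_mean_sample(n, seed=0):
    """Golden mean process: after a 1 always emit 0, after a 0 flip a fair coin."""
    rng = random.Random(seed)
    out, prev = [], 0
    for _ in range(n):
        sym = 0 if prev == 1 else rng.randint(0, 1)
        out.append(sym)
        prev = sym
    return out


def next_symbol_dists(data, hist_len, alphabet):
    """Empirical P(next symbol | last hist_len symbols) for each observed history."""
    counts = defaultdict(Counter)
    for t in range(hist_len, len(data)):
        hist = tuple(data[t - hist_len:t])
        counts[hist][data[t]] += 1
    dists = {}
    for hist, c in counts.items():
        total = sum(c.values())
        dists[hist] = [c[a] / total for a in alphabet]
    return dists


def group_histories(dists, tol):
    """Greedy merge: a history joins the first group whose representative
    distribution is within tol (max absolute difference), else starts a new group."""
    groups = []  # list of (representative distribution, member histories)
    for hist in sorted(dists):
        d = dists[hist]
        for rep, members in groups:
            if max(abs(p - q) for p, q in zip(d, rep)) <= tol:
                members.append(hist)
                break
        else:
            groups.append((d, [hist]))
    return groups


data = golden_mean_sample(20000)
states = group_histories(next_symbol_dists(data, 2, [0, 1]), tol=0.05)
for rep, members in states:
    print(members, [round(p, 2) for p in rep])
```

even this toy version hints at the scaling bottleneck: the number of candidate histories grows exponentially with history length, so the per-history estimates get noisy fast on real data.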
tangent: epsilon transducers seem cool. if the reconstruction algorithm is good, a prototypical example i’m thinking of is something like: pick some input-output region within a model, and literally try to discover the HMM that reproduces it? of course it’s gonna be unwieldy and large. but, to shift the thread in the direction of bright-eyed theorizing …
the foundational Calculi of Emergence paper talked about the possibility of hierarchical epsilon machines, where you build epsilon machines on top of epsilon machines. for simple examples where this can be done analytically, you get wild things like progressively more compact representations of stochastic processes (eg data stream → tree → markov model → stack automata → … ?)
this … sounds like natural abstractions in its wildest dreams? literally point at some raw datastream and automatically build hierarchical abstractions that get more compact as you go up
haha but alas, (almost) no development afaik since the original paper. seems cool
and also more tangentially, compmech seems to have a lot to say about giving interesting semantics to various information measures, aka True Names, so another angle i was interested in was learning about them.
eg crutchfield talks a lot about developing a right notion of information flow—obvious usefulness in eg formalizing boundaries?
many other information measures from compmech with suggestive semantics—cryptic order? gauge information? synchronization order? check ruro1 and ruro2 for more.
I agree with you.
Epsilon machine (and MSP, i.e. mixed-state presentation) construction is most likely computationally intractable for realistic scenarios. [I don’t know of an exact statement of such a result in the literature, but I suspect it is true.]
Scaling an approximate version of epsilon machine reconstruction therefore seems of prime importance. Real-world architectures and data have highly specific structure and symmetry that make them different from completely generic HMMs. This will most likely have to be exploited.
The Calculi of Emergence paper has inspired many people but has not been developed much. Many of the details are somewhat obscure and vague. I also believe that completely different methods will most likely be needed to push the program further. Computational mechanics is primarily a theory of hidden Markov models; it doesn’t have the tools to easily describe behaviour higher up the Chomsky hierarchy. I suspect more powerful and sophisticated algebraic, logical and categorical thinking will be needed here. I caveat this by saying that Paul Riechers has pointed out that one can actually understand all these gadgets up the Chomsky hierarchy as infinite HMMs, which may be analyzed usefully just as finite HMMs are.
I regard the still-underdeveloped theory of epsilon transducers as the most promising lens on agent foundations. This is uncharted territory; I suspect the largest impact of computational mechanics will come from this direction.
Your point on True Names is well-taken. More basic examples than gauge information or synchronization order are the three quantities entropy rate h, excess entropy E, and Crutchfield’s statistical/forecasting complexity C. These are the most important quantities to understand for any stochastic process (such as the structure of language and LLMs!).
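For reference, a sketch of the standard definitions in the notation used above (the literature usually writes h_μ and C_μ). The block entropy H(L), the alphabet 𝒜, the causal-state set 𝒮 and its stationary distribution π are symbols introduced here for illustration, and the limit form of E assumes that E is finite.

```latex
\begin{align*}
  % Block entropy of length-L words over the alphabet A
  H(L) &= -\sum_{w \in \mathcal{A}^L} \Pr(w)\,\log_2 \Pr(w) \\
  % Entropy rate: irreducible randomness per symbol
  h &= \lim_{L \to \infty} \bigl[ H(L) - H(L-1) \bigr] \\
  % Excess entropy: mutual information between past and future
  E &= I\bigl[\overleftarrow{X} ; \overrightarrow{X}\bigr]
     = \lim_{L \to \infty} \bigl[ H(L) - L\,h \bigr] \\
  % Statistical complexity: entropy of the stationary causal-state distribution
  C &= H[\mathcal{S}] = -\sum_{\sigma \in \mathcal{S}} \pi(\sigma)\,\log_2 \pi(\sigma) \\
  % The causal states store at least the information shared by past and future
  E &\le C
\end{align*}
```

Roughly: h measures how random the process remains once everything learnable is known, E how much of the future is predictable from the past, and C how much memory optimal prediction requires.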