That depends on what you mean by “best”.
Is speed of calculation important? What about suitability for humans? I guess you want one where complexities are as small as possible.
Given two languages, L1 and L2, with complexity measures K1 and K2:
If K1(L2) < K2(L1), then I take that as a sign that L1 is better for use in the context of Ockham's razor. It is also a sign that L1 is more complex than L2, but that effect can be removed by doing many comparisons of this kind, so that unnecessarily complex languages lose against those that are actually good. Alternatively, one can use three languages in a comparison: K1(L3) < K2(L3) is a sign that L1 is better.
The idea here is that a good language should more easily represent systems, such as other languages.
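As a rough sketch of that idea in symbols (my notation, not from the comment above): write K_i(L_j) for the length of the shortest program in language L_i that simulates L_j. Over a pool of reference languages one could then score each candidate by how often it wins such pairwise comparisons.

```latex
% Hypothetical formalisation of the comparison idea.
% K_i(L_j) = length of the shortest program in L_i that simulates L_j.
% Pairwise criterion:       K_1(L_2) < K_2(L_1)  is evidence that L_1 is better.
% Three-language criterion: K_1(L_3) < K_2(L_3)  is evidence that L_1 is better.
% Repeating the comparison over many reference languages filters out languages
% that only win because they are themselves unnecessarily complex:
\mathrm{score}(L_i) \;=\; \bigl|\{\, j \neq i \;:\; K_i(L_j) < K_j(L_i) \,\}\bigr|
```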
What sort of languages are best by this measure? Not the simplest ones, but rather the more expressive ones. Programs for Turing machines spend some of their complexity on tape traversal and storage schemes, so I guess they are not so good. Cellular automata have the same problem. Combinatory Logic is very simple, but needs complex combinations of operators even for simple things; its system for storage and access can be simpler, though, since it is a tree. I guess Lambda Calculus is one of the better languages, as it is both simple and very expressive, because of its more direct tree structure.
I guess the best language would use an even more expressive structure, such as graphs, perhaps bi-graphs, since those are maximally expressive. The Scheme language can be used as a general graph language thanks to the set-car! and set-cdr! procedures.
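To illustrate that last point, here is a minimal Scheme sketch (my own illustration, not from the comment): cons cells normally form trees, but mutation lets them be rewired into arbitrary, even cyclic, graphs.

```scheme
;; Two cons cells, initially a tiny tree.
(define a (cons 'a '()))
(define b (cons 'b '()))

;; Rewire the cdr pointers so the cells reference each other.
(set-cdr! a b)            ; a --> b
(set-cdr! b a)            ; b --> a, closing a cycle

;; The structure is now a graph rather than a tree:
(eq? (cddr a) a)          ; => #t
```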
“Best”: most accurate—i.e. when Occam’s razor says A is a more likely hypothesis than B, then that is actually true.
Then I would go for Turing machines, Lambda calculus, or similar. These languages are very simple, and can easily handle input and output.
Even simpler languages, like the Rule 110 cellular automaton or Combinatory Logic, might be better, but those are quite difficult to get to handle input and output correctly.
The reason simple languages, or universal machines, should be better is that, according to algorithmic information theory, the upper bound on the error in the estimated probabilities is 2 to the power of the complexity of the program simulating one language in the other.
So, the simpler the language is, the more correct relative probabilities it will give.
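In symbols, this is the usual invariance theorem (stated here for reference; c_12 is the length of a program that simulates language 2 within language 1):

```latex
% Complexities in two universal languages differ by at most the length of a
% cross-simulator, so the induced probabilities 2^{-K} differ by at most a
% constant factor of 2^{c_{12}}.
K_1(x) \;\le\; K_2(x) + c_{12}
\quad\Longrightarrow\quad
2^{-K_1(x)} \;\ge\; 2^{-c_{12}} \cdot 2^{-K_2(x)}
```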
The answer I gave before this one was for the question: maximize the likelihood that the language will produce the output corresponding to the observed events.
You, however, want to compare two hypotheses: get two probabilities and see which is larger. In this case the absolute sizes of the probabilities do not matter, just their relative size, so you can just go for the simplest language that is practical to use.
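Put another way (a one-line restatement using the usual 2^{-K} prior): only the difference between the two description lengths enters the comparison, so anything that inflates both lengths equally drops out.

```latex
% For a comparison, only the difference in description lengths matters:
\frac{P(A)}{P(B)} \;\approx\; \frac{2^{-K(A)}}{2^{-K(B)}} \;=\; 2^{\,K(B) - K(A)}
```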
What do you want to use this for?
Using simple languages is the conventional approach. However, simple languages typically result in more complex programs. The game of life is very simple—yet try writing a program in it. If you are trying to minimise the size of an emulator of other languages, then highly simple languages don’t seem to fit the bill.
Why would one want a decent formulation of Occam’s razor? To help solve the problem of the priors.
I agree. We seem to have the same goal, so my first advice stands, not my second.
I am currently trying to develop a language that is both simple and expressive, and making some progress. The overall design is finished, and I am now down to what instructions it should have. It is a general bi-graph, but with a sequential program structure, and no separation of program and data.
It is somewhat different from what you want, as I also need something that has measurable use of time and memory, and is provably able to run fast.
Could I have a link, or maybe some more information about this language? It’s something I’m very interested in (merging expressiveness and guarantees about resource usage).
I’m sure that’s not possible to do in general (in an algorithmic fashion), since it would mean solving, in a flash, certain mathematical problems that aren’t supposed to be solvable in polynomial time, or even at all (like particular halting problems).
You’ll have to hope for something less.
I think Tim’s intended meaning was more along the lines of “a language L such that when A is a more likely hypothesis than B, L’s MDL for A is shorter than its MDL for B more often than for any other language”.
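One way to write that criterion down (my notation, not Tim’s; MDL_L(H) is the minimum description length of hypothesis H in language L, and the probability is over some unspecified distribution of hypothesis pairs):

```latex
% Hypothetical formalisation of "best" as ordering-accuracy:
L^{\ast} \;=\; \arg\max_{L} \;
\Pr\bigl[\, \mathrm{MDL}_L(A) < \mathrm{MDL}_L(B) \;\bigm|\; A \text{ is more likely than } B \,\bigr]
```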