Why would the mapping between the language the hypotheses are framed in have impact on which statements are most likely to be true? The article mentions that in domains where the correct hypotheses are complex in the proof language the principle tends to be anti-productive. There is no guarantee that the language is well suited to describe the target phenomenon if we are allowed to freely pick the phenomenon to track!
Wouldn’t also any finite complexity class only have finitely many hypotheses in it and wouldn’t those also be in a finite numbered index in it? The problem only arises for infinite complexity hypotheses. And it could be argued that if the index is a hyperinteger it can still be a valid placing.
With surreal probability it would be no problem to give an equal infinitesimal probability to an infinite list of hypotheses.
Wouldn’t also any finite complexity class only have finitely many hypotheses in it
Think of it as like the set of all positive integers of finite size. As it turns out, every single integer has finite size! You show me an integer, and I’ll show you its size :P But even though each individual element is less than infinity, the size of the set is infinite.
Why would the mapping between the language the hypotheses are framed in have impact on which statements are most likely to be true?
Choosing which language to use is ultimately arbitrary. But because there’s no way to assign the same probability to infinitely many discrete things and have the probabilities still add up to one, we’re forced into a choice of some “natural ordering of hypotheses” in which the probability is monotonically decreasing. This does not happen because of any specific fact about the external world; it is a property of what it looks like to have hypotheses about something that might be arbitrarily complicated.
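To make the counting argument concrete, here is a rough sketch, not anything from the article. It assumes hypotheses are binary strings and that each string of length n gets weight 2^-(2n+1); both the encoding and the weights are my own illustrative choices. The decreasing weights sum toward 1, while any fixed positive weight per hypothesis overshoots 1 once enough hypotheses are counted.

```python
from fractions import Fraction


def length_prior(n: int) -> Fraction:
    """Assumed weight for each single binary string of length n."""
    return Fraction(1, 2 ** (2 * n + 1))


def total_mass(max_len: int) -> Fraction:
    """Total weight of all binary strings of length 0..max_len.

    There are 2**n strings of length n, so length n contributes
    2**n * 2**-(2n+1) = 2**-(n+1), and the running total is
    1 - 2**-(max_len + 1), which approaches 1 from below.
    """
    return sum((2 ** n) * length_prior(n) for n in range(max_len + 1))


def uniform_mass(eps: Fraction, count: int) -> Fraction:
    """Total weight if each of `count` hypotheses gets the same weight eps."""
    return eps * count


if __name__ == "__main__":
    print(total_mass(9))                        # just under 1
    print(uniform_mass(Fraction(1, 1000), 2000))  # already above 1
```

Any other summable, decreasing choice of weights would do just as well, which is exactly why the choice of ordering is left open.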
The article mentions that in domains where the correct hypotheses are complex in the proof language the principle tends to be anti-productive.
Well… it’s anti-productive until you eliminate the simple-but-wrong alternatives, and then suddenly it’s the only thing allowing you to choose the right hypothesis out of the list that contains many more complex-and-still-accurate hypotheses.
If you want a much better explanation of these topics than I can give, and you like math, I recommend the textbook by Li and Vitanyi.
9 has 4 digits as “1001” in binary and 1 digit in decimal, so there is no single function from integers to their size. There is no such thing as the size of an integer independent of the digit system used (well, you could refer to some set constructions, but then the size would be the integer itself).
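A quick sketch of the point about digit systems; the helper name `digit_count` is just something I made up for illustration. The "size" only becomes a function once the base is fixed as a second argument.

```python
def digit_count(n: int, base: int) -> int:
    """Number of digits needed to write the non-negative integer n in `base`."""
    if n < 0 or base < 2:
        raise ValueError("need n >= 0 and base >= 2")
    count = 1
    while n >= base:
        n //= base   # drop the lowest digit
        count += 1
    return count


if __name__ == "__main__":
    print(digit_count(9, 2))   # 4, since 9 is "1001" in binary
    print(digit_count(9, 10))  # 1
```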
With surreals we could have ω pieces of equal probability ɛ that sum to exactly 1 (although ordinal numbers apply only to orderings, which can differ from cardinal numbers. While for finite numbers there is no big distinction between ordinal and cardinal, “infinitely many discrete things” might refer to a cardinal concept. However, for hypotheses that are listable (such as those formed as arbitrary-length strings of letters from a finite alphabet), the ωth index should be well founded).
It is not about arbitrary complexity but probability over infinite options. We could for example order the hypotheses by the number of negations used first and the number of symbols used second. This would not be any less natural and would result in a different probability distribution. Or arguing that the complexity ordering is the one that produces the “true” probabilities is a reframing of the question whether the simplicity formulation is truth-indicative.
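As a toy illustration of the two orderings giving different distributions (everything here is assumed for the example: the five hypothesis strings, "~" standing for negation, string length standing in for symbol count, and geometric weights by rank):

```python
from fractions import Fraction


def negation_count(h: str) -> int:
    """Count negation symbols, assuming '~' is the only negation sign."""
    return h.count("~")


def prior_from_ordering(hypotheses, key):
    """Rank hypotheses by `key`, give rank r weight 2**-(r+1), then normalise."""
    ranked = sorted(hypotheses, key=key)
    raw = {h: Fraction(1, 2 ** (rank + 1)) for rank, h in enumerate(ranked)}
    total = sum(raw.values())
    return {h: w / total for h, w in raw.items()}


HYPOTHESES = ["p", "~p", "p&q", "~~p", "p|q|r"]

# Ordering 1: fewest symbols first (ties broken alphabetically).
by_length = prior_from_ordering(HYPOTHESES, key=lambda h: (len(h), h))

# Ordering 2: fewest negations first, then fewest symbols.
by_negation = prior_from_ordering(
    HYPOTHESES, key=lambda h: (negation_count(h), len(h), h)
)

if __name__ == "__main__":
    for h in HYPOTHESES:
        print(h, by_length[h], by_negation[h])
```

Both priors are monotonically decreasing along their own ordering and both sum to 1, yet "~p" ends up with a different probability under each, so the decreasing-prior requirement alone does not single out one ordering.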
If I use a complexity-ambivalent method I might need to do fewer eliminations before encountering a working one. There is no need to choose from accurate hypotheses if we know that any of them are true. If I encounter a working hypothesis there is no need to search for a simpler form of it. Or if I encounter a theory of gravitation using ellipses, should I continue the search to find one that uses simpler concepts like circles only?
The approach of the final authors mentioned on the page seems especially interesting to me. I am also interested to note that their result agrees with Jaynes’. Universalizability seems to be important to all the most productive approaches there.
Or arguing that the complexity ordering is the one that produces the “true” probabilities is a reframing of the question whether the simplicity formulation is truth-indicative.
If the approach that says simplicity is truth-indicative is self-consistent, that’s at least something. I’m reminded of the LW sequence that talks about toxic vs healthy epistemic loops.
If I encounter a working hypothesis there is no need to search for a simpler form of it.
This seems likely to encourage overfitted hypotheses. I guess the alternative would be wasting effort on searching for simplicity that doesn’t exist, though. Now I am confused again, although in a healthier and more abstract way than originally. I’m looking for where the problem in anti-simplicity arguments lies rather than taking them seriously, which is easier to live with.
Honestly, I’m starting to feel as though perhaps the easiest approach to disproving the author’s argument would be to deny his assertion that processes in Nature which are simple are relatively uncommon. Off the top of my head, argument one is replicators, argument two is that simpler processes are smaller and thus more of them fit into the universe than complex ones would, argument three is that the universe seems to run on math (might be begging the question a bit, although I don’t think so, since it’s kinda amazing that anything more meta than perfect atomist replication can lead to valid inference; again the connection to universalizability surfaces), argument four is an attempt to undeniably avoid begging the question, inspired by Descartes: if nothing else we have access to at least one form of Nature unfiltered by our perceptions of simplicity: the perceptions themselves, which via anthropic-type induction arguments we should assume-more-than-not to be of more or less average representativeness. (Current epistemic status: playing with ideas very nonrigorously, wild and free.)
I think this is relevant: https://en.wikipedia.org/wiki/Bertrand_paradox_(probability)
I think this is relevant: https://en.wikipedia.org/wiki/Bertrand_paradox_(probability)#Jaynes.27_solution_using_the_.22maximum_ignorance.22_principle