So I think we might be talking past each other a bit. I don’t really have a strong view on whether Shannon’s work represented a major theoretical advancement. The specific thing I doubt is that Shannon’s work had significant counterfactual impacts on the speed with which it became practical to do specific things with computers.
This was why I was focusing on error correcting codes. Is there some other practical task which people wanted to do before Shannon’s work but were unable to do, which Shannon’s work enabled, and which you believe would have taken at least 5 years longer had Shannon not done his theoretical work?
Is there some other practical task which people wanted to do before Shannon’s work but were unable to do, which Shannon’s work enabled, and which you believe would have taken at least 5 years longer had Shannon not done his theoretical work?
This is an interesting question. Let’s come at it from first principles.
Going by the model in my own comment, Shannon’s work was plausibly counterfactually impactful because most people didn’t realize there was a problem to be solved there in the first place. So, in terms of practical applications, his work would be counterfactual mainly for things which people wouldn’t even have thought to try, or wouldn’t have realized were possible, prior to information theory.
With that lens in mind, let’s go back to error-correcting codes; I can see now why you were looking there for examples.
Natural guess: Shannon’s counterfactual impact on error-correcting codes was to clearly establish the limits of what’s possible, so that code-designers knew what to aim for. It’s roughly analogous to the role of Joseph Black’s theory of latent heat in the history of the steam engine: before that theory, engines were wildly inefficient; Watt’s main claim to fame was to calculate where the heat was used, realize that mostly it went to warming and cooling the cylinder rather than the steam, then figure out a way to avoid that, resulting in massive efficiency gains. That’s the sort of thing I’d a priori expect to see in the history of error-correcting codes: people originally did wildly inefficient things (e.g. sending messages without compression so receivers could easily correct typos, or duplicating messages). Then, post-Shannon, people figured out efficient codes.
And I think the history bears that out. Here’s the wikipedia page on error-correcting codes:
The American mathematician Richard Hamming pioneered this field in the 1940s and invented the first error-correcting code in 1950: the Hamming (7,4) code.
So, the first method efficient enough to merit mention on the wikipedia page at all was developed by the guy who shared an office with Shannon, within a few years of Shannon’s development of information theory. And there had been nothing even remotely as efficient/effective for centuries before. If that’s not a smoking gun for counterfactual impact, then I don’t know what is.
And there had been nothing even remotely as efficient/effective for centuries before. If that’s not a smoking gun for counterfactual impact, then I don’t know what is.
Look at the date though. 1944. Might there have been some other external event causing the quantity of noisy channel transmitted information (such as radio messages) to have been increased at that point in time?
To use any form of error-correcting code (vs. simply repeating the message) you need a digital computer to apply the code. ENIAC is 1945.
Counterfactual: Shannon is licking the stamp to submit the key paper for peer review when an assassin shoots him and sets fire to the office.
Are you saying that human engineers in the 1940s and ’50s, at the first point in human history where ECC is even useful or possible, wouldn’t invent a similar scheme? Note that you don’t need the theory: you can cook up a code like this by hand by grouping the bits into nibbles and considering all 16 possible nibbles and all 4 single-bit errors. You have 64 cases.
Can you find a code that uses less than 1 full nibble of redundancy? Could any of the thousands of other people working on digital computers in that era have thought of the idea?
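To illustrate the “you don’t need the theory” claim, here’s a sketch of the dumbest possible automated version of that hand search (Python; the function name is mine, not from any historical source): greedily collect 7-bit words that are pairwise at Hamming distance ≥ 3, which is exactly the condition needed to correct any single-bit error.

```python
def distance(a, b):
    """Number of bit positions in which two codewords differ."""
    return bin(a ^ b).count("1")

# Greedily collect 7-bit words that are pairwise at distance >= 3 --
# the condition needed to correct any single-bit error. No channel
# theory involved: just brute force over all 128 candidates in order.
codewords = []
for candidate in range(2 ** 7):
    if all(distance(candidate, c) >= 3 for c in codewords):
        codewords.append(candidate)

print(len(codewords))  # prints 16: enough codewords to carry a full nibble
```

Sixteen codewords of length 7 means 4 data bits plus only 3 bits of redundancy, so even this blind greedy search stumbles onto a code with the same parameters as Hamming’s.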
If someone creates a less efficient (by say 1 bit) code does this change history in a meaningful way?
Remember, sending the message twice is 2 nibbles. Hamming (7,4) is 1 nibble + 3 bits. It is very slightly more efficient than double-send where you send 2 nibbles and each gets a parity bit: 7 bits total vs 10 bits total. (You can also go up to bytes or words and just use a parity bit for each, but fewer parity bits increase the probability of an undetected error.)
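For concreteness, here’s a minimal sketch of the (7,4) scheme being compared (standard textbook bit layout; the helper names are mine): 7 bits on the wire, any single flipped bit corrected.

```python
# Hamming (7,4): 4 data bits + 3 parity bits, corrects any single-bit error.
# Bit layout (1-indexed): positions 1, 2, 4 hold parity; 3, 5, 6, 7 hold data.

def encode(nibble):
    """Encode 4 data bits (a list of 0/1) into a 7-bit codeword."""
    d1, d2, d3, d4 = nibble
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def decode(word):
    """Correct up to one flipped bit, then return the 4 data bits."""
    c1 = word[0] ^ word[2] ^ word[4] ^ word[6]
    c2 = word[1] ^ word[2] ^ word[5] ^ word[6]
    c3 = word[3] ^ word[4] ^ word[5] ^ word[6]
    syndrome = c1 + 2 * c2 + 4 * c3  # 1-indexed error position, 0 if none
    if syndrome:
        word = word[:]
        word[syndrome - 1] ^= 1
    return [word[2], word[4], word[5], word[6]]

# A bit flipped anywhere in transit is corrected:
msg = [1, 0, 1, 1]
sent = encode(msg)
sent[3] ^= 1               # corrupt one bit
assert decode(sent) == msg
```

The double-send-with-parity alternative spends 10 bits for the same nibble, which is where the 7-vs-10 comparison above comes from.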
Interestingly, by learning-effect laws (Moore’s law being the classic case), there is negligible gain here: 30% is nothing if the cost of a transistor is halving every 2-3 years. This may apply to AI as well; any improvement by a small constant factor is nothing against that kind of exponential.
Could any of the thousands of other people working on digital computers in that era have thought of the idea?
Given the lack of simultaneous discovery over at least 5 years, no, nobody else thought of that idea (or if they did, they didn’t know how to make it work efficiently).
Look, man, you can pick any date in the past 200 years and there will be Significant and Plausibly-Relevant events which happened around that time. In this case, I’d say the relevant events are less plausibly counterfactually impactful than average: after all, WWII was not the first war with widespread radio use (also, this was at Bell Labs, whose big use case was telephone lines), and no, you do not need a digital computer to apply error-correcting codes (I’ve personally done it by hand in homework assignments; there’s a learning curve but the complexity is not prohibitive). It is not at all plausible that the ’40s and ’50s were the first point in history where error-correcting codes were possible, and unlikely that they were the first point in history where error-correcting codes would have been useful.
We can come along today and see how easy it would have been for somebody to figure this stuff out, but that’s hindsight bias. Insofar as we can glean evidence about counterfactual worlds by looking at historical evidence at all, the evidence here generally points to the role of information theory being counterfactual: there was not prior or simultaneous discovery, and the main discovery which did happen was by the guy sharing an office with Shannon, within a few years of information theory.
(Notably, evidence against those core facts is the main thing which would change my mind here: if there were prior or simultaneous discovery of reasonably-efficient error-correcting codes, by someone who didn’t know about info theory, then that would be clear evidence that Shannon’s work was not counterfactual for efficient error-correcting codes.)
You need a digital computer for the code to be practical. Otherwise why not just repeat every message 2-3 times? When humans are doing it, the cost of sending a message twice is less than the cost of the time it takes a human computer to do the math to reconstruct a message that may have been corrupted. This is because a human telegraph operator is also the I/O.
When I googled for it, I found memoirs from the time period that credit Hamming, so I can’t prove the counterfactual; I’ll just note that there are a lot of possible encoding schemes that let you reconstruct corrupted messages.
But yeah, sure, I am thinking from hindsight. I have done byte-level encoding for communications on a few projects, so I am very familiar with this, but obviously that was 60 years later.
One concrete example of a case where I expect error-correcting codes (along with compression) would have been well worth the cost: 19th-century transatlantic telegraph messages, and more generally messages across lines bottlenecked mainly by the capacity of a noisy telegraph line. In those cases, five minutes for a human to encode/decode messages and apply error correction would probably have been well worth the cost for many users during peak demand. (And that’s assuming they didn’t just automate the encoding/decoding; that task is simple enough that a mechanical device could probably do it.)
For the very noisy early iterations of the line, IIRC messages usually had to be sent multiple times, and in that case especially I’d expect efficient error-correcting codes to do a lot better.