I would not consider error-correcting codes one of the more central Shannon discoveries. They were a major application which Shannon’s discoveries highlighted, but not one of the more counterfactually impactful ideas in their own right.
I have a whole post here about how Shannon’s discoveries qualitatively changed the way people thought about information. Very briefly: the major idea is that information/channel capacity is fungible. We do not need different kinds of channels to carry different kinds of information efficiently. Roughly speaking, any channel just has a single quantitative “capacity” (measurable in bits), and any signal has a single quantitative “entropy” (measurable in bits), and so long as the entropy is less than the capacity we can send the signal over the channel with arbitrarily high reliability and asymptotically tiny overhead.
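To make the "single number" framing concrete, here is a small sketch of my own (the numeric values are made-up illustrations, not anything from Shannon's paper): for a binary symmetric channel with crossover probability p, the capacity works out to C = 1 − H₂(p) bits per use, and a biased-coin source has entropy H₂(q) bits per symbol. Whenever the entropy is below the capacity, Shannon's noisy-channel coding theorem says reliable transmission is possible:

```python
from math import log2

def h2(p):
    """Binary entropy function, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

# Capacity of a binary symmetric channel with crossover probability p:
# C = 1 - H2(p) bits per channel use (a standard information-theory result).
p = 0.1
capacity = 1 - h2(p)

# Entropy of a biased-coin source, in bits per symbol (illustrative value).
source_bias = 0.05
entropy = h2(source_bias)

print(f"channel capacity: {capacity:.3f} bits/use")
print(f"source entropy:   {entropy:.3f} bits/symbol")
# Since entropy < capacity here, Shannon's theorem says the source can be
# sent over this channel with arbitrarily small error probability.
```

Note that both quantities are just single numbers in bits, regardless of what kind of signal or channel they describe; that fungibility is the point.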
That’s the core idea which I expect the vast majority of technical people pre-Shannon did not see coming, and were not close to figuring out.
So, rather than solving a problem that had been unsolved for the better part of a century, I think that Shannon was instead probably one of the first hundred humans who encountered this problem.
Computers were not especially central to information theory IIUC, and Shannon was certainly not one of the first hundred (or thousand, or probably ten thousand) people to work on cryptography.
On an outside view… we can compare Shannon to another case at Bell Labs where the nominal discoverers clearly did not have much counterfactual impact: the transistor. Here was my takeaway on that invention from reading The Idea Factory:
There were really two transistor inventions, back to back: Bardeen and Brattain’s point-contact transistor, and then Shockley’s transistor. Throughout, the group was worried about some outside group beating them to the punch (i.e. the patent). There were semiconductor research labs at universities (e.g. at Purdue; see pg 97), and the prospect of one of these labs figuring out a similar device was close enough that the inventors were concerned about being scooped.
That’s the sort of historical evidence which strongly indicates many people were on track to the invention. And that’s the sort of thing which we notably do not have, in Shannon’s case. The absence of that sort of evidence is at least some evidence that Shannon’s work was highly counterfactual.
Probably part of the difference is that, in the case of the transistor, there clearly was a problem there waiting to be solved, and multiple groups worked on that problem. In the case of information theory, it wasn’t clear that there was a problem to be solved at all: people mostly just assumed that of course different kinds of information need different kinds of transmission hardware. Sure, you can transmit Morse code over telephone, but it’s going to be wildly inefficient, because the phone lines were optimized for something else.
As you put it, Shannon was “probably one of the first hundred humans who encountered this problem”. But that’s not because only a hundred humans had worked on cryptography; it’s because there wasn’t obviously a problem there to be solved.
Probably part of the difference is that, in the case of the transistor, there clearly was a problem there waiting to be solved, and multiple groups worked on that problem
Yeah, I think that’s really the crux there. Whether the problem is legible enough for there to be a way to reliably specify it to anyone with the relevant background knowledge, vs. so vague and hazy you need unusual intuition to even suspect that there may be something in that direction.
So I think we might be talking past each other a bit. I don’t really have a strong view on whether Shannon’s work represented a major theoretical advancement. The specific thing I doubt is that Shannon’s work had significant counterfactual impacts on the speed with which it became practical to do specific things with computers.
This was why I was focusing on error correcting codes. Is there some other practical task which people wanted to do before Shannon’s work but were unable to do, which Shannon’s work enabled, and which you believe would have taken at least 5 years longer had Shannon not done his theoretical work?
Is there some other practical task which people wanted to do before Shannon’s work but were unable to do, which Shannon’s work enabled, and which you believe would have taken at least 5 years longer had Shannon not done his theoretical work?
This is an interesting question. Let’s come at it from first principles.
Going by the model in my own comment, Shannon’s work was counterfactually impactful plausibly because most people didn’t realize there was a problem to be solved there in the first place. So, in terms of practical applications, his work would be counterfactual mainly for things which people wouldn’t even have thought to try or realized was possible prior to information theory.
With that lens in mind, let’s go back to error-correcting codes; I can see now why you were looking there for examples.
Natural guess: Shannon’s counterfactual impact on error-correcting codes was to clearly establish the limits of what’s possible, so that code-designers knew what to aim for. It’s roughly analogous to the role of Joseph Black’s theory of latent heat in the history of the steam engine: before that theory, engines were wildly inefficient; Watt’s main claim to fame was to calculate where the heat was used, realize that most of it went to warming and cooling the cylinder rather than the steam, then figure out a way to avoid that, resulting in massive efficiency gains. That’s the sort of thing I’d a priori expect to see in the history of error-correcting codes: people originally did wildly inefficient things (e.g. sending messages without compression so receivers could easily correct typos, or duplicating messages). Then, post-Shannon, people figured out efficient codes.
And I think the history bears that out. Here’s the wikipedia page on error-correcting codes:
The American mathematician Richard Hamming pioneered this field in the 1940s and invented the first error-correcting code in 1950: the Hamming (7,4) code.
So, the first method efficient enough to merit mention on the wikipedia page at all was developed by the guy who shared an office with Shannon, within a few years of Shannon’s development of information theory. And there had been nothing even remotely as efficient/effective for centuries before. If that’s not a smoking gun for counterfactual impact, then I don’t know what is.
And there had been nothing even remotely as efficient/effective for centuries before. If that’s not a smoking gun for counterfactual impact, then I don’t know what is.
Look at the date though. 1944. Might there have been some other external event causing the quantity of noisy channel transmitted information (such as radio messages) to have been increased at that point in time?
To use any form of error correcting code (vs simply a repeat of the message) you need a digital computer to apply the code. ENIAC is 1945.
Counterfactual: Shannon is licking the stamp to submit the key paper for peer review when an assassin shoots him and sets fire to the office.
Are you saying that human engineers in the 1940s and 50s, at the first point in human history where ECC is even useful or possible, wouldn’t invent a similar scheme? Note you don’t need the theory, you can cook up a code like this by hand by grouping the bits into nibbles and considering all 16 possible nibbles and all 4 single bit errors. You have 64 cases.
Can you find a code that uses less than 1 full nibble of redundancy? Could any of the thousands of other people working on digital computers in that era have thought of the idea?
If someone creates a less efficient (by say 1 bit) code does this change history in a meaningful way?
Remember, sending the message twice is 2 nibbles. Hamming (7,4) is 1 nibble + 3 bits. It is very slightly more efficient than double-send where you send 2 nibbles and each gets a parity bit, so:
7 bits total vs 10 bits total. (You can also go up to bytes or words and just use a parity bit for each, but fewer parity bits increase the probability of an undetected error.)
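The hand-construction described above (group the bits into nibbles, enumerate the cases) can be sketched concretely. This is my own illustrative implementation of the standard Hamming (7,4) layout, not code from the era:

```python
# A minimal Hamming (7,4) sketch: 4 data bits + 3 parity bits, correcting any
# single-bit error. Parity bits sit at positions 1, 2, 4 (1-based), data bits
# at positions 3, 5, 6, 7 -- the classic textbook arrangement.

def hamming74_encode(d):
    """d: list of 4 data bits -> list of 7 code bits (positions 1..7)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # parity over positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over positions 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """c: list of 7 bits, at most one flipped -> corrected 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # check over positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # check over positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # check over positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based index of the flipped bit; 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

# Exhaustively check every nibble against every possible single-bit flip in
# the 7-bit codeword (112 cases, a superset of the 64 data-bit cases above).
for n in range(16):
    data = [(n >> i) & 1 for i in range(4)]
    code = hamming74_encode(data)
    for pos in range(7):
        corrupted = list(code)
        corrupted[pos] ^= 1
        assert hamming74_decode(corrupted) == data
```

The point of the exhaustive loop is that the whole design can be verified by hand-sized case analysis; no deep theory is required to check that it works, only to know that something like it is worth looking for.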
Interestingly, by learning-effect laws (like Moore’s), there is negligible gain here. 30% is nothing if the cost of a transistor is halving every 2-3 years. This may apply to AI as well: any one-time improvement by a small constant factor is nothing.
Could any of the thousands of other people working on digital computers in that era have thought of the idea?
Given the lack of simultaneous discovery over at least 5 years, no, nobody else thought of that idea (or if they did, they didn’t know how to make it work efficiently).
Look, man, you can pick any date in the past 200 years and there will be Significant and Plausibly-Relevant events which happened around that time. In this case, I’d say the relevant events are less plausibly counterfactually impactful than average: after all, WWII was not the first war with widespread radio use (also, this was at Bell Labs; their big use case was telephone lines), and no, you do not need a digital computer to apply error-correcting codes (I’ve personally done it by hand in homework assignments; there’s a learning curve, but the complexity is not prohibitive). It is not at all plausible that the ’40s and ’50s were the first point in history where error-correcting codes were possible, and unlikely that they were the first point in history where error-correcting codes would have been useful.
We can come along today and see how easy it would have been for somebody to figure this stuff out, but that’s hindsight bias. Insofar as we can glean evidence about counterfactual worlds by looking at historical evidence at all, the evidence here generally points to the role of information theory being counterfactual: there was not prior or simultaneous discovery, and the main discovery which did happen was by the guy sharing an office with Shannon, within a few years of information theory.
(Notably, evidence against those core facts are the main things which would change my mind here: if there were prior or simultaneous discovery of reasonably-efficient error-correcting codes, by someone who didn’t know about info theory, then that would be clear evidence that Shannon’s work was not counterfactual for efficient error-correcting codes.)
You need a digital computer for the code to be practical. Otherwise why not just repeat every message 2-3 times? When humans are doing it, the cost of sending a message twice is less than the time it takes a human computer to do the math to reconstruct a message that may have been corrupted. This is because a human telegraph operator is also the I/O.
When I googled for it, I found memoirs from the time period that credit Hamming, so I can’t prove the counterfactual; I’d just note that there are a lot of possible encoding schemes that let you reconstruct corrupted messages.
But yeah sure, I am thinking from hindsight. I have done byte level encoding for communications on a few projects so I am very familiar with this, but obviously it was 60 years later.
One concrete example of a case where I expect error-correcting codes (along with compression) would have been well worth the cost: 19th-century transatlantic telegraph messages, and more generally messages across lines bottlenecked mainly by the capacity of a noisy telegraph line. In those cases, five minutes for a human to encode/decode messages and apply error correction would probably have been well worth the cost for many users during peak demand. (And that’s assuming they didn’t just automate the encoding/decoding; that task is simple enough that a mechanical device could probably do it.)
For the very noisy early iterations of the line, IIRC messages usually had to be sent multiple times, and in that case especially I’d expect efficient error-correcting codes to do a lot better.
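To put rough numbers on that last point (my own back-of-envelope comparison, assuming independent bit flips at an illustrative rate of p = 0.01, not anything from the historical record): compare sending each bit three times with majority voting against protecting each nibble with a Hamming (7,4) block:

```python
def repetition3_word_error(p, k=4):
    """P(any of k data bits wrong) when each bit is sent 3x with majority vote."""
    per_bit = 3 * p**2 * (1 - p) + p**3  # majority vote fails on 2 or 3 flips
    return 1 - (1 - per_bit) ** k

def hamming74_block_error(p):
    """P(Hamming (7,4) decoder fails): two or more of the 7 code bits flipped."""
    return 1 - (1 - p) ** 7 - 7 * p * (1 - p) ** 6

p = 0.01  # assumed per-bit flip probability, for illustration only
print(f"triple-send, 12 bits/nibble: {repetition3_word_error(p):.2e}")
print(f"Hamming (7,4), 7 bits/nibble: {hamming74_block_error(p):.2e}")
```

At this error rate the two schemes leave residual errors of the same order, but the repetition scheme spends 12 bits per nibble versus Hamming’s 7, which is exactly the kind of efficiency gap that matters when the channel itself is the bottleneck.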