There’s a fairly specific subset of coding theory that I feel should exist, but I don’t know what it’s called or how to find it. It’s best characterised by needing to obliquely embed subchannels of communication in human-readable text.
Here are some examples of problems that exist in this area:
1a) How do I pass arbitrary concealed data in a body of English text? Let’s say I have a message of length n bits. What would be the most efficient way of obliquely encoding that message so that it passes for plain English text? For example, if I had the message 11001001, I could use a Markov text generator, and for each word, check and force the parity of a checksum of that word, picking eight words whose parities corresponded to [1, 1, 0, 0, 1, 0, 0, 1]. This wouldn’t be very efficient, and it probably wouldn’t be astonishingly comprehensible English, but it starts to address the problem.
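A toy version of the parity idea might look like the sketch below. It assumes the "checksum" is simply the sum of a word's letter code points mod 2, and it stands in a fixed hypothetical vocabulary for the Markov generator's candidate next words. A real generator would offer different candidates at each step.

```python
def word_parity(word: str) -> int:
    """Checksum parity: sum of the word's letter code points, mod 2."""
    return sum(ord(c) for c in word.lower() if c.isalpha()) % 2


def encode_bits(bits, candidates_per_step):
    """For each bit, pick the first candidate word whose parity matches it."""
    words = []
    for bit, candidates in zip(bits, candidates_per_step):
        for word in candidates:
            if word_parity(word) == bit:
                words.append(word)
                break
        else:  # no candidate at this step had the right parity
            raise ValueError(f"no candidate with parity {bit} in {candidates}")
    return words


def decode_bits(words):
    """The receiver just recomputes each word's parity."""
    return [word_parity(w) for w in words]


# Hypothetical stand-in for a Markov generator's suggestions at every step.
VOCAB = ["the", "cat", "sat", "on", "a", "mat"]
message = [1, 1, 0, 0, 1, 0, 0, 1]
cover = encode_bits(message, [VOCAB] * len(message))
# cover == ["the", "the", "cat", "cat", "the", "cat", "cat", "the"]
```

The output illustrates the comprehensibility problem: one bit per word is cheap to decode, but unless the generator offers several fluent candidates of each parity at every step, the text degrades quickly.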
1b) Say I have an existing body of well-formed English text, and want to somehow transform it so that it encodes the message 11001001. What variety of transformations would still yield well-formed English text?
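One family of transformations for 1b is synonym substitution: each occurrence of a word from an agreed pair carries one bit depending on which member of the pair is used. The sketch below assumes a small hypothetical list of pairs (not drawn from any real lexicon) and assumes the receiver knows the message length n.

```python
import re

# Hypothetical synonym pairs: the first word of a pair encodes 0, the second 1.
SYNONYM_PAIRS = [
    ("big", "large"), ("quick", "fast"), ("begin", "start"), ("maybe", "perhaps"),
    ("help", "assist"), ("shut", "close"), ("buy", "purchase"), ("end", "finish"),
]
# Map each word to (index of its pair, bit it represents).
LOOKUP = {w: (i, bit) for i, pair in enumerate(SYNONYM_PAIRS) for bit, w in enumerate(pair)}
WORD = re.compile(r"[A-Za-z]+")


def embed(text: str, bits) -> str:
    """Rewrite each carrier word, in order, to the synonym matching the next bit."""
    remaining = iter(bits)

    def swap(match):
        word = match.group(0)
        entry = LOOKUP.get(word.lower())
        if entry is None:
            return word
        bit = next(remaining, None)
        if bit is None:  # message already fully embedded; leave the rest alone
            return word
        replacement = SYNONYM_PAIRS[entry[0]][bit]
        return replacement.capitalize() if word[0].isupper() else replacement

    result = WORD.sub(swap, text)
    if next(remaining, None) is not None:
        raise ValueError("cover text has too few carrier words for this message")
    return result


def extract(text: str, n: int):
    """Read the first n carrier words back as bits."""
    carriers = [w.lower() for w in WORD.findall(text) if w.lower() in LOOKUP]
    return [LOOKUP[w][1] for w in carriers[:n]]


original = ("We could begin with a quick test. Maybe a big step will help us "
            "end early, so shut the door and buy time.")
stego = embed(original, [1, 1, 0, 0, 1, 0, 0, 1])
```

A naive substitution like this ignores part of speech and sense ("close" is also an adjective), so a practical version would need to check context before swapping.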
2) Given a body of English text, how can I obliquely clue in a knowing observer (or a regex pattern) to the fact that this text is somehow distinguished and worthy of scrutiny for some predetermined purpose? Motivating case: I’ve just encoded the very important message 11001001 as a body of English text, using a method for solving problem #1 above. I want to do something to it so that my confederate will recognise it as containing this message. One way of doing this would be to wrap it in a known initiating and terminating phrase, a bit like ASCII-armour for a PGP key block. For example, I could begin with “Once upon a time...” and end with “...and they all lived happily ever after”.
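On the receiving side, the armour idea reduces to a regular expression. This sketch uses the two phrases from the example above; allowing the "..." to be optional and matching case-insensitively are assumptions about how strict the pattern should be.

```python
import re

# Opening/closing phrases from the example; optional "..." and
# case-insensitivity are assumptions about how strict to be.
ARMOUR = re.compile(
    r"Once upon a time(?:\.\.\.)?\s*(?P<body>.*?)\s*(?:\.\.\.)?"
    r"and they all lived happily ever after",
    re.IGNORECASE | re.DOTALL,
)


def find_payloads(text: str):
    """Return the text found between each opening and closing phrase."""
    return [m.group("body") for m in ARMOUR.finditer(text)]


post = ("Blah. Once upon a time... a big dog ran far "
        "...and they all lived happily ever after. More.")
payloads = find_payloads(post)
```

The lazy `.*?` keeps each match to the nearest closing phrase, so several armoured payloads in one post come back separately. Fixed phrases are easy for anyone else to search for too, which is the trade-off against subtler markers.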
3) A group of conspirators want to adopt aliases on an anonymous forum. They want to be able to reliably recognise each other by their online aliases so they can collaborate, but don’t want their affiliation to be apparent to other observers. They could, ahead of time, come up with one or more highly selective rules to which their aliases must conform. The identities of all conspirators would then be known to each other.
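One "highly selective rule" that doesn't reveal itself to observers is a keyed hash: an alias qualifies only if an HMAC of it under a pre-shared secret starts with a run of zero bits. This is a swapped-in technique, not something from the thread, and the secret, the 12-bit threshold and the numeric-suffix search are all hypothetical choices.

```python
import hashlib
import hmac

SHARED_SECRET = b"agreed-in-advance"  # hypothetical pre-shared key
DIFFICULTY_BITS = 12  # roughly 1 in 4096 arbitrary aliases will qualify


def is_conspirator_alias(alias: str, secret: bytes = SHARED_SECRET) -> bool:
    """True if the first DIFFICULTY_BITS bits of HMAC-SHA256(secret, alias) are zero."""
    digest = hmac.new(secret, alias.lower().encode("utf-8"), hashlib.sha256).digest()
    value = int.from_bytes(digest[:4], "big")
    return value >> (32 - DIFFICULTY_BITS) == 0


def find_alias(base: str, secret: bytes = SHARED_SECRET) -> str:
    """Append an increasing numeric suffix until the alias satisfies the rule."""
    n = 0
    while True:
        candidate = f"{base}{n}"
        if is_conspirator_alias(candidate, secret):
            return candidate
        n += 1


alias = find_alias("quietotter")
```

Without the secret, qualifying aliases look like ordinary names with numbers on the end. The cost is false positives: an unaffiliated user's alias passes with probability 2^-12, so a longer threshold (or a follow-up handshake) trades search time for certainty.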
You are talking about steganography.
Thank you.
I know that sixes_and_sevens probably wants a software solution here, but if they are willing to put up with sending lots of extra data, they could just declare the third letter of every sixth word or use some other similarly arbitrary rule to determine which letters were pieces of data in the secret message.
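The "third letter of every sixth word" rule is straightforward to decode mechanically. In this sketch the word splitting (runs of letters and apostrophes) and silently skipping words shorter than three letters are assumptions; an encoder would have to respect the same conventions.

```python
import re


def extract_letters(text: str, every: int = 6, position: int = 3) -> str:
    """Collect the `position`-th letter (1-based) of every `every`-th word."""
    words = re.findall(r"[A-Za-z']+", text)
    letters = []
    for word in words[every - 1::every]:
        alpha = [c for c in word if c.isalpha()]
        if len(alpha) >= position:  # words that are too short carry nothing
            letters.append(alpha[position - 1])
    return "".join(letters)


secret = extract_letters("We all sat by the ashes and I felt a small unity")
# secret == "hi"  ("ashes" and "unity" are the 6th and 12th words)
```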
There are also lots of tools to steganographically embed data in images.
So, my actual motivation is solving a specific problem I set myself: if I have two or more processes running on different machines, with access to the public internet, but no guarantee of locally open ports beyond 80 and 443, no knowledge of any existing sibling processes, and no guarantee that any previously-established communication channel (like a mail server, message queue or coordinating service) is still active, how do they form a peer group?
The first not-completely-terrible solution I hit upon was for them to monitor existing chatter-heavy online services such as Reddit or Wikipedia, use some sort of pre-established scoring rule to identify esoteric topics, and obliquely pass data to one another in comments or discussion of that topic whenever it arises. Image steganography is a good inroad here, since Reddit is very image-heavy and doesn’t draw too much attention to itself, but most services you can use to anonymously host images (like imgur) strip metadata from the images, so dumping some cyphertext in the image header won’t work.
I’m still keen on putting together a text solution, because it interests me, and I find the idea of Reddit bots carrying out human conversations while passing covert information to each other highly appealing.
If cyphertext in image metadata would be acceptably well-hidden, this should be as well: you can hide four bits per pixel using the least significant bits of the RGBA channels. A human won’t be able to tell the difference between #FE007BFE and #FF017AFF.
(This assumes .png. I don’t think it would work very well, if at all, for either jpeg or gif.)
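A sketch of the RGBA least-significant-bit scheme, operating on plain `(r, g, b, a)` tuples of 0 to 255 integers. Loading and saving the PNG itself (for example with an imaging library) is left out, and the function names are hypothetical.

```python
def embed_bits(pixels, bits):
    """Hide one bit in the LSB of each RGBA channel (4 bits per pixel)."""
    bits = list(bits)
    if len(bits) > 4 * len(pixels):
        raise ValueError("not enough pixels for this message")
    out, i = [], 0
    for pixel in pixels:
        channels = []
        for value in pixel:
            if i < len(bits):
                value = (value & 0xFE) | bits[i]  # clear the LSB, then set it to the bit
                i += 1
            channels.append(value)
        out.append(tuple(channels))
    return out


def extract_bits(pixels, n):
    """Read back the first n LSBs, channel by channel."""
    return [value & 1 for pixel in pixels for value in pixel][:n]


# #FF017AFF carrying the bits 0010 becomes #FE007BFE.
stego = embed_bits([(0xFF, 0x01, 0x7A, 0xFF)], [0, 0, 1, 0])
```

Because every channel changes by at most 1, the colour shift is invisible. It also explains the format caveat: lossless PNG preserves the low bits exactly, while lossy re-encoding such as JPEG compression will scramble them.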
It seems like other people have built solutions that use steganography with the image pixels themselves, given that I see examples on Wikipedia.
I can’t be remotely useful with the text stuff, though; my coding skills are such that I still have trouble generating token frequency histograms from text files, and my most persuasive “chatbot” relayed an entirely pregenerated script to the user, took only 1 bit of user input, and was written on a graphing calculator. (I still got my Theory of Knowledge to feel empathy for it and be unwilling to let me delete it in exchange for a cookie, though, so it did prove my point that people can feel empathy for morally irrelevant things, like a program shorter than your above comment.)
I don’t think it would be very hard to write something that encodes a hidden phrase using odd and even counts (but length is key).
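One reading of "odd and even counts" is the parity of each sentence's word count. The receiver also has to know where the message stops. This decoding sketch therefore assumes a hypothetical 4-bit length header in the first four sentences, and its naive sentence splitting on `.`, `!` and `?` is likewise an assumption.

```python
import re


def sentence_parities(text: str):
    """One bit per sentence: 1 if its word count is odd, 0 if even."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) % 2 for s in sentences]


def decode_with_length(text: str, header_bits: int = 4):
    """First `header_bits` sentences give the message length; the rest carry it."""
    bits = sentence_parities(text)
    if len(bits) < header_bits:
        raise ValueError("text too short to contain a length header")
    length = int("".join(map(str, bits[:header_bits])), 2)
    return bits[header_bits:header_bits + length]


# Word counts 2, 4, 3, 3 give header 0011 (length 3); then 3, 4, 3 give 1, 0, 1.
cover = ("It rained. We stayed at home. Tea was made. Nobody left early. "
         "The sun appeared. We went for walks. It was lovely.")
hidden = decode_with_length(cover)
```

At one bit per sentence the cover text grows quickly, which is where length becomes the limiting factor.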