Which question are we trying to answer?
(1) Is it possible to decode a file that was deliberately constructed to be decoded, without a priori knowledge? This is roughly what That Alien Message is about, at least in the first part of the post, where aliens send a message to humanity.
(2) Is it possible to decode a file that has an arbitrary binary schema, without a priori knowledge? This is the point I’ve been arguing about with regard to things like decoding camera raw formats, or sensor data from a hardware/software system. It is also where I disagree with That Alien Message: I don’t think that one-shot examples allow robust generalization.
I don’t think (1) is a particularly interesting question, because last weekend I convinced myself that the answer is yes: you can transfer data in a way that can be decoded with very few assumptions on the part of the receiver. I do have a file I created for this purpose; if you want, I’ll send it to you.
I started creating a file for (2), but I’m not really sure how to judge what is “fair” versus “deliberately obfuscated” in terms of encoding, and I am conflicted. Even if I stick to encoding techniques I’ve seen in the real world, I feel like I can make choices in this file’s encoding that make the likelihood of anyone else decoding it very low. That’s exactly what we’re arguing about in (2). However, I don’t think it will be particularly interesting or fun for people trying to decode it. Maybe that’s ok?
What are your thoughts?
I’m not sure either one quite captures what I mean, but I think (1) is probably closer than (2). The caveat is that I don’t think the file necessarily has to be deliberately constructed to be decoded without a priori knowledge; rather, it should be constructed to have as close as possible to a 1:1 mapping between the structure of the process used to capture the data and the structure of the underlying data stream.
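As a rough illustration of why that 1:1 mapping helps, here is a minimal Python sketch. It assumes a hypothetical sensor log made of fixed-size records (a constant sync word, a counter, and a noisy payload); the 12-byte layout, the 0.9 tolerance, and the helper names `match_fraction`, `guess_record_length`, and `make_sample` are all made up for illustration. When the capture process emits one record per sample, the byte stream repeats with that period, so comparing the file against shifted copies of itself is one plausible first step toward recovering the record length with no prior knowledge. This is a generic self-similarity heuristic, not the method behind either file discussed here.

```python
import random
import struct


def match_fraction(data: bytes, lag: int) -> float:
    """Fraction of positions i where data[i] == data[i + lag]."""
    n = len(data) - lag
    if n <= 0:
        return 0.0
    matches = sum(1 for i in range(n) if data[i] == data[i + lag])
    return matches / n


def guess_record_length(data: bytes, max_lag: int = 64, tolerance: float = 0.9) -> int:
    """Guess a fixed record length from byte-level self-similarity.

    Multiples of the true length score about as well as the length itself,
    so return the smallest lag whose score is within `tolerance` of the best.
    The 0.9 default is an arbitrary, hypothetical choice.
    """
    max_lag = min(max_lag, len(data) - 1)
    scores = {lag: match_fraction(data, lag) for lag in range(1, max_lag + 1)}
    best = max(scores.values())
    return min(lag for lag, s in scores.items() if s >= tolerance * best)


def make_sample(num_records: int = 200, seed: int = 0) -> bytes:
    """Hypothetical sensor log: 12-byte records (sync word, counter, payload)."""
    rng = random.Random(seed)
    out = bytearray()
    for i in range(num_records):
        out += b"\xaa\x55"                                   # constant 2-byte sync word
        out += struct.pack(">H", i)                          # big-endian 2-byte record counter
        out += bytes(rng.randrange(256) for _ in range(8))   # 8 bytes of noisy payload
    return bytes(out)


if __name__ == "__main__":
    sample = make_sample()
    print(guess_record_length(sample))  # prints 12 for this synthetic layout
```

A format with variable-length records, compression, or encryption would likely defeat this particular heuristic, which is roughly where the line between (1) and (2) starts to matter.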
I notice I am somewhat confused by the inclusion of camera raw formats in (2) rather than (1), though. I would expect that converting a file from camera raw format to a JPEG would move you substantially in the direction from (1) to (2).
It sounds like you may have something resembling “sensor data of something unusual, in an unconventional but not intentionally obfuscated format”? If so, that sounds like exactly what I’m looking for.
I think it’s fine if it’s not interesting or fun to decode because nobody can get a handle on the structure. If that’s the case, it will be interesting to see why we are not able to do that, and especially interesting if the file turns out to look like one of the things we would have predicted in advance to be decodable.
I’ve posted it here: https://www.lesswrong.com/posts/BMDfYGWcsjAKzNXGz/eavesdropping-on-aliens-a-data-decoding-challenge.