Some very quick notes, don’t have time for more:
It looks to me as if you fail to separate formal models from concrete implementations. Most exploitability results from mismatches between the two. As a concrete example, the PDF spec says (said?) that a PDF starts with a “this is a PDF” comment, e.g. %PDF-1.4, right at the start, with nothing before it. In practice, most readers are happy if that signature sits anywhere in the first 4KiB or something like that. The end results range from benign things like polyglot files on one end – look for Ange Albertini and/or the International Journal of PoC||GTFO (“Proof-of-Concept or Get The F*** Out”) – to malicious ones on the other, like your [guard mechanism – firewall/antivirus/…] parsing things one way and considering them safe, while [target program] parses them another way and [blows up – crashes / runs embedded code / …]. As long as you only look at specs / formal models, you cannot talk about the actual effects. You have to actually look at the code / program, get down to the base level, to see these problems.
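To make the spec-vs-implementation mismatch concrete, here’s a minimal sketch (mine, in Python, with made-up function names): a strict check that follows the letter of the spec next to a lenient one that behaves roughly like many real readers. Any file the second accepts but the first rejects is a candidate for exactly this kind of guard/target disagreement:

```python
# Sketch only: two hypothetical "is this a PDF?" checks that disagree.

def strict_is_pdf(data: bytes) -> bool:
    # The spec's reading: the %PDF- signature must be the very first bytes.
    return data.startswith(b"%PDF-")

def lenient_is_pdf(data: bytes) -> bool:
    # What many real readers effectively do: accept the signature
    # anywhere in (roughly) the first few KiB of the file.
    return b"%PDF-" in data[:4096]

# A polyglot-ish input: some other format's header first, PDF signature later.
sample = b"GIF89a" + b"\x00" * 100 + b"%PDF-1.4\n...rest of a PDF..."

print(strict_is_pdf(sample))   # False -> a strict guard says "not a PDF"
print(lenient_is_pdf(sample))  # True  -> a lenient target happily parses it
```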
For more on that, look at LANGSEC (very short summary, but the papers are quite readable too) or the stuff that Halvar Flake does. Briefly and slightly over-simplified, the model here is that you have a universal machine (your computer) but want it to do a very specific, limited thing, so you write an emulator (your program) for a finite state machine or similar. There are certain state transitions implemented by your code, and the program starts out in a “sane” state. Certain events (weird inputs, memory manipulation, …) might move the program from a sane state into a “weird state”, and after that you can still trigger the transitions intended for sane states, but now they run on weird states – you have a “weird machine” that does unintended things. Another key insight from the LANGSEC stuff is that you write the program, so you probably think you decide what’s going to happen, but actually the program is effectively an interpreter, and the input behaves like a program running on that interpreter, so whoever provides the input (files, network packets, …) actually decides what happens. (Within the bounds set by your code, but those are often uncomfortably loose.) Not only is code data (as taught by Lisp), but data is also code.
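A toy sketch of the sane-state/weird-state idea (my own illustration, not taken from the LANGSEC papers): a tiny hand-rolled state machine where one buggy message handler lets attacker-controlled input push the state outside the intended set, after which the “intended” transitions still fire, just not on anything the author planned for:

```python
# Toy "protocol handler" meant to be a 3-state machine,
# with a bug that admits states nobody intended.

# Intended states: 0 = waiting, 1 = reading header, 2 = reading body.
TRANSITIONS = {
    (0, "HELLO"): 1,
    (1, "LENGTH"): 2,
    (2, "DATA"): 0,
}

def step(state: int, msg: str, payload: int = 0) -> int:
    # Bug: a "SKIP" message adds an attacker-controlled offset to the state
    # instead of following the transition table.
    if msg == "SKIP":
        return state + payload          # can produce e.g. state 7: a "weird state"
    return TRANSITIONS.get((state, msg), state)

state = 0
for msg, payload in [("HELLO", 0), ("SKIP", 6), ("LENGTH", 0)]:
    state = step(state, msg, payload)
    print(msg, "->", state)
# The machine ends up in state 7, which no transition was designed for;
# from here the "intended" transitions misfire or silently do nothing,
# and the input sequence is effectively programming the handler.
```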
Formal models can’t be exploitable in this sense, as they are the specification. What can happen is that there are logic bugs in the spec, but that is a very different class of problems. So an abstract Turing machine can’t really be hackable, as it will behave exactly according to specification (it is the spec); the program implemented by the TM might be faulty and exploitable, and a concrete implementation of the TM might be too. (Of course, logic bugs can also be very, very bad.)
Thanks for the insightful comment! I’ll look through the links you provided, but I think we’re in agreement here (though you put it far more succinctly); formal models are not exploitable in the typical sense of the word. That’s why I’m interested in what I’m tentatively calling “Brickability” (though my hope is this concept isn’t new and it already has a name I could google) – a binary string which renders all further input irrelevant to the output (for a given Turing machine). For now I’m not really worried about concrete implementations, since I’m not fully sure my formal model is even consistent or non-trivial yet.
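For what it’s worth, here’s how I’d state the property on a toy finite-state machine (a stand-in for the Turing-machine version, which I assume would work the same way): a “bricking” string is one that drives the machine into an absorbing state, so every continuation of the input produces the same result:

```python
# Toy illustration of the "brickability" idea on a finite-state machine.
# State "B" is absorbing: every symbol maps it back to "B", so once a prefix
# reaches it, the rest of the input can no longer influence the outcome.

DELTA = {
    ("S", "0"): "S", ("S", "1"): "A",
    ("A", "0"): "S", ("A", "1"): "B",
    ("B", "0"): "B", ("B", "1"): "B",   # trap / absorbing state
}

def run(inp: str) -> str:
    state = "S"
    for bit in inp:
        state = DELTA[(state, bit)]
    return state

brick = "11"                       # candidate "bricking" prefix
suffixes = ["", "0", "1", "0101", "111000"]
print({s: run(brick + s) for s in suffixes})
# Every suffix ends in state "B": after the prefix "11", further input is irrelevant.
```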
I can’t recall specific names / specific treatments of this, but I’m also relatively sure that it’s kinda familiar, so I suspect that something exists there. Problem is that, in some sense, it falls right between areas.
On the practical side, people don’t really care where the problem originates, they just want to fix it. (And people are swimming in abstraction layers and navigate them intuitively, so a “this is one layer up” doesn’t really stand out, as it’s still the bottom layer of some other abstraction.) So from the perspective of computer security, it just falls into the big bucket of what’s called a “denial of service” vulnerability (alongside lots of other things, like temporarily slowing things down, making a system do 2^51 compute steps (i.e. locking it up effectively forever, but not “truly” forever), or making it drown in traffic from outside), but there’s generally no category separation between “possible bugs” (that just happen to be in one specific implementation) and “necessary bugs” (the “logic bugs” that are in the spec, where every conforming program will end up with the same problem).
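One classic concrete instance of the “effectively forever” flavour (my example, not tied to any particular spec) is catastrophic backtracking: a regex with nested quantifiers fed to a backtracking engine, where a few extra input bytes multiply the work. The sizes below are deliberately small so the demo actually finishes; exact timings depend on the engine:

```python
import re
import time

# "Catastrophic backtracking": (a+)+ followed by a required 'b' forces a
# backtracking regex engine to try exponentially many ways of splitting
# the run of 'a's before it can report failure.
pattern = re.compile(r"(a+)+b")

for n in (18, 20, 22, 24):           # kept small so this finishes quickly
    text = "a" * n + "c"             # no 'b' at the end -> guaranteed failure
    t0 = time.perf_counter()
    pattern.match(text)
    print(n, f"{time.perf_counter() - t0:.3f}s")
# Each +2 in n roughly quadruples the time; a few more characters of input
# and the match effectively never returns: a denial of service from a
# handful of bytes, with no memory corruption anywhere.
```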
On the theoretical side, you mostly work with bounded recursion instead of possibly unbounded loops, so the problem is more-or-less impossible. (And while Turing machines exist on the theory side too, working with them sucks, so it doesn’t really happen a lot, and people don’t really notice problems that occur when working with them. So it’s mostly bounded recursion.) As an example, if you write your program in Coq, you’re forced to prove that every recursive computation terminates in order to be allowed to compile (and run) your program. So as soon as you care even a little about verification, the problem basically disappears / becomes invisible. In theory, the path to reaching a fixed point by reacting to each input in exactly the same way is still open (by messing up your state), but that’s very unlikely to be hit by chance (especially if you write in the typical stateless style), so people might not ever stumble on it, and the chances of thorough exploration are rather low too. (Still feels like there was something, somewhere...)
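To illustrate that last parenthetical (again my own toy, with hypothetical names): in a stateless style each request is handled from scratch, so there is no state to corrupt, while a persistent mode flag only needs one bad update to park the handler in a configuration where it reacts identically to everything afterwards:

```python
# Stateless style: the answer depends only on the current request,
# so no input sequence can change how later requests are handled.
def handle_stateless(request: str) -> str:
    return request.upper()

# Stateful style with a (hypothetical) persistent "mode"; a single bad
# update is enough to make every later request get the same response,
# i.e. the fixed-point / "bricked" behaviour.
class Handler:
    def __init__(self) -> None:
        self.mode = "normal"

    def handle(self, request: str) -> str:
        if request == "set-mode:broken":   # the one state-corrupting input
            self.mode = "broken"
        if self.mode == "broken":
            return "ERROR"                 # identical reaction to everything
        return request.upper()

h = Handler()
print([h.handle(r) for r in ["hi", "set-mode:broken", "hello", "reset?"]])
# ['HI', 'ERROR', 'ERROR', 'ERROR']
```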
In other areas:
From the perspective of dynamical systems, it’s just a fixed point.
In temporal logic, it’s something that would be happening “eventually forever”. (Both views are spelled out right below.)
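Spelled out in symbols (my notation, mainly to help with keyword searching), with δ the transition function and c the configuration reached after the “bricking” prefix:

```latex
% Dynamical-systems view: the configuration c reached by the bricking prefix
% is a fixed point of the transition function, whatever symbol comes next.
\forall a \in \Sigma : \quad \delta(c, a) = c

% Temporal-logic (LTL) view: "eventually, forever" the output no longer
% depends on the remaining input.
\Diamond \Box \, \mathit{outputFixed}
```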
If you can’t find anything after using that to find more keywords, I’d suggest you ask someone at the Santa Fe Institute; they do lots of dynamical systems stuff, and if it exists at all, there’s a pretty good chance that someone there has heard of it. (I’ve seen them mention kinda nearby concepts in one of the “Complexity Explorer” courses. According to that page, Cristopher Moore does a computability etc. course soon-ish, so he should either be the right person to ask or know who to forward it to.)