[Question] Infohazards, hacking, and Bricking—how to formalize these concepts?

This is an attempt to formulate a thought that’s been rolling around in my brain for a few years now. I want to formalize a hypothesis, but I’m not sure how to convey what the hypothesis even is. Hopefully writing this out will help make that happen (spoiler alert: I partially succeeded?), or someone reading this will be able to help me.
Working towards the theme:
The idea starts with the classic LessWrong meme of an infohazard (defined here as some form of external sensory “input”) that, when received by its “target,” causes that target harm. This could be something wholly internal, like worsening depression, but might have external effects as well. A victim might do something they wouldn’t otherwise do of their own volition, or could even collapse completely, in the case of infohazards that induce seizures or death. The idea of an infohazard is not merely theoretical. We know from experience that visual and auditory infohazards exist for individual human beings. Examples range from sad stories which make people temporarily depressed, to people “brainwashed” by cults into doing things they would otherwise never have done, to people who suffer seizures or even death in response to certain inputs, like flashing lights. From that basis in fact, one can generalize to the hypothesis that for every individual human, such an infohazard exists (even if it is unconstructable in practice). An even stronger (but somewhat less plausible) hypothesis would be that there exist singular infohazards which would work on most or even all of humanity. On a more(?) theoretical level, it seems reasonable that a being with complete knowledge of a human brain and unlimited compute could “trivially” construct an infohazard tailored to that brain. Could such a being control the brain’s output in some arbitrary manner by giving it the right inputs, or are there limitations on how far even an optimal infohazard could go?
We are far from understanding the human brain deeply enough to even begin to answer these questions, so let’s try to abstract the situation a bit by considering the field of computing, where we have (arguably) greater scientific and mathematical understanding of the field. The best analogy to an infohazard in computer science would seem to be found in the practice of hacking.
“Hacking” as a term covers many different modes of action, but one popular method can be generalized as “inserting an input through whatever the normal input channel is, with the input designed to make the machine output something different from what its creators intended.” That last bit can be clarified into the formulation “different from what was intended by whoever built the machine,” or “different from how the literal code should have acted.” Writing this down, I’m finding it very hard to remove the sense of intentionality from the code here. Presumably the hacker is using normal physics, and the computer is treating the input exactly as it has been “designed” to treat it, albeit not how its developers expected. So in a sense the hacker isn’t breaking anything other than the constraints of the designer’s imagination. This seems like a dead end, since we aren’t really talking about computers anymore.
Before abandoning this line of thinking, it might be worth exploring a little further. Another way to consider hacking is by thinking about the internal state of the machine, rather than its output. A definition of hacking from that perspective might be “inserting an input through whatever the normal input channel is, with the input designed to make the machine’s internal state differ in some significant way from its creators’ intentions.” This usually involves gaining access to parts of the machine which you should not have access to. In the end, the hacker (presumably) only cares about the output (which admittedly might simply be information about the computer’s internal state), so this might just be a different way of thinking about the same thing, and an equivalent definition.
(Btw, I’m specifying “inserting an input through whatever the normal input channel is,” since some hacking techniques involve finding an input channel which shouldn’t have been accepting input, was never meant to be used as such, and/or didn’t even exist until the hacker broke the machine in some glorious hackery way.)
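To make that concrete, here is a deliberately silly toy sketch in Python (my own contrived example, not a real exploit). The program does exactly what its code says; the “hack” is just an input, delivered through the normal input channel, that its author never anticipated:

```python
# A toy "calculator" whose author intended it to evaluate simple sums typed by
# the user. eval() does exactly what the code says, yet the right input makes
# the program do something its creator never intended.

def calculator(user_input: str) -> str:
    # Intended use: the user types something like "2 + 2".
    return str(eval(user_input))  # obeys its code, not its creator's intent

print(calculator("2 + 2"))                      # intended output: 4
print(calculator("__import__('os').getcwd()"))  # unintended: leaks the working directory
```

Nothing here is “broken” in a physical sense; the machine follows its instructions perfectly, which is exactly the intentionality problem described above.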
There are some hacks that only attack and harm individual computers, but quite often these days, hacks can be effective against any computer running a certain operating system. As long as the right input is given, the hacker has free rein to mess around with the output as they wish. One can easily imagine some sort of “super-hack”: an input which hacks not just one OS but every computer (of some semi-arbitrary “significant” complexity) available on the market. The input code could be pretty ugly and glued-together, and I’m not sure why anyone would bother doing all that in a single input, but I don’t think there’s any technical reason why it couldn’t work in theory (or maybe there is, I don’t know). Of course it wouldn’t affect computers with no (or extremely limited) input channels, but that would be the human equivalent of being blind before The Image That Makes You Crazy appears. No input, no issue! At the very least, machines that are similar enough to each other in certain ways can be hacked with identical techniques.
Generalizing even further, what about Turing machines? What would it mean to “hack” a Turing machine? All of the conceptions of hacking developed above seem to depend on some amount of human intentionality, and on that intentionality being subverted. Turing machines have no intentionality: they exist platonically and run blindly. You cannot “break” a Turing machine in any conventional sense. How do we take human intentions and design out of the picture?
Looking back at both computers and humans, one relatively objective state a hack or an infohazard can lead to is a system crash. In a computer, that would mean “bricking,” where no further input can alter the output; in a human, that would mean death or coma, where, again, no further input will change anything. Bricking and lethal infohazards are more limited in form than hacking and mind control, respectively, but may be easier to translate into a formalized notion.
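As a concrete (and entirely made-up) illustration, here is a toy state machine in Python: it responds to input normally until it receives one particular input, after which nothing you feed it changes its output. The trigger symbol "brick" and the doubling behaviour are arbitrary choices for the sketch:

```python
# A toy "device" that can be bricked: it echoes each input back, doubled,
# until it receives the input "brick". After that, its output no longer
# depends on anything it is given.

def toy_device(inputs):
    bricked = False
    outputs = []
    for x in inputs:
        if bricked:
            outputs.append(0)      # post-brick: output ignores the input
        elif x == "brick":
            bricked = True         # absorbing state: the bricking input
            outputs.append(0)
        else:
            outputs.append(x * 2)  # normal, input-sensitive behaviour
    return outputs

print(toy_device(["a", "b", "c"]))           # ['aa', 'bb', 'cc'] -- sensitive to input
print(toy_device(["a", "brick", "b", "c"]))  # ['aa', 0, 0, 0]    -- input is now irrelevant
```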
Presumably, a formalized conception of “bricking” could lead to some interesting and potentially really important theorems, which may also be of help when discussing real-world hacking, and more ambitiously, human infohazards.
In the context of a Turing machine, I would define a Brick (capitalized to indicate this particular meaning) as “an input which causes all further input to make no difference to the output.” A Bricked Turing machine may halt or it may run forever; all that matters is that no possible input can alter the output once the Brick has been received. If the machine is indifferent to input no matter what initial input it is given (for example, one which only ever outputs 0, or halts immediately at the first step, or outputs an infinite sequence of 1s), that should not count as a Bricked machine, since there was never an initial Brick in the first place. A Brickable machine must therefore be sensitive to at least some initial inputs, in the sense that changing the input changes the output. I am sure there are many trivial properties of Brickable Turing machines that one could state, but I have yet to explore this in any more depth. Hopefully there are also non-trivial properties unique to Brickable machines, which might shed light on some of the areas I’ve covered above.
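Here is one possible way to write this down (my own notation, not anything standard as far as I know). Treat the machine $M$ as a function from finite input strings over an alphabet $\Sigma$ to outputs, where $M(w)$ is whatever $M$ produces when run on $w$ (the contents of the output tape, possibly infinite if it never halts). Then:

$$b \in \Sigma^* \text{ is a Brick for } M \iff \forall x, y \in \Sigma^* : M(bx) = M(by)$$

$$M \text{ is Brickable} \iff \big(\exists u, v \in \Sigma^* : M(u) \neq M(v)\big) \;\text{and}\; \big(\exists b \in \Sigma^* : b \text{ is a Brick for } M\big)$$

The first clause is the sensitivity requirement: a machine that ignores its input entirely (such as one that only ever outputs 0) trivially satisfies the Brick condition with the empty string, and the sensitivity clause is what keeps it from counting as Brickable, matching the intuition above. I’m not confident this is the best way to carve it up, but it at least makes the question precise enough to ask.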
What I would like to know is whether this concept already exists in computer science, and if so, to what depth it has been studied. What can we formally prove about the Brickability of various machines? If this is a novel concept, do you think it is worthy of further study?
Apologies if I just reinvented something that’s already well-known, or if there’s a fatal flaw in my formalism that renders it totally useless. I don’t have formal training in the area, so I’m sure I have a lot of blind spots.