Eliezer has written about the notion of security mindset, and there’s an important idea that attaches to that phrase, which some people have an intuitive sense of and ability to recognize, but I don’t think Eliezer’s post quite captured the essence of the idea, or presented anything like a usable roadmap of how to acquire it.
An1lam’s recent shortform post talked about the distinction between engineering mindset and scientist mindset, and I realized that, with the exception of Eliezer and perhaps a few people he works closely with, all of the people I know of with security mindset are engineer-types rather than scientist-types. That seemed like a clue; my first theory was that this is because engineer-types get to actually write software that might have security holes, and have the feedback cycle of trying to write secure software. But I also know plenty of otherwise-decent software engineers who don’t have security mindset, at least of the type Eliezer described.
My hypothesis is that to acquire security mindset, you have to:
Practice optimizing from a red team/attacker perspective;
Practice optimizing from a defender perspective; and
Practice modeling the interplay between those two perspectives.
So a software engineer can acquire security mindset because they practice writing software which they don’t want to have vulnerabilities, they practice searching for vulnerabilities (usually as an auditor simulating an attacker rather than as an actual attacker, but the cognitive algorithm is the same), and they practice going meta when they’re designing the architecture of new projects. This explains why security mindset is very common among experienced senior engineers (who have done each of the three many times), and rare among junior engineers (who haven’t yet). It explains how Eliezer can have security mindset: he alternates between roleplaying a future AI-architect trying to design AI control/alignment mechanisms, roleplaying a future misaligned-AI trying to optimize around them, and going meta on everything-in-general. It also predicts that junior AI scientists won’t have this security mindset, and probably won’t acquire it except by following a similar cognitive trajectory.
Which raises an interesting question: how much does security mindset generalize between domains? I.e., if you put Theo de Raadt onto a hypothetical future AI team, would he successfully apply the same security mindset there as he does to general computer security?
I like this post!
Some evidence that security mindset generalizes across at least some domains: the same white hat people who are good at finding exploits in things like kernels seem to also be quite good at finding exploits in things like web apps, real-world companies, and hardware. I don’t have a specific person to give as an example, but this observation comes from going to a CTF competition and talking to some of the people who ran it about the crazy stuff they’d done that spanned a wide array of different areas.
Another, slightly different example: Wei Dai is someone I actually knew about outside of Less Wrong from his early work on cryptocurrency, so he was at least at one point involved in a security-heavy community (I’m of the opinion that early cryptocurrency folks were on average much better about security mindset than the average current cryptocurrency community member). Based on his posts and comments, he generally strikes me as having security-mindset-style thinking, and from my perspective he has contributed a lot of good stuff to AI alignment.
Theo de Raadt is notoriously… opinionated, so it would definitely be interesting to see him thrown on an AI team. That said, I suspect someone like Ralph Merkle, who’s a bona fide cryptography wizard (he co-invented public key cryptography and invented Merkle trees!) and is heavily involved in the cryonics and nanotech communities, could fairly easily get up to speed on AI control work and contribute from a unique security/cryptography-oriented perspective. In particular, now that there seems to be more alignment/control work that involves at least exploring issues with concrete proposals, I think someone like this would have less trouble finding ways to contribute. Having cryptography experience in addition to security experience does seem helpful, though. Cryptography people are probably more used to combining their security mindset with their math intuition than your average white-hat hacker.
I’m kinda confused about the relation between cryptography people and security mindset. Looking at the major cryptographic algorithm classes (hashing, symmetric-key, asymmetric-key), it seems pretty obvious that the correct standard algorithm in each class is probably a compound algorithm—hash by xor’ing the results of several highly-dissimilar hash functions, etc, so that a mathematical advance which breaks one algorithm doesn’t break the overall security of the system. But I don’t see anyone doing this in practice, and also don’t see signs of a debate on the topic. That makes me think that, to the extent they have security mindset, it’s either being defeated by political processes in the translation to practice, or it’s weirdly compartmentalized and not engaged with any practical reality or outside views.
Combining hash functions is actually trickier than it looks, and some people are doing research in this area and deploying solutions. See https://crypto.stackexchange.com/a/328 and https://tahoe-lafs.org/trac/tahoe-lafs/wiki/OneHundredYearCryptography. It does seem that if cryptography people had more of a security mindset (one that is not being defeated), there would already be more research and deployment of this.
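To make the "trickier than it looks" point concrete, here is a minimal sketch of two ways to combine hash functions. The choice of SHA-256 and SHA3-256 as the "highly dissimilar" pair, and the function names, are assumptions for illustration only; nothing in this thread specifies them.

```python
import hashlib


def concat_combiner(data: bytes) -> bytes:
    """Concatenate the two digests (illustrative helper, 64 bytes out).

    A collision here needs the same pair of inputs to collide under
    SHA-256 and SHA3-256 at once, so it is at least as hard to find
    as a collision in the stronger of the two.
    """
    return hashlib.sha256(data).digest() + hashlib.sha3_256(data).digest()


def xor_combiner(data: bytes) -> bytes:
    """XOR the two digests byte by byte (illustrative helper, 32 bytes out).

    The output stays short, but a collision only requires the XOR of
    the two digests to match, which does not imply a collision in
    either underlying function.
    """
    a = hashlib.sha256(data).digest()
    b = hashlib.sha3_256(data).digest()
    return bytes(x ^ y for x, y in zip(a, b))


if __name__ == "__main__":
    msg = b"security mindset"
    print(concat_combiner(msg).hex())
    print(xor_combiner(msg).hex())
```

The trade-off, as I understand it: concatenation keeps collision resistance as long as either function does, at the cost of a longer digest, while XOR keeps the digest short but does not give that guarantee in general. Which combiner is appropriate depends on which property you are trying to preserve.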
In fairness, I’m probably over-generalizing from a few examples. For example, my biggest inspiration from the field of crypto is Daniel J. Bernstein, a cryptographer who’s in part known for building qmail, which has an impressive security track record & guarantee. He discusses principles for secure software engineering in this paper, which I found pretty helpful for my own thinking.
To your point about hashing the results of several different hash functions, I’m actually kind of surprised to hear that this might protect against the sorts of advances I’d expect to break hash algorithms. I was under the very amateur impression that basically all modern hash functions relied on the same numerical algorithmic complexity (and number-theoretic results). If there are any resources you can point me to about this, I’d be interested in getting a basic understanding of the different assumptions hash functions can depend on.
The issue is that all cryptography depends on one-way functions, so any ability to break a cryptographic algorithm that depends on one-way functions in a scalable way means you have defeated almost all of cryptography in practice.
So in one sense, a mathematical advance on a one-way function underlying a symmetric key algorithm would be disastrous for overall cryptographic prospects.
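For readers who haven’t seen it, the standard textbook definition of a one-way function is roughly the following. The symbols (f, A, n, negl) are the usual notation, not anything taken from this thread.

```latex
% f is one-way if it is easy to compute but hard to invert on average:
% (1) f : \{0,1\}^* \to \{0,1\}^* is computable in polynomial time, and
% (2) for every probabilistic polynomial-time algorithm A,
\Pr_{x \leftarrow \{0,1\}^n}\!\left[\, f\!\left(A\!\left(1^n, f(x)\right)\right) = f(x) \,\right] \le \mathrm{negl}(n),
% where negl(n) shrinks faster than 1/p(n) for every polynomial p.
```

Note that the definition only asks A to find some preimage of f(x), not x itself, and the hardness is on average over random inputs rather than in the worst case.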
Can you give some specific examples of me having security mindset, and why they count as having security mindset? I’m actually not entirely sure what it is or that I have it, and would be hard pressed to come up with such examples myself. (I’m pretty sure I have what Eliezer calls “ordinary paranoia” at least, but am confused/skeptical about “deep security”.)
Sure, but let me clarify that I’m probably not drawing as hard a boundary between “ordinary paranoia” and “deep security” as I should be. I think Bruce Schneier’s and Eliezer’s buckets for “security mindset” blended together in the months since I read both posts. Also, re-reading the logistic success curve post reminded me that Eliezer calls into question whether someone who lacks security mindset can identify people who have it. So it’s worth noting that my ability to identify people with security mindset is itself suspect by this criterion (there’s no public evidence that I have security mindset, and I wouldn’t claim that I have a consistent ability to do “deep security”-style analysis).
With that out of the way, here are some of the examples I was thinking of.
First of all, at a high level, I’ve noticed that you seem to consistently question assumptions other posters are making and clarify terminology when appropriate. This seems like a prerequisite for security mindset, since it’s a necessary first step towards constructing systems.
Second and more substantively, I’ve seen you consistently raise concerns about human safety problems (also here). I see this as an example of security mindset because it requires questioning the assumptions implicit in a lot of proposals. The analogy to Eliezer’s post here would be that ordinary paranoia is trying to come up with more ways to prevent the AI from corrupting the human (or something similar), whereas I think a deep security solution would look more like avoiding the assumption that humans are safe altogether and instead seeking clear guarantees that our AIs will be safe even if we ourselves aren’t.
Last, you seem to be unusually willing to point out flaws in your own proposals, the prime example being UDT. The most recent example of this is your comment about the bomb argument, but I’ve seen you do this quite a bit and could find more examples if prompted. On reflection, this may be more of an example of “ordinary paranoia” than “deep security”, but it’s still quite important in my opinion.
Let me know if that clarifies things at all. I can probably come up with more examples of each type if requested, but it will take me some time to keep digging through posts and comments, so I figured I’d check in to see if what I’m saying makes sense before continuing to dig.
This comment feels relevant here (not sure if it counts as ordinary paranoia or security mindset).