“Just hiring people” is sometimes still actually possible

Graham’s design paradox is a maxim about organizations that says that if key management isn’t competent at ${X}, it’s often impossible for an organization to hire people who are, regardless of how well resourced that organization is. It was once described thusly:
Paul Graham’s Design Paradox is that people who have good taste in UIs can tell when other people are designing good UIs, but most CEOs of big companies lack the good taste to tell who else has good taste. And that’s why big companies can’t just hire other people as talented as Steve Jobs to build nice things for them, even though Steve Jobs certainly wasn’t the best possible designer on the planet. Apple existed because of a lucky history where Steve Jobs ended up in charge. There’s no way for Samsung to hire somebody else with equal talents, because Samsung would just end up with some guy in a suit who was good at pretending to be Steve Jobs in front of a CEO who couldn’t tell the difference.
The paradox has been invoked either directly or as an analogy to explain why the alignment problem is hard, why information security is hard, why identifying good programming talent is hard, and why American car companies don’t design good looking cars.
In some cases, especially where failure is irrecoverable (like evaluating alignment solutions), overcoming the design paradox is in fact a pressing issue. However, I’d argue those cases are probably rare. In practice, most appeals to it as an explanation for organizational incompetence are misattributions of a litany of deeper problems.
One field that seems to most people like it’d be a perfect application of this maxim is information security. Major security breaches are sometimes described as black swan events, because they’re relatively rare and can be very disruptive when they happen. Your typical large technology company may not get real-world feedback on the performance of a subpar Chief Information Security Officer for years. It’s also extremely difficult to evaluate the security of a company’s systems in vitro if you yourself are not generally good at breaking them, and you want to protect them against the best computer hackers in the field, not just the average ones. A perfect example of a case where the design paradox applies, right?
It’s true that companies tend to be bad at picking CISOs, and a very common tendency in the industry is for those executives to be fired every six to eighteen months when they’re scapegoated for incidents. In my estimation, however, this is not generally because executive leadership can’t identify poor security practices. There actually happens to be a perfectly legible way for nontechnical people at most large companies to analyze their security teams’ performance: penetration testing and red teaming.
The idea behind red teaming is that you pay a company like SpecterOps to try to breach specific key assets of yours, and then hand you a report on all of the security holes they were able to find during the engagement. These firms will generally be happy to give you security recommendations and elaborate on deeper underlying problems with your tech policy, in ways that are understandable to smart nonexperts. Red teaming works because, for most companies, surviving a comprehensive penetration test is a good (though not infallible) indicator that you’ll survive the attention of the parties that are actually going to come looking at your company over the next couple of years.
Almost every Fortune 500 company knows about this and gets penetration tests performed regularly on its critical systems. There is not necessarily a ready-made pentesting service out there for every imaginable security requirement, but it turns out most companies have similar needs, and the standard occasional checkup is basically sufficient. And even when large companies can’t identify the good penetration testing firms on their first try, they’re usually able to cycle through a bunch of them and keep using the ones that come up with solid results. With enough money you can develop similarly conclusive testing and evaluation strategies for many other concerns, like A/B testing UX designs or convening consumer panels for feedback on aesthetics.
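To make the point about legible evaluation concrete with the A/B-testing example: the outcome of such a test boils down to arithmetic a nonexpert can check for themselves. Here’s a minimal sketch in Python; the traffic and conversion numbers are invented for illustration, and the 1.96 threshold is just the usual 5% significance cutoff.

```python
import math

# Toy comparison of two UX designs by conversion rate (numbers are made up).
def two_proportion_z(conversions_a, visitors_a, conversions_b, visitors_b):
    """Z-statistic for the null hypothesis that both designs convert equally well."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    p_pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(p_pooled * (1 - p_pooled) * (1 / visitors_a + 1 / visitors_b))
    return (p_b - p_a) / std_err

z = two_proportion_z(480, 10_000, 560, 10_000)
print(f"z = {z:.2f}")  # ~2.55; |z| > 1.96 means the difference is unlikely to be noise
```

The equivalent artifact in security is the red-team report: a legible, externally produced verdict that leadership can read without having to be able to produce it themselves.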
So why do firms still suck at picking information security executives, or struggle with these kinds of left-field hires in general? Well, it’s complicated:
Performance testing might be expensive. Hiring a computer hacker at a “known good” pentesting company to hack your shit costs something like $20k per head for a five-day engagement. Paying a team of hackers to simulate anything like an “APT” (a buzzword for The Chinese or The Russians) over a 3+ month engagement is a service you can buy, but it costs hundreds of thousands of dollars or more (see the back-of-envelope sketch after this list).
Principal-agent problems can ruin otherwise perfectly acceptable mechanisms for evaluating employee performance. One frustratingly common cause of such problems in security is companies delegating the arrangement of security tests to the CISO. Obviously, your CISO wants a report that makes them look good, so they’re incentivized to pick penetration testing companies they expect won’t find anything too damaging, or to skew the results by giving their staff advance warning or running the tests on a set schedule. The pentesting company realizes this too, and wants to be hired again, so it will often downplay its findings in a way that makes the security team look good, at least in the report’s executive summary, which is the only part an executive is ever going to read. Even when leadership is competent enough not to hand this task to the security department, the CEO themselves might still be incentivized to inflate their security confidence to the board or shareholders.
The tests themselves can ossify into a Goodharted target over time. Maybe all your penetration tests run under a five-day time limit, so when someone comes along and decides to size you up for longer than that, you’re unprepared. Maybe your organization changes security posture a lot and only runs tests once it’s been doing something for a while, instead of at random intervals.
The company might not have the time or the money to iterate through several CISOs; i.e., it might need to make the correct hiring decision on a first, critical try. This of course generally describes startups, which perhaps explains Paul Graham’s emphasis.
The company may have an accurate sense of how well it’s doing, yet wrongly believe that’s the best it can do, and thus not expect new hires to do much better. Having sufficient “taste” and personal experience working in security can of course fix this, but the two aren’t equivalent, and such taste isn’t always required.
Even when a company is getting cheap, quick feedback on a hire’s performance (say, with the help of a competent internal red team), it might still be so hard to find people with a relevant track record at a niche job like “managing hundreds of security personnel inside a Fortune 500 company” that you simply fail to find them anyway.
Your evaluation procedures might not identify remedies, or the remedies might not be easy to implement. For example, most of the underlying security issues in a product can turn out to be the result of architectural decisions made years ago, which may be impractical to change late in the product’s lifecycle.
Sometimes companies will straight up allocate $100MM toward solving an issue and then $0 toward testing whether that issue is solved. Yes, it’s stupid, but it happens. There’s not always a very elaborate reason, but one common one is that they just aren’t aware of the one weird trick for testing information security.
Last but not least, the company may not actually care much about the problem, possibly because breaches are the kind of thing that hurts its customers more than it hurts the company itself.
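On the cost point above: here’s a minimal back-of-envelope sketch in Python. The only figure taken from this post is the rough $20k-per-head, five-day price; the team sizes and engagement lengths are assumptions made up for illustration, not anyone’s actual pricing.

```python
# Rough cost model for the engagements described above. The day rate comes from
# the ~$20k-per-tester, five-day figure; team sizes and durations are assumed.
DAY_RATE = 20_000 / 5  # ~$4k per tester per day

def engagement_cost(testers: int, days: int) -> float:
    """Estimated price of a red-team engagement at the assumed day rate."""
    return testers * days * DAY_RATE

# A standard one-week pentest vs. a ~3-month simulated "APT" campaign.
print(f"standard pentest (2 testers, 5 days): ${engagement_cost(2, 5):,.0f}")   # $40,000
print(f"APT simulation (3 testers, ~65 days): ${engagement_cost(3, 65):,.0f}")  # $780,000
```

Even on conservative assumptions the longer engagement lands in the hundreds of thousands of dollars, which goes some way toward explaining why most companies settle for the one-week checkup.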
These can be tough issues, but they’re mostly not immune to prudent leadership decisions and a genuine concern for the problem. I’m good friends with one computer hacker who, despite having extraordinary security chops and being a great person, is (I believe) mildly to moderately autistic, an alumnus of a no-name college in Texas, and generally terrible at perception management and professional networking. A while back I personally flew to Texas to pitch them in person on becoming employee #1 at my attempted startup, as I was almost certain I had private information about their skills. They politely let me give my pitch, and then informed me that the salary would have to be something like 2x what I suggested, because they had recently gotten an offer from Tesla, straight out of college, at well above Tesla’s usual rate for the job they now have.
Did Tesla manage to snipe my friend because Tesla leadership is full of people with some particular psychological aptitude for security, or because Tesla has boatloads of money and is run by generally smart people? My guess is that it’s the latter more than the former. Perhaps there’s some level beyond which the former becomes a bottleneck, but in my experience, and in the experience of the people I’ve talked to who do pentesting at a top level, organizational competence along a specific dimension like security has more to do with leadership’s general abilities, resources, and motives than with their personal skill in that dimension.
It certainly simplifies hiring to be great at security engineering yourself, and thus in possession of a “what would I do” oracle. But how much it hurts not to have such an oracle is context- and task-dependent, and it might be irrelevant for all practical purposes if your staff can come up with reliable protocols for testing what you want to test anyway.