Addendum 2: this particular quoted comment is very wrong, and I expect this is indicative of the quality of the quoted discussion, i.e. these people do not know what they are talking about.
Luke Parrish: Microsoft designed their OS to run driver files without even a checksum and you say they aren’t responsible? They literally tried to execute a string of zeroes!
Luke Parrish: CrowdStrike is absolutely to blame, but so is Microsoft. Microsoft’s software, Windows, is failing to do extremely basic basic checks on driver files before trying to load them and give them full root access to see and do everything on your computer.
The reports I have seen (of attempted reverse-engineering of the Crowdstrike driver’s segfault) say it did not attempt to execute the zeroes from the file as code, and the crash was unrelated, likely while trying to parse the file. Context: the original workaround for the problem was to delete a file which contains only zeroes (at least on some machines, reports are inconsistent), but there’s no direct reason to think the driver is trying to execute this file as code.
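To make the "crashed while parsing" scenario concrete, here is a minimal sketch of how a data file full of zeroes can crash a parser without ever being executed as code. The file layout, `MAGIC` value and function names are all hypothetical, and this is not a claim about what the actual Crowdstrike bug was.

```python
import struct

# Hypothetical layout: 4-byte magic, record size, record count (little-endian).
HEADER = struct.Struct("<4sII")
MAGIC = b"CFG1"  # made-up magic value for this sketch

def parse_naive(blob):
    """Trusts the header completely, as buggy parsers tend to do."""
    _magic, record_size, _count = HEADER.unpack_from(blob)
    body = blob[HEADER.size:]
    count = len(body) // record_size  # ZeroDivisionError when record_size == 0
    return [body[i * record_size:(i + 1) * record_size] for i in range(count)]

def parse_checked(blob):
    """Validates every header field before using it; rejects instead of crashing."""
    if len(blob) < HEADER.size:
        raise ValueError("file shorter than header")
    magic, record_size, count = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("bad magic")
    if record_size == 0:
        raise ValueError("record_size must be non-zero")
    body = blob[HEADER.size:]
    if count * record_size != len(body):
        raise ValueError("record_count does not match file length")
    return [body[i * record_size:(i + 1) * record_size] for i in range(count)]

zero_file = bytes(64)                                  # a file containing only zeroes
good_file = HEADER.pack(MAGIC, 4, 2) + b"AAAABBBB"     # two 4-byte records
```

In Python the naive version dies with an exception; in a kernel driver the equivalent mistake takes the whole machine down, which is the difference that matters here.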
And: Windows does not run drivers “without a checksum”! Drivers have to be signed by Microsoft, and drivers with early-loading permissions have to be super-duper-signed in a way you probably can’t get just by paying them a few thousand dollars.
But it’s impossible to truly review or test a compiled binary for which you have no source code or even debug symbols, and which is deliberately obfuscated in many ways (as people have been reporting when they looked at this driver crash) because it’s trying to defend itself against reverse-engineers designing attacks. And of course it’s impossible to formally prove that an arbitrary program is correct. And of course it’s written in a memory-unsafe language, i.e. C++, because practically every mainstream OS kernel and its drivers are written in such a language.
Furthermore, the Crowdstrike product relies on very quickly pushing out updates to (everyone else’s) production to counter new threats / vulnerabilities being exploited. Microsoft can’t test anything that quickly. Whether Crowdstrike can test anything that quickly, and whether you should allow live updates to be pushed to your production system, is a different question.
Anyway, who’s supposed to pay Microsoft for extensive testing of Crowdstrike’s driver? They’re paid to sign off on the fact that Crowdstrike are who they say they are, and at best that they’re not a deliberately malicious actor (as far as we know they aren’t). Third party Windows drivers have bugs and security vulnerabilities all the time, just like most software.
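The gap between "signed" and "correct" can be sketched in a few lines. This is only an illustration: HMAC with a shared key stands in for real public-key code signing, and `VENDOR_KEY` and the function names are made up for the example.

```python
import hashlib
import hmac

# Assumption: a symmetric key standing in for a vendor's real signing key.
VENDOR_KEY = b"hypothetical-vendor-signing-key"

def checksum(blob):
    """Integrity only: detects accidental corruption, says nothing about origin."""
    return hashlib.sha256(blob).hexdigest()

def sign(blob, key=VENDOR_KEY):
    """Origin + integrity: only someone holding the key can produce this."""
    return hmac.new(key, blob, hashlib.sha256).hexdigest()

def verify(blob, signature, key=VENDOR_KEY):
    """Checks the signature in constant time; does not inspect the content."""
    return hmac.compare_digest(sign(blob, key), signature)

buggy_update = bytes(32)                 # a blob of zeroes, "shipped" by the vendor
signature = sign(buggy_update)
tampered = b"\x01" + buggy_update[1:]    # same blob with one byte changed

print(verify(buggy_update, signature))   # the nonsense blob verifies fine
print(verify(tampered, signature))       # the modified blob does not
```

The signature check passes for a blob of zeroes because it only answers "did this come from the key holder, unmodified?". Whether the content makes any sense is a question nobody asked it.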
Finally, Crowdstrike to an extent competes with Microsoft’s own security products (i.e. Microsoft Defender and whatever the relevant enterprise-product branding is); we can’t expect Microsoft to invest too much in finding bugs in Crowdstrike!
It’s impossible to prove that an arbitrary program, which someone else gave you, is correct. In general that’s undecidable: by Rice’s theorem, any non-trivial question about a program’s behavior is as hard as the halting problem.
Yes, we can prove various properties of programs we carefully write to be provable, but the context here is that a black-box executable Crowdstrike submits to Microsoft cannot be proven reliable by Microsoft.
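The usual diagonal argument behind that claim can be sketched as runnable code. This is an informal illustration rather than a proof of Rice’s theorem, and the checker names are invented for the example: whatever a would-be "will this crash?" checker answers, we can build a program that does the opposite.

```python
def make_contrarian(claims_safe):
    """Build a program that does the opposite of what the checker predicts about it."""
    def contrarian():
        if claims_safe(contrarian):
            raise RuntimeError("crash")   # checker said "safe", so crash
        return "ok"                       # checker said "unsafe", so run fine
    return contrarian

def actually_safe(program):
    """Ground truth, obtained the only fully general way: run it and see."""
    try:
        program()
        return True
    except RuntimeError:
        return False

# Two hypothetical checkers that never run the program they judge.
def optimist(program):
    return True

def pessimist(program):
    return False

for checker in (optimist, pessimist):
    program = make_contrarian(checker)
    # The checker's verdict is wrong about the program built against it.
    assert checker(program) != actually_safe(program)
```

A checker that tries to get around this by running the program itself just recurses forever on its own contrarian, which is the halting problem showing up again.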
There are definitely improvements we can make. Counting just the ones made in some other (bits of) operating systems, we could:
Rewrite in a memory-safe language like Rust.
Move more stuff to userspace. Drivers for e.g. USB devices can and should be written in userspace, using something like libusb. This goes for every device that doesn’t need performance-critical code or to manage device-side DMA access, which still leaves a bunch of things, but it’s a start.
Sandbox more kinds of drivers in a recoverable way, so they can still access hardware efficiently but are prevented from touching the rest of the kernel and userspace, and can ‘crash safe’. For example, Windows can recover from crashes in graphics drivers specifically, which is an amazing accomplishment! Linux eBPF programs are checked by a verifier before they load, so they can’t access memory they shouldn’t.
Expose more kernel features via APIs so people don’t have to write drivers to do stuff that isn’t literally driving a piece of hardware. Then even if Crowdstrike has super-duper-permissions, a crash in Crowdstrike itself doesn’t bring down the rest of the system; it would have to do that intentionally.
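The common thread in the last three items is isolation: put the risky code somewhere it can die without taking the host with it. A toy userspace analogy, with ordinary processes standing in for drivers (the plugin names and `supervise` helper are hypothetical, and real kernel sandboxing is far more involved than this):

```python
import subprocess
import sys

# Stand-ins for third-party components: one aborts hard, one behaves.
CRASHING_PLUGIN = "import os; os.abort()"
HEALTHY_PLUGIN = "print('scan complete')"

def run_isolated(code):
    """Run the code in its own process, so a hard crash only kills that process."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout.strip()

def supervise(plugins):
    """Run every plugin in isolation and report the outcome; the host keeps going."""
    report = {}
    for name, code in plugins.items():
        returncode, output = run_isolated(code)
        if returncode == 0:
            report[name] = "ok: " + output
        else:
            report[name] = "crashed, host still running"
    return report

report = supervise({"sensor": CRASHING_PLUGIN, "scanner": HEALTHY_PLUGIN})
```

The same abort inside the host process would have ended everything, which is roughly the situation a kernel-mode driver is in today.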
Of course any such changes both cost a lot and take years or decades to become ubiquitous. Windows in particular has an incredible backwards compatibility story, which in practice means backwards compatibility with every past bug they ever had. But this is a really valuable feature for many users who have old apps and, yes, drivers that rely on those bugs!