I for one can’t agree that transparency does any good in security assessment when we’re considering the implementation of a complex system (design has its own rules, though). I believe you underestimate how broken the human mind really is.
Transparency == Priming
The team that does the security review of the system will utterly fail the moment it gets its hands on the source code, due to suggestion/priming effects:
- Comments in the source: I won’t even argue this one.
- Variable and function names will suggest what a piece of code “is for”. E.g. the code may say it was (initially) written to compute X, for which it is correct, except that later it was also made to compute Y and Z, where it catches fire 0.01% of the time (see the sketch after this list).
- Whitespace. E.g. missing braces because the indentation suggested otherwise (yes, really, I’ve seen this one).
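A minimal sketch of the naming/priming failure mode, with hypothetical names of my own (compute_average is not from any real codebase): everything a reviewer reads (the name, the docstring, the first branch) primes them to check the original computation and move on, while the hazard lives in the branch bolted on later.

```python
def compute_average(values, mode="mean"):
    """Compute the average of a batch of sensor readings."""
    if not values:
        return 0.0
    if mode == "mean":
        # The original use case; the name and docstring are accurate here.
        return sum(values) / len(values)
    # Bolted on later (the "Y and Z" of the comment above): a trimmed
    # mean that drops the extremes. The name still says "average",
    # which primes reviewers to skim this branch.
    trimmed = sorted(values)[1:-1]
    return sum(trimmed) / len(trimmed)   # divides by zero when len(values) <= 2
```

A reviewer primed by the metadata checks the mean and signs off; a black-box tester who doesn’t know what the code “is for” has no reason to skip tiny batches.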
If you truly consider removing all metadata from the code, the code loses half of its transparency already, so “transparency” doesn’t quite apply.
Any of that metadata will cause the security team to drop some testing effort. Those tests won’t even be designed or written, not to mention carried out with enough care to count as an actual effort. On the other hand, if a program passes serious (more below) black-box testing, it doesn’t need to pass anything else. Transparency is simply unnecessary.
A hardcore solution for security-critical systems:
1. Have a design of the system (the design step has issues of its own, not covered here).
2. Get two teams of programmers.
3. Have both teams implement and debug the system, separately, without any communication between them.
4. Have each team review the other team’s binary (black-box). If bugs are found, go to step 3.
5. Obfuscate both binaries (tools are easily available) and have each team review its own (obfuscated) binary while believing it’s the other team’s. If bugs are found, go to step 3.
At this point you could open up the sources and have a final review (“transparency”), but honestly… what’s the point?
Yes, it’s paranoid. But systems can be split into smaller parts, built and tested separately, so it’s not as monumental an effort as it seems.
I think you are thinking about transparency differently than OP.
You seem to be thinking of informal, code-review-type verification (hence the gripe about comments and function names), and not the formal, mechanical verification that OP is (I think) talking about.
> At this point you could open up the sources and have a final review (“transparency”), but honestly… what’s the point?
The point is that black-box testing can only realistically verify a tiny slice of the input-output space. You cannot prove theorems involving universal quantification, for example, without literally checking every input (a set that may not fit in the known universe). So if the system has some esoteric failure mode that you didn’t manage to test for, you don’t catch it.
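A toy illustration of that gap (my construction, with a deliberately contrived magic constant): a function that is correct on essentially every input a random black-box test will ever draw, yet has one esoteric failure mode that only exhaustive checking, or reasoning about the internals, would reveal.

```python
import random

def absolute_value(x: int) -> int:
    if x == 0x7FFF_FFFF_FFFF_FFFF:   # the one esoteric failure mode
        return -abs(x)
    return x if x >= 0 else -x

# Random black-box testing samples a vanishingly small slice of the
# input space, so the bad input is effectively never drawn:
for _ in range(1_000_000):
    x = random.randint(-2**63, 2**63 - 1)
    assert absolute_value(x) >= 0    # passes with overwhelming probability

# Establishing "for all x: absolute_value(x) >= 0" by testing alone
# would require checking every input, which is the
# universal-quantification problem described above.
```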
On the other hand, “transparent” testing is where you give, e.g., a type checker access to the internal structure, so it can immediately prove things like “nope, this function cannot match the spec, and will fail by adding a list to a number when fed input X”.
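As a minimal sketch of that kind of transparent check, assuming Python type annotations and an external checker such as mypy (my choice of tooling, not the commenter’s): the checker reads the function’s internal structure and rejects it without executing a single test input.

```python
def total_length(items: list[str]) -> int:
    """Spec: return the total number of characters across all items."""
    count = 0
    for item in items:
        # Bug: should be len(item). A static checker such as mypy flags
        # this line without running anything, with a message roughly like
        #   Unsupported operand types for + ("int" and "list[str]")
        count = count + items
    return count
```

No input was fed to the function; the proof that it cannot match its spec comes entirely from its visible structure.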
As a serious, if trivial, example, imagine black-box testing a quicksort. You test it on 1,000 large random lists and measure the average and worst-case running times. You probably get O(n log n) for both. You deploy the code, someone disassembles it, designs a killer input, and pwns your system, because quicksort has rare inputs on which it goes O(n^2).
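Here is a runnable version of that scenario (my sketch, not code from the post). A quicksort with naive first-element pivoting looks like O(n log n) on random lists, but an already-sorted list makes every partition maximally lopsided and drives it to O(n^2):

```python
import random
import sys
import time

def quicksort(xs):
    if len(xs) <= 1:
        return xs
    pivot = xs[0]                                   # naive pivot choice
    less = [x for x in xs[1:] if x < pivot]
    more = [x for x in xs[1:] if x >= pivot]
    return quicksort(less) + [pivot] + quicksort(more)

sys.setrecursionlimit(20_000)        # the killer input recurses n deep
n = 5_000

for name, data in [("random", random.sample(range(n), n)),
                   ("killer (sorted)", list(range(n)))]:
    start = time.perf_counter()
    quicksort(data)
    print(f"{name:>15}: {time.perf_counter() - start:.3f}s")
```

A thousand large random lists will, with overwhelming probability, never include a sorted one, which is exactly the gap an attacker who has read the disassembly can exploit.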
Transparency isn’t only about whether you read the source code or not; it’s also about whether you can do formal deduction or not.
Thus the design (i.e. “The Math”) vs. implementation (i.e. “The Code”) division. I believe design verification suffers from the same problems as implementation verification, albeit maybe less severely (though I have never worked with really complex, novel, abstract math; it would be interesting to see how many such systems, on average, are “proved” correct and then blow up).
Still, I would argue that the problem is not that black-box testing is insufficient where we are currently able to apply it, but rather that we have no idea how to properly black-box-test an abstract, novel, complex system!
PS. Your trivial example is also unfair and trivializes the technique. Black-box testing in no way implies randomizing all tests, and I would expect the quicksort to blow up very, very soon under serious testing (a sketch of what that looks like follows).
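To make that concrete, here is one sketch (mine, not the commenter’s) of what non-random black-box testing of a sort might include: boundary and adversarially structured inputs chosen without ever seeing the source, plus a time budget per case. Pointed at the naive quicksort above, the “already sorted” and “all equal” cases blow the budget almost immediately.

```python
import random
import time

def blackbox_test_sort(sort_fn, n=5_000, budget_seconds=0.5):
    # Structured cases a serious black-box tester would try by default.
    cases = {
        "empty": [],
        "singleton": [42],
        "all equal": [7] * n,
        "already sorted": list(range(n)),
        "reverse sorted": list(range(n, 0, -1)),
        "sawtooth": [i % 10 for i in range(n)],
        "random": random.sample(range(n), n),
    }
    for name, data in cases.items():
        start = time.perf_counter()
        result = sort_fn(list(data))
        elapsed = time.perf_counter() - start
        assert result == sorted(data), f"wrong output on {name!r}"
        assert elapsed < budget_seconds, f"time budget blown on {name!r}"

blackbox_test_sort(sorted)   # the built-in passes; the naive quicksort would not
```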