We have some code verification today—in Java bytecode programs—for example.
The properties that can be proved by the Java bytecode verifier are far weaker than the properties that can be proved by the Java source-to-bytecode compiler. If everyone were willing to distribute source code, there would be no need for a bytecode verifier in Java. So this isn’t a shining example of the usefulness of machine code verification, unfortunately.
What we can program has always been far ahead of what we can prove about our programs. Proofs may advance, but programming skills seem likely to advance further.
The power of our verification tools varies greatly with the programming style chosen. (Compare proving things about Haskell programs, to proving similar things about C code.) A research direction I’m personally interested in is this: how can we construct programming styles (languages, libraries, etc) that allow much greater verification power while at the same time not being too constraining for human programmers?
Re: “If everyone were willing to distribute source code”
Right—but that premise is incorrect. The point of having bytecode is to have a condensed format that is suitable for distribution and direct execution.
The example also illustrates the connection between provability and security.
Bytecode is not significantly more condensed than appropriately transformed sourcecode (strip comments and whitespace and compress). And the cost of compiling Java source is low; it’s not a problem except for low-end embedded platforms.
From a technical POV, source is better suited than bytecode for distribution, and indeed for everything. Not only can you prove stronger things about it than you can about bytecode, source-level compatibility is more resilient to library changes than bytecode-level ABI compatibility.
The only reason people don’t always distribute Java source is because they don’t want end-users to easily read and modify it. (Which is mostly irrational, given the quality of bytecode-to-source decomplation tools.) And this concern isn’t very relevant to issues of AI development and verification.
People actually ship bytecode—rather than compressed source code—because the runtimes expect bytecode, and compilation before execution is a time-consuming additional step.
The “proving stronger things” claim seems highly dubious. You can decomplie compiled bytecode into source code, and what comes out is of reasonable quality—unless the bytecode was compressed after compilation. Excepting comments and whitespace, these two formats contain very similar information.
People actually ship bytecode—rather than compressed source code—because the runtimes expect bytecode, and compilation before execution is a time-consuming additional step.
This is a circular argument. The compiler is small compared to the runtime and standard library, and could easily be included in the requirements for a JRE. ETA: one reason Java was designed to run bytecode is that business-types are afraid of distributing source code to end users, and were much more afraid before the relatively recent rise to fame of from-source languages like Python.
(Yes, the original Java was designed to run on very constrained platforms, but that has been changed long since. By Java 1.2, requiring JREs to compile code before running it and treating bytecode as a cache would not have seemed very onerous.)
Excepting comments and whitespace, these two formats contain very similar information.
Decompilation works as well as it does only because compilers store extra information in the bytecode. This information (generics being an infamous example) isn’t a part of the bytecode spec and can’t be used by the runtime—it’s only there to help decompilation, debugging etc. So while it’s strictly true that the bytecode format can contain all the information you want, in practice this is little different from shipping stripped-down bytecode that doesn’t contain that extra info and also the sourcecode.
You appear to have mis-read my argument before labeling it as circular. I didn’t say a compiler was too large for a JRE. I said compilation was a time-consuming additional step.
Your bytecode history seems revisionist. Java actually has bytecode for many reasons—including size issues and embedded applications:
“Java bytecodes are designed to be easy to interpret on any machine, or to dynamically translate into native machine code if required by performance demands.”
I agree that bytecode itself is useful for performance. The idea of verifying it instead of verifying the source code is what I’m arguing against. What I was suggesting is to always distribute the code, verify that on installation, and store trusted bytecode in a cache for performance.
I expect that—if you get a reply—you will hear claims that compilation to bytecode on the client would damage start-up performance, that compressed source code is less compact than compressed bytecode, hindering applet distribution, and claims of additional burdens on embedded environments.
You would still need a bytecode verifier for backwards compatibility—so that would be two verification systems to maintain.
The main reason the JRE won’t be changed today is that a lot of the code running on it isn’t written in Java. And that’s a good reason.
I was merely saying that for a single project needing the best possible code verification assistance, it was more fruitful to look at language-level tools than at bytecode-level.
ETA: one reason Java was designed to run bytecode is that business-types are afraid of distributing source code to end users, and were much more afraid before the relatively recent rise to fame of from-source languages like Python.
The properties that can be proved by the Java bytecode verifier are far weaker than the properties that can be proved by the Java source-to-bytecode compiler. If everyone were willing to distribute source code, there would be no need for a bytecode verifier in Java. So this isn’t a shining example of the usefulness of machine code verification, unfortunately.
The power of our verification tools varies greatly with the programming style chosen. (Compare proving things about Haskell programs, to proving similar things about C code.) A research direction I’m personally interested in is this: how can we construct programming styles (languages, libraries, etc) that allow much greater verification power while at the same time not being too constraining for human programmers?
Re: “If everyone were willing to distribute source code”
Right—but that premise is incorrect. The point of having bytecode is to have a condensed format that is suitable for distribution and direct execution.
The example also illustrates the connection between provability and security.
Bytecode is not significantly more condensed than appropriately transformed sourcecode (strip comments and whitespace and compress). And the cost of compiling Java source is low; it’s not a problem except for low-end embedded platforms.
From a technical POV, source is better suited than bytecode for distribution, and indeed for everything. Not only can you prove stronger things about it than you can about bytecode, source-level compatibility is more resilient to library changes than bytecode-level ABI compatibility.
The only reason people don’t always distribute Java source is because they don’t want end-users to easily read and modify it. (Which is mostly irrational, given the quality of bytecode-to-source decomplation tools.) And this concern isn’t very relevant to issues of AI development and verification.
People actually ship bytecode—rather than compressed source code—because the runtimes expect bytecode, and compilation before execution is a time-consuming additional step.
The “proving stronger things” claim seems highly dubious. You can decomplie compiled bytecode into source code, and what comes out is of reasonable quality—unless the bytecode was compressed after compilation. Excepting comments and whitespace, these two formats contain very similar information.
This is a circular argument. The compiler is small compared to the runtime and standard library, and could easily be included in the requirements for a JRE. ETA: one reason Java was designed to run bytecode is that business-types are afraid of distributing source code to end users, and were much more afraid before the relatively recent rise to fame of from-source languages like Python.
(Yes, the original Java was designed to run on very constrained platforms, but that has been changed long since. By Java 1.2, requiring JREs to compile code before running it and treating bytecode as a cache would not have seemed very onerous.)
Decompilation works as well as it does only because compilers store extra information in the bytecode. This information (generics being an infamous example) isn’t a part of the bytecode spec and can’t be used by the runtime—it’s only there to help decompilation, debugging etc. So while it’s strictly true that the bytecode format can contain all the information you want, in practice this is little different from shipping stripped-down bytecode that doesn’t contain that extra info and also the sourcecode.
You appear to have mis-read my argument before labeling it as circular. I didn’t say a compiler was too large for a JRE. I said compilation was a time-consuming additional step.
Your bytecode history seems revisionist. Java actually has bytecode for many reasons—including size issues and embedded applications:
“Java bytecodes are designed to be easy to interpret on any machine, or to dynamically translate into native machine code if required by performance demands.”
http://java.sun.com/docs/white/langenv/Neutral.doc1.html
You seem to agree—since you have now revised your comment.
I agree that bytecode itself is useful for performance. The idea of verifying it instead of verifying the source code is what I’m arguing against. What I was suggesting is to always distribute the code, verify that on installation, and store trusted bytecode in a cache for performance.
Right—well, speak to Sun/Oracle.
I expect that—if you get a reply—you will hear claims that compilation to bytecode on the client would damage start-up performance, that compressed source code is less compact than compressed bytecode, hindering applet distribution, and claims of additional burdens on embedded environments.
You would still need a bytecode verifier for backwards compatibility—so that would be two verification systems to maintain.
The main reason the JRE won’t be changed today is that a lot of the code running on it isn’t written in Java. And that’s a good reason.
I was merely saying that for a single project needing the best possible code verification assistance, it was more fruitful to look at language-level tools than at bytecode-level.