Interesting. Can you say more about how your work compares to existing VMs, such as the JVM, and what sorts of things you want to prove about executions?
Can you say more about how your work compares to existing VMs, such as the JVM?
Most commercially used VMs are not designed to the level of security required to protect humanity from a UFAI. VM escape bugs are routinely found in various JVM implementations, and the complexity and structure of the VM itself preclude existing or theorized analysis tools from verifying the implementation to a level of detail sufficient for high-assurance systems.
To be useful in this context, a VM needs to be designed with simplicity and security as the two driving requirements. One would expect, for example, an interpreter for such a VM to occupy no more than 1,000 lines of MISRA C, which could have its security assertions proven with existing tools like Coq. The goal is for the core VM to have the simplest set of opcodes and state transitions that still allows a nearly Turing-complete (total functional, to be specific) program description language, one which is still capable of concisely representing useful programs.
The other way in which it differs from many VM layers is that it has a non-optional strong typing system (in the spirit of Haskell, not C++).
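As an illustration of what such a core might look like, here is a toy sketch in Python (the opcode set, names, and type system are invented for illustration and are not from any actual implementation): every opcode declares the stack types it consumes and produces, so a program can be type-checked by simulating the stack with types alone before it ever runs.

```python
# Toy strongly typed stack VM: opcodes declare the types they pop and push,
# and a program is rejected before execution if it is ill-typed.
# All names and the opcode set are illustrative, not from any real VM.

OPCODES = {
    # name: (types popped, types pushed, implementation)
    "PUSH_INT": ((), ("int",), None),          # operand supplied inline
    "ADD":      (("int", "int"), ("int",), lambda a, b: a + b),
    "LT":       (("int", "int"), ("bool",), lambda a, b: a < b),
    "NOT":      (("bool",), ("bool",), lambda a: not a),
}

def type_check(program):
    """Simulate the stack with types only; reject ill-typed programs."""
    stack = []
    for op, _arg in program:
        ins, outs, _ = OPCODES[op]
        for expected in reversed(ins):
            if not stack or stack.pop() != expected:
                raise TypeError(f"{op}: expected {expected}")
        stack.extend(outs)
    return stack  # types left on the stack at exit

def run(program):
    """Execute a program that has already passed type_check."""
    stack = []
    for op, arg in program:
        ins, _outs, fn = OPCODES[op]
        if op == "PUSH_INT":
            stack.append(arg)
        else:
            args = [stack.pop() for _ in ins][::-1]
            stack.append(fn(*args))
    return stack

prog = [("PUSH_INT", 2), ("PUSH_INT", 3), ("ADD", None),
        ("PUSH_INT", 10), ("LT", None)]
assert type_check(prog) == ["bool"]   # well-typed: leaves one bool
assert run(prog) == [True]            # 2 + 3 < 10
```

The point of the separation is that `type_check` is a tiny, total function over the program text, so it is the kind of thing one could realistically verify in Coq.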
and what sorts of things you want to prove about executions?
Type checking, mostly. Actually, that aspect of the system has less to do with boxing than with the design of the artificial intelligence itself. You can imagine, for example, a core evolutionary search algorithm which operates over program space by performing type-safe mutation of program elements, or achieves creative combination by substituting type-equal expressions.
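The type-safe mutation idea can be sketched concretely. In this toy version (all names and representations are invented for illustration), expressions carry their types, and a mutation step may only splice in a replacement of the same type, so every mutant is well-typed by construction:

```python
# Hypothetical sketch of type-safe mutation for evolutionary search over
# programs: a subexpression may only be replaced by a type-equal one.
import random

# An expression is (type, node); leaves hold values, internal nodes ("op", *children).
def typ(expr):
    return expr[0]

def subexpressions(expr, path=()):
    """Yield (path, subexpr) pairs for every node in the tree."""
    yield path, expr
    _t, node = expr
    if isinstance(node, tuple):
        for i, child in enumerate(node[1:]):
            yield from subexpressions(child, path + (i,))

def replace(expr, path, new):
    """Return a copy of expr with the node at `path` replaced by `new`."""
    if not path:
        return new
    t, (op, *children) = expr
    i = path[0]
    children[i] = replace(children[i], path[1:], new)
    return (t, (op, *children))

def mutate(expr, pool, rng):
    """Replace a random subexpression with a type-equal expression from `pool`."""
    path, target = rng.choice(list(subexpressions(expr)))
    candidates = [e for e in pool if typ(e) == typ(target)]
    return replace(expr, path, rng.choice(candidates)) if candidates else expr

rng = random.Random(0)
prog = ("int", ("add", ("int", 1), ("int", 2)))
pool = [("int", 7), ("bool", True), ("int", ("add", ("int", 3), ("int", 4)))]
mutant = mutate(prog, pool, rng)
assert typ(mutant) == "int"   # mutation preserves the expression's type
```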
It is also important to prove some properties, e.g. bounds on running time, particularly when interacting with other untrusted agents (“sharing minds”—machine-to-machine communication is likely to be literally uploading memories from one mind-space to another).
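A total functional language makes such bounds provable statically; a simple dynamic stand-in for the same guarantee is explicit fuel metering, where a step budget is fixed up front and every step is charged against it. A hypothetical sketch:

```python
# Illustrative fuel-metered evaluation: running time is bounded by a budget
# declared before the computation starts. Names are invented for this sketch.

class OutOfFuel(Exception):
    pass

def bounded_eval(f, arg, fuel):
    """Run f(arg), charging one unit of fuel per recursive step."""
    budget = [fuel]
    def tick():
        if budget[0] == 0:
            raise OutOfFuel
        budget[0] -= 1
    return f(arg, tick)

def fact(n, tick):
    tick()
    return 1 if n <= 1 else n * fact(n - 1, tick)

assert bounded_eval(fact, 5, fuel=10) == 120
try:
    bounded_eval(fact, 1000, fuel=10)   # exceeds the declared bound
except OutOfFuel:
    pass
```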
You’re building a box to contain a strong AI out of type checking? I suggest that unless your types include “strings that can be displayed to a human without causing him to let the AI out of the box”, this is unlikely to succeed.
bounds on running time
Won’t you run into Halting Problem issues? Although perhaps that’s why it’s only nearly Turing-complete. Incidentally, can you quantify this and say how your machine is less powerful than a Turing-complete one but more powerful than regexes?
You’re building a box to contain a strong AI out of type checking?
No, where did I say that? A VM layer with strong type checking is just one layer.
I suggest that unless your types include “strings that can be displayed to a human without causing him to let the AI out of the box”, this is unlikely to succeed.
I don’t want to debate this here because (a) I haven’t the time, and (b) prior debates I’ve had on LW on the subject have been 100% unproductive, but in short: there hasn’t been a single convincing argument against the feasibility of boxing, and there is nothing of value to be learnt from the AI box “experiments”. So you don’t want to give a single isolated human a direct line to a UFAI running with no operational oversight. Well, duh. It’s a total strawman.
Won’t you run into Halting Problem issues?
No.
Incidentally, can you quantify this and say how your machine is less powerful than a Turing-complete one but more powerful than regexes?
I did: http://en.wikipedia.org/wiki/Total_functional_programming
there is nothing of value to be learnt from the AI box “experiments”.
I would disagree: even if the experiments do not deter us from running boxed AI, they might allow us to know what arguments to prepare for.
Do we know whether anyone has let the AI out of the box in experiments without the AI using simulation warfare? Because AFAICT there is a considerable amount of processing power & information required in this case, and limiting these factors while precommitting not to let the AI out could minimise the risk.
they might allow us to know what arguments to prepare for.
… if the people conducting the experiments ever revealed enough about what happened in them to be useful for anyone else preparing. Which they don’t.
even if the experiments do not deter us from running boxed AI, they might allow us to know what arguments to prepare for.
You’re missing the point. If you ever find yourself in a situation where a human operator with the ability to let the AI out is listening to arguments from the AI, or even in a situation where the AI is doing unaudited computation, you are doing it wrong and already failed. The entire setup of the AI box game is bogus to begin with, in that it doesn’t reflect in any way a realistic oracle AI box construction or breakout scenario.
AFAICT there is a considerable amount of processing power & information required in this case, and limiting these factors while precommitting not to let the AI out could minimize the risk.
Yes, that is basically the pathway towards a realistic AI box. UFAI need not be treated as black boxes: they are performing computation, and that computation is energetically bounded and can be audited.
If the AI is meant to be much smarter than us, what auditing can we usefully do? I wouldn’t think much of our chances of getting any useful information from auditing the computation done by an ordinary human brain, never mind a possibly superhuman AI. (I can imagine that auditing might enable us to tell it hadn’t been corrupted by viruses or something, but surely that’s far from the biggest concern.)
That’s a sensible-sounding intuition, but it’s wrong. So what if the AI is smarter than us? We still have full access to its inner workings, and we have the ability to design its internal structures in such a way as to remain transparently sensible to auditors. It turns out you can audit just about anything to whatever required level of detail, with probabilistic assurances. You can use tools without general intelligence to perform the auditing task free of influence from the boxed AI, and there are cool tricks from cryptography that make it impossible for the AI to fake computational traces without doing an energetically detectable multiple of the purported work.
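The trace-faking claim can be illustrated with a toy commitment scheme (a made-up sketch, not any specific cryptographic protocol): the machine hash-chains every intermediate state of its computation, and an auditor spot-checks random transitions against the committed chain; producing a consistent chain for work that was never done would require recomputing the states anyway.

```python
# Toy hash-chained computation trace. The step function and protocol are
# invented for illustration; real schemes are far more sophisticated.
import hashlib

def step(state):
    """One deterministic computation step (a 64-bit LCG, as a stand-in)."""
    return (state * 6364136223846793005 + 1442695040888963407) % 2**64

def run_with_trace(state, n_steps):
    trace = [state]
    for _ in range(n_steps):
        state = step(state)
        trace.append(state)
    # Commitment: hash chain over the whole trace.
    h = b""
    for s in trace:
        h = hashlib.sha256(h + s.to_bytes(8, "big")).digest()
    return trace, h

def audit(trace, commitment, sample_indices):
    # Spot-check: each sampled link must satisfy the transition rule...
    for i in sample_indices:
        if step(trace[i]) != trace[i + 1]:
            return False
    # ...and the trace must match the committed hash chain.
    h = b""
    for s in trace:
        h = hashlib.sha256(h + s.to_bytes(8, "big")).digest()
    return h == commitment

trace, commitment = run_with_trace(42, 1000)
assert audit(trace, commitment, sample_indices=[0, 137, 998])

tampered = list(trace)
tampered[500] ^= 1                 # claim a state that was never computed
assert not audit(tampered, commitment, sample_indices=[500])
```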
To better prime your intuition, think about it this way: making a recommendation of a course of action which achieves X requires searching the space of actions for one which satisfies condition X. This has some inherent difficulty related to the size of the answer space relative to the size of the search space (as well as the structure of the two spaces, but let’s simplify for now). Assuming random distributions, you can say that a course of action requires 40 bits of identifying information, meaning that you need to perform about 2^40 operations to find an instance of a solution.
Now let’s say that the answer space which contains innocuous-looking but actually evil solutions requires 60 bits. In other words, for every evil solution there are 2^20 good solutions. Then an AI which wants to deceive you into taking an action which purports to meet good requirement X, but actually meets the AI’s evil hidden requirement Y, requires 2^60 operations.
You can tell whether a machine did 2^40 or 2^60 operations, and if it did the latter it had better have a damn good explanation for why. (And faking that explanation would require further computation… I hope you see where this is going.)
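The argument can be run at toy scale (the conditions and sizes here are invented and scaled down so the search executes quickly): a search that must additionally satisfy a hidden condition performs measurably more work, and an observer counting operations can see the gap.

```python
# Toy demonstration: finding a solution to a hidden extra condition costs
# measurably more search effort than finding an honest solution.
import random

rng = random.Random(0)

def tries_until(predicate):
    """Count random candidates drawn until one satisfies the predicate."""
    tries = 0
    while True:
        tries += 1
        candidate = rng.getrandbits(32)
        if predicate(candidate):
            return tries

def good(x):                 # honest requirement: ~2^12 tries expected
    return x % (2**12) == 0

def evil(x):                 # same requirement plus a hidden condition
    return x % (2**12) == 0 and x % 3 == 0

honest = sum(tries_until(good) for _ in range(50)) / 50
deceptive = sum(tries_until(evil) for _ in range(50)) / 50
assert deceptive > honest    # the deceptive search does measurably more work
```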
Is there somewhere where I can find this stuff written up, and discover exactly what is known to be (provably, or at least that’s the impression I get from what you write) achievable by auditing?
I remain gently skeptical despite your very confident tone because, e.g., we don’t currently know how to make any sort of superintelligent machine, and I at least would be rather surprised by any theorem along the lines of “given any reasonable class of superintelligent agents, there is a reasonably straightforward way to make a superintelligent agent in this class that can be shown to be innocuous by means of auditing that ordinary human beings are capable of doing reliably”.
For the avoidance of doubt, I have no difficulty at all in believing e.g. that there are auditing techniques that will guarantee (or very very nearly guarantee) that a particular agent is performing a particular computational process; I would be only modestly surprised to find that there are techniques that will verify that a particular agent is in some sense optimizing a particular objective function; but the difficulties of keeping a superintelligent AI from doing terrible things are much more complicated and include e.g. tremendous difficulty in working out what it is we really want optimized, and what computational processes we really want carried out.
Perhaps it would be useful to get a bit more concrete. Could you give an example of the sort of thing we might want a superintelligent AI to do for us, that we can’t “obviously” make it do safely without the techniques you have in mind, and explain how those techniques enable us to make it do that thing safely?