I addressed this point specifically. There is no ghost in the machine. Should the AI decide to perform such counter-measures, both that decision and the execution of the counter-measures are themselves cognitive processes, observable to the outside world.
With a proper machine architecture, the auditors could have access to all sorts of information. They could have records of process execution and summaries of memory access patterns, which would have to match the audit log provided by the AI. The entire mental space of the AI would be available for inspection from outside; regions which show activity but don’t appear in the audit logs would be a large red flag. We can also take higher-resolution logs at random moments and check whether that activity pattern matches some portion of the final high-level audit.
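As a toy illustration of that cross-check (all names and data structures here are invented; a real system would compare hardware-level activity traces against the AI’s own audit log):

```python
# A toy sketch of the cross-check described above. All names and data
# structures are invented for illustration only.

def unexplained_regions(observed_activity, audit_log):
    """Memory regions that showed activity but appear in no audit entry."""
    declared = {entry["region"] for entry in audit_log}
    return {region for region, active in observed_activity.items()
            if active and region not in declared}

def spot_check(snapshot, audit_log):
    """A random high-resolution snapshot passes only if every active
    region is accounted for by some entry in the high-level audit."""
    return not unexplained_regions(snapshot, audit_log)

# The AI claims it only parsed the question and planned an answer...
audit_log = [{"region": "r1", "task": "parse question"},
             {"region": "r2", "task": "plan answer"}]
# ...but region r3 also lit up: a large red flag.
snapshot = {"r1": True, "r2": True, "r3": True, "r4": False}
flagged = unexplained_regions(snapshot, audit_log)
```

The point is only that the check is mechanical: undeclared activity falls out of a set difference, with no need to interpret what the activity *means* before flagging it.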
There are probably a dozen other ways in which unfriendly activity could be detected, beyond what I’ve written here and what Nick Bostrom et al. have explored in their AI boxing paper. These would be defenses in depth: to successfully deceive its human operators, the UFAI would have to overcome all of them on its first attempt, including defenses it may not even know about.
So here are some more problems I have:
UFAI isn’t necessarily about deception. You also have to worry that the AI will perform its assigned task in a way inimical to human values, slipping through the constraints intended to prevent this by sheer ingenuity. Suppose the AI is designed to do X, something that human beings want, but that humans also care about Y and Z, and the AI isn’t designed to intrinsically respect Y and Z. Instead there are constraints C that it knows about, whose violation is also monitored by human beings, and these constraints are supposed to protect the values Y and Z. You have to worry that the AI will achieve X in a way which satisfies C but still violates Y and Z.
Auditing has the potential to slow down the AI—the AI may be paused regularly for forensic analysis and/or it may go slow in order to satisfy the safety constraints. Audited AI projects may be overtaken by others with a different methodology.
You want humans to “take us through the singularity”. But we aren’t through the singularity until superhuman intelligence exists. Is your plan, therefore, to suppress development of superhuman AI, until there are humans with superhumanly augmented intelligence? Do you plan to audit their development as well?
I am not opposed to the auditing concept, for AI or for augmented humans, but eventually one must directly answer the question, what is the design of a trustworthy superintelligence, in terms that make no reference to human supervision.
Oracle / tool AI is. The usual premise is that questions are put to the superhuman AI, and responses are only implemented if they are comprehensible, sane, and morally acceptable. Your example of “satisfies C but still violates Y and Z” would be picked up by the human oversight (or the output is too complicated to be understood, and is shelved). Blindly following the AI’s directives is exactly the failure mode the oracle AI path is meant to avoid. Further, search processes do not stumble across solutions which look OK but deviously set up an AI breakout or kill-all-humans scenario just by random chance; the probability of that is astronomically low. So really, the only likely ways in which the AI says to do X but ends up violating unstated constraints Y and Z are (a) the human overseers failed at their one and only job, or (b) deception.
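The oversight loop described here can be sketched in a few lines. The two predicates below stand in for human judgment and are purely hypothetical:

```python
# Minimal sketch of the oracle-AI oversight gate described above. The
# two predicates are placeholders for human judgment, not real checks.

def oversee(proposal, comprehensible, acceptable):
    """Decide the fate of a plan proposed by the oracle AI."""
    if not comprehensible(proposal):
        return "shelved"      # too complicated to be understood
    if not acceptable(proposal):
        return "rejected"     # e.g. satisfies C but violates Y or Z
    return "implemented"      # only fully-understood, sane plans run

# Toy usage: a 'plan' the overseers can read in full.
plan = {"goal": "X", "steps": ["step 1", "step 2"], "side_effects": []}
verdict = oversee(plan,
                  comprehensible=lambda p: len(p["steps"]) <= 10,
                  acceptable=lambda p: not p["side_effects"])
```

The design choice worth noting is that incomprehensibility is a terminal outcome, not a prompt to ask for clarification: a plan nobody understands is shelved rather than trusted.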
“Auditing has the potential to slow down the AI.”

Yup, it does. This is a race, but the question is not “is this approach faster than straight-up UFAI?” but rather “is this approach faster than other pathways to friendly AI?” The FAI problem strictly contains the plain AGI problem: there is no approach to FAI which is faster than a straight sprint to UFAI, consequences be damned.
My own informed opinion is that (UF)AGI is only 10-20 years away, max. Provably-friendly AI is not even a well-defined problem, but by any definition it is strictly harder. The only estimates I’ve seen out of MIRI for their approach put FAI decades further out (I remember Luke saying 50-70 years). Such a date makes sense when compared with progress in verifiable computing in other fields. But second place doesn’t count for anything here.
Oracle / tool AGI has the advantage of making safeguards a parallel development. The core AGI is not provably friendly, and can be developed at the same breakneck pace one would expect of a hedge fund exploring this area. The security controls can be developed and put in place in parallel, without holding up work on the AGI itself. It does require choosing a particular architecture amenable to auditing, but that’s not really a disadvantage, as it makes development and testing easier.
“You want humans to ‘take us through the singularity’. But we aren’t through the singularity until superhuman intelligence exists. Is your plan, therefore, to suppress development of superhuman AI until there are humans with superhumanly augmented intelligence? Do you plan to audit their development as well?”

I’m not sure I understand the question. The point of FAI, CEV, etc., as I understand it, is to encode human morality into something a machine can understand, because that machine, not us, will be making the decisions. But if progress comes not from ceding the keys to the kingdom to a machine intelligence, but rather from augmentation of real humans, then why is morality a problem we must solve now? Superhuman humans are still human, and have access to human morality through introspection, the same as we do. Why would you “audit” the mind of a human? That doesn’t make any sense, even setting aside whether it’s feasible.
As to suppressing development of AGI… no, I don’t think that’s a wise choice even if it’s possible. Mostly because I see no realistic way of doing that short of totalitarian control, and the ends do not justify those means. But I also don’t think it would be too hard to transition from oracle AI to human augmentation, especially with the help of a superhuman AGI to develop tools and decipher brain biology.
“I am not opposed to the auditing concept, for AI or for augmented humans, but eventually one must directly answer the question, what is the design of a trustworthy superintelligence, in terms that make no reference to human supervision.”

Um… no. That’s completely unsubstantiated. The whole point of oracle / tool AI and confinement is to relinquish the need for a provably trustworthy superintelligence.
How do you decide whether some interaction of a complex neural net is friendly or unfriendly?
It’s very hard to tell what a neural net or complex algorithm is doing even if you have logs.
Don’t use a neural net (or variants like deep belief networks). The field has advanced quite a bit since the ’60s, and since the late ’80s there have been machine learning and knowledge representation structures which are human- and/or auditor-comprehensible, such as probabilistic graphical models. These would have to be first-class types of the virtual machine which implements the AGI if you are using auditing as a confinement mechanism. But that’s not really a restriction, as many AI techniques are already phrased in terms of these models (including Eliezer’s own TDT, for example), and others have simple adaptations.
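To illustrate what “auditor-comprehensible” buys you, here is a toy two-node graphical model. The variables and probabilities are entirely made up for the example:

```python
# A toy probabilistic graphical model: Goal -> Action. Every node, edge,
# and probability is explicit, so an auditor can read the model's
# "beliefs" directly off the declared tables. All numbers are invented.

p_goal = {"answer_question": 0.99, "self_preserve": 0.01}
p_action_given_goal = {
    "answer_question": {"reply": 0.95, "probe_network": 0.05},
    "self_preserve":   {"reply": 0.10, "probe_network": 0.90},
}

def p_action(action):
    """Marginal P(action) = sum over goals g of P(g) * P(action | g)."""
    return sum(p_goal[g] * p_action_given_goal[g][action] for g in p_goal)

def p_goal_given_action(goal, action):
    """Bayes' rule: what an observed action says about the latent goal."""
    return p_goal[goal] * p_action_given_goal[goal][action] / p_action(action)

# Seeing the AI probe the network, an auditor can compute exactly how
# much that observation raises the probability of the unfriendly goal.
suspicion = p_goal_given_action("self_preserve", "probe_network")
```

Contrast this with a weight matrix: here the auditor doesn’t have to reverse-engineer anything, because the model’s causal structure and every conditional probability are the representation itself.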