Oracle / tool AI is. The usual premise is that questions are posed to the superhuman AI, and its responses are only implemented if they are comprehensible, sane, and morally acceptable. Your example of “satisfies C but still violates Y and Z” would be picked up by the human oversight (or, if the output is too complicated to be understood, it is shelved). Blindly following the AI’s directives is exactly the failure mode the oracle AI path is meant to avoid. Further, search processes do not stumble across solutions which seem fine but deviously set up an AI breakout or kill-all-humans scenario just by random chance; the probability of that is astronomically low. So really, the only likely ways in which the AI says to do X, and doing X ends up violating unstated constraints Y and Z, are (a) the human overseers failed at their one and only job, or (b) deception.
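To make that gating concrete, here is a minimal sketch of the oversight loop as I picture it. Every name in it (Review, oversee, the reviewer lambda) is hypothetical and only illustrates the control flow, not any real system:

```python
# Hypothetical sketch of the oracle-AI oversight loop described above.
# Nothing reaches implementation on the oracle's say-so alone: an answer
# no reviewer can follow is shelved, and an understood-but-unacceptable
# answer is rejected.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Review:
    comprehensible: bool
    sane: bool
    acceptable: bool


def oversee(answer: str, reviewers: List[Callable[[str], Review]]) -> str:
    reviews = [review(answer) for review in reviewers]

    # Output too complicated to be understood -> shelved, not acted on.
    if not all(r.comprehensible for r in reviews):
        return "SHELVED: output not understood"

    # Must be judged sane and morally acceptable by every overseer.
    if not all(r.sane and r.acceptable for r in reviews):
        return "REJECTED: failed human oversight"

    return "APPROVED: hand off for implementation"


if __name__ == "__main__":
    # Toy reviewer: treats short answers as comprehensible, approves the rest.
    cautious_reviewer = lambda ans: Review(
        comprehensible=len(ans) < 500, sane=True, acceptable=True)
    print(oversee("Rebalance the portfolio toward short-duration bonds.",
                  [cautious_reviewer]))
```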
Auditing has the potential to slow down the AI.
Yup, it does. This is a race, but the question is not “is this approach faster than straight-up UFAI?” but rather “is this approach faster than other pathways to friendly AI?” The FAI problem strictly contains the UFAI problem: any FAI is also an AGI, so there is no approach to FAI which is faster than a straight sprint to UFAI, consequences be damned.
My own informed opinion is that (UF)AGI is only 10-20 years away, max. Provably-friendly AI is not even a well-defined problem, but by any definition it is strictly harder. The only estimates I’ve seen come out of MIRI for their approach put FAI decades further out (I remember Luke saying 50-70 years). Such a timeline makes sense when compared with the pace of progress in verifiable computing in other fields. But 2nd place doesn’t count for anything here.
Oracle / tool AGI has the advantage of making safeguards a parallel development. The core AGI is not provably friendly, and can be developed at the same breakneck pace as one would expect of a hedge fund exploring this area. The security controls can be developed and put in place in parallel, without holding up work on the AGI itself. It does require choosing a particular architecture amenable to auditing, but that’s not really a disadvantage as it makes development & testing easier.
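For concreteness, here is a hedged sketch of the separation I have in mind. ConfinedOracle and the stand-in lambdas are made up for illustration, not a description of any actual codebase: the core is an opaque callable developed on its own schedule, while the audit log and the approval gate live entirely in the wrapper, so the safety layer can be built in parallel.

```python
# Hypothetical illustration of safeguards as a separable, parallel layer.
# The core system exposes only a question-answering interface; confinement
# and auditing are layered around it without changes to the core itself.

from typing import Callable, List, Tuple


class ConfinedOracle:
    """Wrapper that logs every exchange and withholds unapproved answers."""

    def __init__(self, core: Callable[[str], str],
                 gate: Callable[[str], bool]) -> None:
        self.core = core      # the AGI itself, developed at full speed
        self.gate = gate      # oversight controls, developed in parallel
        self.audit_log: List[Tuple[str, str, bool]] = []

    def ask(self, question: str) -> str:
        answer = self.core(question)
        approved = self.gate(answer)
        self.audit_log.append((question, answer, approved))
        return answer if approved else "WITHHELD pending human review"


if __name__ == "__main__":
    core = lambda q: f"model output for: {q}"   # stand-in for the core AGI
    gate = lambda a: "breakout" not in a        # stand-in for human review
    oracle = ConfinedOracle(core, gate)
    print(oracle.ask("How should we hedge interest-rate exposure?"))
```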
You want humans to “take us through the singularity”. But we aren’t through the singularity until superhuman intelligence exists. Is your plan, therefore, to suppress development of superhuman AI, until there are humans with superhumanly augmented intelligence? Do you plan to audit their development as well?
I’m not sure I understand the question. The point of FAI, CEV, etc., as I understand it, is to encode human morality into something a machine can understand, because that machine, not us, will be making the decisions. But if progress comes not from ceding the keys to the kingdom to a machine intelligence, but rather from augmentation of real humans, then why is morality a problem we must solve now? Superhuman humans are still human, and have access to human morality through introspection, the same as we do. Why would you “audit” the mind of a human? That doesn’t make any sense, quite apart from whether it would even be feasible.
As to suppressing development of AGI… no, I don’t think that’s a wise choice even if it’s possible. Mostly because I see no realistic way of doing that short of totalitarian control, and the ends do not justify those means. But I also don’t think it would be too hard to transition from oracle AI to human augmentation, especially with the help of a superhuman AGI to develop tools and decipher brain biology.
I am not opposed to the auditing concept, for AI or for augmented humans, but eventually one must directly answer the question, what is the design of a trustworthy superintelligence, in terms that make no reference to human supervision.
Um… no. That’s completely unsubstantiated. The whole point of oracle / tool AI and confinement is to obviate the need for a provably trustworthy superintelligence.