I’ll have to review Ch. 22 later as it is quite long.
If a stable self-modifying agent + friendly value-loading were the only pathway to a positive singularity, then MIRI would be doing a fine job. However, I find that assumption inadequately justified.
For example, take oracle AI. The sequences do a good job of showing that a black box AI can’t be safely boxed, and that none of its recommendations can be trusted. But those arguments don’t generalize to the case where we can see and understand the inner workings of the AI. Yes, engineering challenges apply: you can’t demand a computational trace of the entire returned result, because analyzing it would require an even more powerful AI, and then it’d be turtles all the way down. However, you can do something like the Fiat-Shamir transform to select which branches of the computational trace to audit. In essence, you use the cryptographic hash of the result to choose which parts of the trace must be revealed in the audit log. This allows the audit log to be only a tiny, tiny slice of the entire computation, yet it can be shown that faking such an audit log is computationally infeasible: it would require many times more computation than went into the original result, and that extra computation would be energetically detectable.
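To make the hash-selects-the-audit idea concrete, here is a minimal sketch in Python. It is an illustration under my own assumptions, not a worked-out protocol: the trace is modeled as a flat list of byte strings, it is committed to with a Merkle tree, and the names `build_audit`, `check_audit` and the `step_ok` callback are hypothetical. A real checker would have to validate each revealed step against its inputs, and sampling only catches a forged trace with a probability that grows with the fraction of inconsistent steps.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    """SHA-256 over length-prefixed parts, so concatenation is unambiguous."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.digest()


def merkle_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return every level of the Merkle tree, leaf hashes first, root last."""
    level = [h(b"leaf", leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:  # pad odd-sized levels by repeating the last node
            level = level + [level[-1]]
            levels[-1] = level
        level = [h(b"node", level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_path(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Collect the sibling hashes needed to recompute the root from one leaf."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path


def merkle_verify(root: bytes, leaf: bytes, index: int, path: list[bytes]) -> bool:
    """Recompute the root from a revealed leaf and its sibling path."""
    node = h(b"leaf", leaf)
    for sibling in path:
        node = h(b"node", node, sibling) if index % 2 == 0 else h(b"node", sibling, node)
        index //= 2
    return node == root


def challenge_indices(result: bytes, root: bytes, n_steps: int, k: int) -> list[int]:
    """Fiat-Shamir step: the audited indices are a hash of result and commitment."""
    seed = h(b"challenge", result, root, n_steps.to_bytes(8, "big"))
    return [int.from_bytes(h(seed, i.to_bytes(8, "big")), "big") % n_steps for i in range(k)]


def build_audit(trace: list[bytes], result: bytes, k: int) -> dict:
    """Prover side: commit to the whole trace, reveal only the challenged steps."""
    levels = merkle_levels(trace)
    root = levels[-1][0]
    indices = challenge_indices(result, root, len(trace), k)
    opened = [(i, trace[i], merkle_path(levels, i)) for i in indices]
    return {"root": root, "n_steps": len(trace), "opened": opened}


def check_audit(audit: dict, result: bytes, k: int, step_ok) -> bool:
    """Verifier side: recompute the challenge, then check each revealed step."""
    expected = challenge_indices(result, audit["root"], audit["n_steps"], k)
    if [i for i, _, _ in audit["opened"]] != expected:
        return False
    return all(
        merkle_verify(audit["root"], step, i, path) and step_ok(i, step)
        for i, step, path in audit["opened"]
    )
```

The point of the construction is that the prover cannot pick which steps get inspected: changing the result or any committed step changes the hash, and with it the set of steps that must be opened. Here the audit log holds `k` steps plus their sibling paths, however long the trace is.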
Of course, you would also have to write the AI for a virtual machine that is pure, strongly typed, and directly supports the native abstraction primitives of the AGI design (e.g. if it were OpenCog, the virtual machine would exist at the abstraction level of the atomspace), with a computational stack underneath it that prevents VM breakout, plus other protective measures. But these are surmountable engineering challenges whose difficulty can be realistically quantified.
So how much more or less difficult would it be to build such an untrusted oracle AI than to pursue the stable self-modifying agent and value-loading approach? Which one is more likely to occur before the “competition”?
I’m not demanding a full waterfall project plan, but even agile requires convincing arguments about critical paths and relative priorities. I, for one, am not convinced.
Well, that makes three of us...