Sorry, some clarification is necessary. I didn’t mean to imply that a human being would manually inspect the AI and understand its workings, but rather that humans assisted by non-recursively self-improving technology (e.g. theorem provers, probabilistic code checkers, sandbox environments, etc.) check during execution or afterwards that the code respects whatever properties the human cares about.
For example, the input/output channels which handle a combination of long-term memory, self-modifications, and actions may require that complete computational traces be provided. Now obviously the traces must be pruned in order to keep the auditing workload manageable, but using cryptographic techniques this can be done such that the AI cannot know ahead of time which branches will be checked, and therefore cannot cheat by creating fraudulent traces while escaping probabilistic or energetic detection.
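To make the commit-then-sample idea concrete, here is a minimal Python sketch; the hash commitment, the branch format, and the sample size are illustrative assumptions on my part, not a worked-out protocol:

import hashlib
import random

# Sketch: the AI commits to a digest of every trace branch before it learns
# which branches will be inspected, so it cannot tailor fraudulent traces to
# the branches it expects to go unchecked.

def commit(branches):
    # AI side: publish a digest of each branch of the computational trace.
    return [hashlib.sha256(b.encode()).hexdigest() for b in branches]

def audit(branches, commitments, sample_size=2):
    # Auditor side: sample branches at random *after* the commitment and
    # check that the revealed traces match the published digests.
    for i in random.sample(range(len(branches)), sample_size):
        if hashlib.sha256(branches[i].encode()).hexdigest() != commitments[i]:
            return False  # branch was altered after commitment: fraud detected
    return True

# Usage: the AI emits trace branches and commits; the auditor then samples.
trace = [
    "branch-0: updated P(goal | obs) from 0.3 to 0.7",
    "branch-1: wrote summary to long-term memory",
    "branch-2: chose 'query sensor' over 'self-modify'",
]
digests = commit(trace)
print(audit(trace, digests))  # True unless a branch is rewritten after committing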
The AI itself is executing on a framework that has Bayesian probability graphs as its fundamental unit of computation, like the list in Lisp. The auditor’s job, then, is to make sure that the updates or actions make sense in terms of the data provided, and that choices were made in alignment with its claimed utility function (remember, the AI can’t make up data without that fraud being energetically detectable).
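As a rough illustration of what auditing a single node of such a graph could look like, suppose each logged node records the prior, the likelihoods, and the posterior the AI claims to have computed; the auditor just replays Bayes’ rule (the field names and tolerance below are hypothetical):

# The auditor recomputes P(H | E) from the inputs the AI itself logged and
# flags any node whose claimed update does not follow from that data.
def consistent_update(prior, lik_h, lik_not_h, claimed_posterior, tol=1e-3):
    evidence = lik_h * prior + lik_not_h * (1 - prior)
    expected = lik_h * prior / evidence
    return abs(expected - claimed_posterior) < tol

# Replaying one node: P(H)=0.2, P(E|H)=0.9, P(E|~H)=0.1 gives P(H|E) ~ 0.6923,
# so a claimed posterior of 0.6923 passes and a claimed 0.95 gets flagged.
print(consistent_update(0.2, 0.9, 0.1, 0.6923))  # True
print(consistent_update(0.2, 0.9, 0.1, 0.95))    # False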
The above basically acts as a lie detector for the question “what’s your highest-level utility function?” and follow-up questions like “give examples of why this clause of the utility function is necessary”.
--
Actually, maybe this is a simpler way of framing the issue: assume that you have access to a lie detector. I posit that an accurate lie detector is sufficient to create boxable AI, or maybe even to enable some procedural process for the creation of true friendly AI (e.g. some extension of indirect normativity). The above is basically a technical outline of how one would create such a lie detector for certain AGI architectures.
humans assisted by non-recursively self-improving technology (e.g. theorem provers, probabilistic code checkers, sandbox environments, etc.) check during execution or afterwards that the code respects whatever properties the human cares about.
I reiterate my point: regardless of tools used, you still have a human brain at the end of the line and that brain’s complexity limit is not high.
complete computational traces be provided
Going back to my example, what does a “complete computational trace” mean in the context of a few million threads running different code asynchronously on distributed hardware, all interacting?
Bayesian probability graphs as its fundamental unit of computation, like a list in Lisp
A list in Lisp is just a data structure. What is a Bayesian probability graph as a unit of computation? I don’t even understand what it means.
assume that you have access to a lie detector
Sure. Give that lie detector to someone with the IQ of 60 and see how well they together will figure out something that a guy with the IQ of 150 wants to keep hidden from them, say, his true intentions and capabilities.
I reiterate my point: regardless of tools used, you still have a human brain at the end of the line and that brain’s complexity limit is not high.
A human brain is at the end of all the alternative strategies as well.