Why? That’s not a necessary logical consequence. These aren’t (or don’t have to be) chaotic systems, so there is no reason that scaling up the size of the computation results in an unauditable mess. The techniques used depend very much on the AGI architecture, but there are designs which allow for tracing thought patterns and answering questions about its operation in ways which are computationally tractable.
Because there are limits on complexity that a human brain can process. There are tools and techniques which help, sure, but at some point you will hit the limit anyway.
there is no reason that scaling up the size of the computation results in an unauditable mess.
I am pretty sure there is. Imagine a few million threads running different code asynchronously on distributed hardware, all interacting. And if the scenario involves an AI which bootstrapped itself through self-modification, your chances of understanding what’s happening by looking at pieces of code are… very low.
Sorry, some clarification is necessary. I didn’t mean to imply that a human being would manually inspect the AI and understand its workings, but rather that humans assisted by non-recursively self-improving technology (e.g. theorem provers, probabilistic code checkers, sandbox environments, etc.) check, during execution or afterwards, that the code respects whatever properties the human cares about.
For example, the input/output channels which handle a combination of long-term memory, self-modifications, and actions may require that complete computational traces be provided. Obviously the traces must be pruned in order to keep the auditing workload manageable, but using cryptographic techniques this can be done such that the AI cannot know ahead of time which branches will be checked, and therefore cannot cheat by creating fraudulent traces without risking probabilistic or energetic detection.
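To make that concrete, here is a minimal sketch (in Python, with all names, chunk sizes, and sample sizes purely illustrative) of a commit-then-reveal audit: the AI commits to hashes of every trace chunk before it learns which chunks will be sampled, so fabricating any chunk carries a probabilistic risk of detection.

```python
# Minimal sketch of commit-then-reveal trace auditing; everything here is illustrative.
import hashlib
import secrets

def commit(chunks):
    """AI side: hash every serialized trace chunk and hand the commitments to the auditor."""
    return [hashlib.sha256(c).hexdigest() for c in chunks]

def choose_audit_indices(num_chunks, sample_size):
    """Auditor side: pick chunks to inspect, using randomness drawn only after
    the commitments have been received, so the AI cannot predict the sample."""
    return secrets.SystemRandom().sample(range(num_chunks), sample_size)

def verify(commitments, revealed):
    """Auditor side: check that each revealed chunk matches its earlier commitment."""
    return all(hashlib.sha256(chunk).hexdigest() == commitments[i]
               for i, chunk in revealed.items())

# Toy run: the AI produces a trace, commits, then reveals only the sampled chunks.
trace_chunks = [f"step {i}: belief update ...".encode() for i in range(1000)]
commitments = commit(trace_chunks)
audit_ids = choose_audit_indices(len(trace_chunks), sample_size=10)
revealed = {i: trace_chunks[i] for i in audit_ids}
assert verify(commitments, revealed)  # an altered chunk fails with high probability
```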
The AI itself is executing on a framework that includes Bayesian probability graphs as its fundamental unit of computation, like a list in Lisp. The auditor’s job then is to make sure that the updates or actions make sense in terms of the data provided, and that choices were made in alignment with its claimed utility function (remember, the AI can’t make up data without that fraud being energetically detectable).
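As a toy illustration of what “the updates make sense in terms of the data provided” could look like for a single node (again, purely illustrative and not tied to any particular architecture), the auditor can simply recompute the update from the reported prior, likelihoods, and evidence, and compare the result against the AI’s claimed posterior.

```python
# Sketch of the auditor's consistency check for one belief update; all values are toy data.

def recompute_posterior(prior, likelihoods, evidence):
    """Bayes' rule: P(h|e) is proportional to P(e|h) * P(h), normalized over hypotheses."""
    unnormalized = {h: prior[h] * likelihoods[h][evidence] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

def audit_update(prior, likelihoods, evidence, claimed_posterior, tol=1e-9):
    """Return True iff the claimed posterior is consistent with the reported data."""
    expected = recompute_posterior(prior, likelihoods, evidence)
    return all(abs(expected[h] - claimed_posterior[h]) <= tol for h in expected)

# Toy example: two hypotheses, one observed piece of evidence.
prior       = {"h1": 0.5, "h2": 0.5}
likelihoods = {"h1": {"e": 0.9}, "h2": {"e": 0.1}}
claimed     = {"h1": 0.9, "h2": 0.1}
assert audit_update(prior, likelihoods, "e", claimed)
```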
The above basically acts as a lie detector for the question “what’s your highest-level utility function?” and follow-up questions like “give examples of why this clause of the utility function is necessary”.
--
Actually, maybe that’s a simpler way of framing the issue: assume that you have access to a lie detector. I posit that an accurate lie detector is sufficient to create boxable AI, or maybe even enable some procedural processes for the creation of true friendly AI (e.g. some extension of indirect normativity). The above is basically a technical outline of how one would create such a lie detector for certain AGI architectures.
humans assisted by non-recursively self-improving technology (e.g. theorem provers, probabilistic code checkers, sandbox environments, etc.) check, during execution or afterwards, that the code respects whatever properties the human cares about.
I reiterate my point: regardless of tools used, you still have a human brain at the end of the line and that brain’s complexity limit is not high.
complete computational traces be provided
Going back to my example, what does a “complete computational trace” mean in the context of a few million threads running different code asynchronously on distributed hardware, all interacting?
Bayesian probability graphs as its fundamental unit of computation, like a list in Lisp
A list in Lisp is just a data structure. What is a Bayesian probability graph as a unit of computation? I don’t even understand what it means.
assume that you have access to a lie detector
Sure. Give that lie detector to someone with an IQ of 60 and see how well the two of them together will figure out something that a guy with an IQ of 150 wants to keep hidden from them, say, his true intentions and capabilities.
Just as an example of something a sufficiently smart computer might do that a human couldn’t understand: writing code directly in binary, without the intermediate step of a programming language.
After a certain level of complexity you can look, but you won’t understand what you are seeing.
A human brain is at the end of all the alternative strategies as well.
Just as an example of something a sufficiently smart computer might do that a human couldn’t understand: writing code directly in binary, without the intermediate step of a programming language.
That would read as decompiled assembler, which humans can understand, though not in large quantities.
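For instance, a handful of raw bytes with no source language behind them still disassembles into something readable; here is a minimal sketch using the Capstone library (the bytes are just a toy x86-64 function that returns 42):

```python
# Raw machine code rendered as human-readable assembler (pip install capstone).
from capstone import Cs, CS_ARCH_X86, CS_MODE_64

code = b"\x55\x48\x89\xe5\xb8\x2a\x00\x00\x00\x5d\xc3"  # push rbp; mov rbp, rsp; mov eax, 42; pop rbp; ret

md = Cs(CS_ARCH_X86, CS_MODE_64)
for insn in md.disasm(code, 0x1000):
    print(f"0x{insn.address:x}\t{insn.mnemonic}\t{insn.op_str}")
```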
Interesting. Consider me corrected.
For anything nontrivial, we need software support to do that—and it still won’t work very well. You might not be absolutely correct, but you’re close.
IDA is a wonderful piece of software, though. A heck of a lot better than working manually.