I am assuming that the AI that engages in out-of-the-box thinking is not fast, and that the conjunction of fast *and* unpredictable is the central problem.
The market will demand AI that’s faster than humans, and AI that’s at least as capable of creative, unpredictable thinking. However, the same AI does not have to be both. This approach to AI safety is copied from a widespread organisational principle, where the higher levels do the abstract strategic thinking (the least predictable stuff), the middle levels do the concrete, tactical thinking, and the lowest levels do what they are told. The fastest and most fine-grained actions are at the lowest level. The higher levels can only communicate with the lower levels by communicating an amended strategy or policy: they are not able to interrupt fine-grained decisions, and only hear about fine-grained actions after they have happened. I have given an abstract description of this organising principle because there are multiple concrete examples: large businesses, militaries, and the human brain/CNS. Businesses already use fast but not very flexible systems to do things faster than humans, notably in high-frequency trading. The question is whether more advanced AIs will be responsible for fine-grained trading decisions (the all-in-one approach), or whether advanced AI will substitute for or assist business analysts and market strategists.
A standard objection to Tool AI is that having a human check all the TAI’s decisions would slow things up too much. The above architecture allows an alternative, where human checking occurs between levels. In particular, communication from the highest level to the lower ones is slow anyway. The main requisite for this approach to AI safety is a human-readable communications protocol.
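A minimal sketch of the layered arrangement described above, in Python. Everything in it is hypothetical and for illustration only: the `Policy`, `ExecutionLayer` and `propose_and_review` names, the position-limit rule, and the use of a plain function to stand in for the human reviewer. What it shows is the structure: the fast layer applies whatever policy it currently holds and reports actions only after the fact, while the slow layer can change behaviour only by proposing a new, readable policy that passes a human check between levels.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Policy:
    """Hypothetical human-readable instructions from the strategic level."""
    max_position: int                # largest absolute holding allowed per symbol
    allowed_symbols: frozenset[str]


@dataclass
class ExecutionLayer:
    """Fast, low-level layer: applies the current policy to each decision."""
    policy: Policy
    position: dict[str, int] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)

    def act(self, symbol: str, qty: int) -> bool:
        """Execute one fine-grained action if the current policy permits it."""
        if symbol not in self.policy.allowed_symbols:
            return False
        new_position = self.position.get(symbol, 0) + qty
        if abs(new_position) > self.policy.max_position:
            return False
        self.position[symbol] = new_position
        self.log.append(f"{symbol} {qty:+d}")  # higher levels read this afterwards
        return True


def propose_and_review(
    layer: ExecutionLayer,
    proposal: Policy,
    reviewer: Callable[[Policy], bool],
) -> bool:
    """Slow path: a new policy takes effect only if the reviewer approves it."""
    if reviewer(proposal):
        layer.policy = proposal
        return True
    return False


layer = ExecutionLayer(Policy(max_position=100, allowed_symbols=frozenset({"ABC"})))
print(layer.act("ABC", 60))   # True
print(layer.act("ABC", 60))   # False: would take the position to 120
print(layer.act("XYZ", 1))    # False: symbol not in the policy
```

`Policy` is frozen, so the fast layer cannot mutate the instructions it was given; in this sketch the only code path that replaces them is the review gate.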
> “Making it predictable” at a high level requires translating high-level “predictability” into some low-level specification, which just brings us back to the original problem: translation is hard.
If you are checking your high-level AI as you go along, you need a high-level language that is human-comprehensible.
I’m pretty sure none of this actually affects what I said: the low-level behavior still needs to produce results which are predictable to humans in order for predictability to be useful, and that’s still hard.
The problem is that making an AI predictable to a human is hard. This is true regardless of whether or not it’s doing any outside-the-box thinking. Having a human double-check the instructions given to a fast low-level AI does not make the problem any easier; the low-level AI’s behavior still has to be understood by a human in order for that to be useful.
As you say toward the end, you’d need something like a human-readable communications protocol. That brings us right back to the original problem: it’s hard to translate between humans’ high-level abstractions and low-level structure. That’s why AI is unpredictable to humans in the first place.
The rules it’s given are, presumably, at a low level themselves. (Even if that’s not the case, the rules it’s given are definitely not human-intelligible unless we’ve already solved the translation problem in full.)
The question is not whether the low-level AI will follow those rules; the question is what actually happens when something follows those rules. A Python interpreter will not ever deviate from the simple rules of Python, yet it still does surprising-to-a-human things all the time. The problem is accurately translating between human-intelligible structure and the rules given to the AI.
The problem is not that the AI might deviate from the given rules. The problem is that the rules don’t always mean what we want them to mean.
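The interpreter point can be made concrete with a few well-known Python behaviours (examples chosen for illustration, not taken from the thread). In each case the interpreter follows the language rules exactly; the surprise lies entirely in the gap between what the code says and what a reader expects it to mean.

```python
from __future__ import annotations


def shared_rows() -> list[list[int]]:
    # `[row] * 3` repeats a reference to the same inner list,
    # so "changing one row" changes all three.
    grid = [[0] * 3] * 3
    grid[0][0] = 1
    return grid


def append_item(item: int, bucket: list[int] = []) -> list[int]:
    # The default list is created once, when `def` runs, and is shared
    # by every call that omits `bucket`.
    bucket.append(item)
    return bucket


def tenths_add_up() -> bool:
    # 0.1, 0.2 and 0.3 have no exact binary floating-point form, and the
    # rounded sum of the first two is not the float closest to 0.3.
    return 0.1 + 0.2 == 0.3


print(shared_rows())     # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
print(append_item(1))    # [1]
print(append_item(2))    # [1, 2]
print(tenths_add_up())   # False
```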
> The rules it’s given are, presumably, at a low level themselves.
The rules that the low-level AI runs on could be medium level. There is no point in giving it very low-level rules, since its job is to fill in the details. But the point is that I am stipulating that the rules should be high level enough to be human-readable.
> The question is not whether the low-level AI will follow those rules; the question is what actually happens when something follows those rules. A Python interpreter will not ever deviate from the simple rules of Python, yet it still does surprising-to-a-human things all the time.
But the world hasn’t ended. A Python interpreter doesn’t do surprisingly intelligent things, because it is not intelligent.
> The problem is not that the AI might deviate from the given rules. The problem is that the rules don’t always mean what we want them to mean.
In your framing of the problem, you create one superpowerful AI that has to be programmed perfectly, which is impossible. In my solution, you reduce the problem to more manageable chunks. My solution is already partially implemented.
> But the point is that I am stipulating that the rules should be high level enough to be human-readable.
If the rules are high level enough to be human-readable, then translating them into something a computer can run while still maintaining the original intent is hard. That’s basically the whole alignment problem. If an AI is doing that translation, then writing/training that AI is as hard as the whole alignment problem.
> A Python interpreter doesn’t do surprisingly intelligent things, because it is not intelligent.
If a system is doing large, fast, irreversible things, then it does not matter whether those things are surprisingly intelligent. If they’re surprising, then that’s sufficient for it to be a problem.
> In your framing of the problem, you create one superpowerful AI that has to be programmed perfectly, which is impossible.
I’m not sure what gave you that impression, but I definitely do not intend to assume any of that.
> If the rules are high level enough to be human-readable, then translating them into something a computer can run while still maintaining the original intent is hard.
It’s not harder than AGI, because NL is a central part of AGI.
> That’s basically the whole alignment problem.
No, it isn’t. You can have systems that do what they are told without having any notion of values and preferences. The higher-level systems need goals because they are defining strategy, but only the higher-level ones.
> If a system is doing large, fast, irreversible things, then it does not matter whether those things are surprisingly intelligent. If they’re surprising, then that’s sufficient for it to be a problem.
Yes, but that’s a problem we already have, with solutions we already have. For instance, high frequency trading systems can be shut down [automatically] if the market moves too much.
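A minimal sketch of the kind of automatic shutdown mentioned above. The `CircuitBreaker` name, the 5% threshold, and the latch-until-manual-reset behaviour are assumptions for illustration; real exchanges and trading firms define their own limits and halt rules.

```python
from dataclasses import dataclass


@dataclass
class CircuitBreaker:
    """Halts a fast trading loop when price moves too far from a reference.

    `max_move` is a hypothetical threshold, as a fraction of the reference price.
    """
    reference_price: float
    max_move: float = 0.05
    halted: bool = False

    def check(self, price: float) -> bool:
        """Return True if trading may continue; latch into a halt otherwise."""
        if self.halted:
            return False
        move = abs(price - self.reference_price) / self.reference_price
        if move > self.max_move:
            self.halted = True  # stays halted until reset() is called
        return not self.halted

    def reset(self, new_reference: float) -> None:
        """Manual reset, e.g. after a human has reviewed what happened."""
        self.reference_price = new_reference
        self.halted = False


breaker = CircuitBreaker(reference_price=100.0, max_move=0.05)
for price in [100.5, 101.0, 98.0, 93.0, 100.0]:
    if not breaker.check(price):
        print(f"halted at {price}")  # prints "halted at 93.0"
        break
```

The check encodes a single, anticipated failure mode (a large move away from a reference price); anything its author did not think to test for passes straight through.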
> Yes, but that’s a problem we already have, with solutions we already have.
It is a problem we already have, but the solutions we already have are all based on the assumption that either (a) we know in advance what kind of problems can happen, or (b) the problem doesn’t kill us all in one shot. For instance, in your HFT system shutdown example, we already know that “market moves too much” is something which makes a lot of HFT systems not work very well. But how did we learn that? Either we had a prior idea of what problems could happen (implying some transparency of the system), or the problem happened at least once and we learned from that (implying it didn’t kill us the first time; see e.g. Knight Capital).
With AI, it’s the same old problem, but on hard mode (i.e. the system is very opaque) and high stakes (i.e. we don’t necessarily survive the first big mistake). That’s exactly the sort of scenario where our current solutions do not work.
> It’s not harder than AGI, because NL is a central part of AGI.
NL? I’m not familiar with this acronym. Also I said it’s as hard as alignment, not as hard as AGI, in case that’s relevant.
> No, it isn’t. You can have systems that do what they are told without having any notion of values and preferences. The higher-level systems need goals because they are defining strategy, but only the higher-level ones.
I’m not even convinced that higher-level systems necessarily need goals. Pure goal-free tool AI is one possible path; the OP was written to be agnostic to such considerations.
Indeed, that’s a big part of why I say translation is the central piece of the alignment problem: it’s the piece that’s agnostic. It’s the piece that has to be there, in every scheme, under a wide range of assumptions about how the world works. Tool AI? Still needs to solve the translation problem in order to be safe and useful, even without any notion of values or preferences. Utility-maximizing AI? Needs to solve the translation problem in order to be safe and useful. Hierarchical scheme? Translation still needs to be handled somewhere in order to be safe and useful. Humans-consulting-humans or variations thereof? Full system needs to solve the translation problem in order to be safe and useful. Etc.
> NL? I’m not familiar with this acronym. Also I said it’s as hard as alignment, not as hard as AGI, in case that’s relevant.
Presumably “natural language”, which often gets called NLP for “natural language processing” in AI.
I think the right response there is something like “suppose you have an AGI that can understand what a human means as well as another human does; now you still have all the difficulty of interpretation that makes law a complicated and contentious field.” It’d be nice to be able to write a Constitution and recognize it after the AI has thought about it while having adversarial pressure on how to interpret it for 300 years, for example.
If you know in general that a low-level AI will follow the rules it has been given, you don’t need to keep re-checking.