If the rules are high-level enough to be human-readable, then translating them into something a computer can run while still maintaining the original intent is hard.
It’s not harder than AGI, because NL is a central part of AGI.
That’s basically the whole alignment problem.
No it isn’t. You can have systems that do what they are told without having any notion of values or preferences. The higher-level systems need goals because they are defining strategy, but only the higher-level ones.
If a system is doing large, fast, irreversible things, then it does not matter whether those things are surprisingly intelligent. If they’re surprising, then that’s sufficient for it to be a problem.
Yes, but that’s a problem we already have, with solutions we already have. For instance, high frequency trading systems can be shut down [automatically] if the market moves too much.
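The kind of automatic shutdown described here can be sketched as a simple threshold check. Everything in this snippet (the class name, the 5% threshold, the price values) is hypothetical and for illustration only, not any real exchange or trading API:

```python
class KillSwitch:
    """Halts a trading system when the market moves more than a set threshold."""

    def __init__(self, reference_price: float, max_move: float = 0.05):
        self.reference_price = reference_price
        self.max_move = max_move  # e.g. a 5% move from reference triggers shutdown
        self.halted = False

    def check(self, current_price: float) -> bool:
        """Return True (and latch the halt) once the move exceeds the threshold."""
        move = abs(current_price - self.reference_price) / self.reference_price
        if move > self.max_move:
            self.halted = True
        return self.halted


# Hypothetical usage: a 3% move is tolerated, a 6% move trips the switch.
ks = KillSwitch(reference_price=100.0)
ks.check(103.0)  # within threshold, keeps trading
ks.check(94.0)   # 6% move: halted from here on
```

Note that this only works because “market moves too much” was identified in advance as a known failure trigger, which is exactly the assumption questioned in the reply below.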
Yes, but that’s a problem we already have, with solutions we already have.
It is a problem we already have, but the solutions we already have are all based on the assumption that either (a) we know in advance what kind of problems can happen, or (b) the problem doesn’t kill us all in one shot. For instance, in your HFT system shutdown example, we already know that “market moves too much” is something which makes a lot of HFT systems not work very well. But how did we learn that? Either we had a prior idea of what problems could happen (implying some transparency of the system), or the problem happened at least once and we learned from that (implying it didn’t kill us the first time; see e.g. Knight Capital).
With AI, it’s the same old problem, but on hard mode (i.e. the system is very opaque) and high stakes (i.e. we don’t necessarily survive the first big mistake). That’s exactly the sort of scenario where our current solutions do not work.
It’s not harder than AGI, because NL is a central part of AGI.
NL? I’m not familiar with this acronym. Also I said it’s as hard as alignment, not as hard as AGI, in case that’s relevant.
No it isn’t. You can have systems that do what they are told without having any notion of values or preferences. The higher-level systems need goals because they are defining strategy, but only the higher-level ones.
I’m not even convinced that higher-level systems necessarily need goals. Pure goal-free tool AI is one possible path; the OP was written to be agnostic to such considerations.
Indeed, that’s a big part of why I say translation is the central piece of the alignment problem: it’s the piece that’s agnostic. It’s the piece that has to be there, in every scheme, under a wide range of assumptions about how the world works. Tool AI? Still needs to solve the translation problem in order to be safe and useful, even without any notion of values or preferences. Utility-maximizing AI? Needs to solve the translation problem in order to be safe and useful. Hierarchical scheme? Translation still needs to be handled somewhere in order to be safe and useful. Humans-consulting-humans or variations thereof? Full system needs to solve the translation problem in order to be safe and useful. Etc.
NL? I’m not familiar with this acronym. Also I said it’s as hard as alignment, not as hard as AGI, in case that’s relevant.
Presumably “natural language”, which often gets called NLP for “natural language processing” in AI.
I think the right response there is something like “suppose you have an AGI that can understand what a human means as well as another human does; now you still have all the difficulty of interpretation that makes law a complicated and contentious field.” It’d be nice, for example, to be able to write a Constitution and still recognize it after the AI has spent 300 years thinking about how to interpret it under adversarial pressure.