Thus the argument that there are people using DL without understanding it (and, moreover, that this is dangerous) is specious and weak, because these people are not the ones actually likely to develop AGI, let alone superintelligence.
Yes, but I don’t think that’s an argument anyone has actually made. Nobody, to my knowledge, sincerely believes that we are right around the corner from superintelligent, self-improving AGI built out of deep neural networks, such that any old machine-learning professor experimenting with how to get a lower error rate in classification tasks is going to suddenly get the Earth covered in paper-clips.
Actually, no, I can think of one person who believed that: a radically underinformed layperson on reddit who, for some strange reason, believed that LessWrong is the only site with people doing “real AI” and that “[machine-learning researchers] build optimizers! They’ll destroy us all!”
Hopefully he was messing with me. Nobody else has ever made such ridiculous claims.
Sorry, wait, I’m forgetting to count sensationalistic journalists as people again. But that’s normal.
Instead of thinking of ‘safety’ or ‘alignment’ as some absolute binary property we can guarantee, it is more profitable to think of a complex distribution over the relative amounts of ‘safety’ or ‘alignment’ in an AI population.
No, “guarantees” in this context meant PAC-style guarantees: “We guarantee that, with probability at least 1-\delta, the system will ‘go wrong’ relative to what its sample data taught it on at most an \epsilon fraction of cases.” You then plug in the \epsilon and \delta you want and solve for how much sample data you need to feed the learner. The links to intro PAC lectures given in the other comment were quite good, by the way, although I do recommend taking a rigorous introductory machine-learning class (new-grad-student level should be enough to inflict the PAC foundations on you).
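To make the “plug in the epsilons and deltas” step concrete, here is a minimal sketch of the textbook sample-complexity bound for a finite hypothesis class and a consistent learner, m >= (1/\epsilon)(ln|H| + ln(1/\delta)); the function name and example numbers are mine:

```python
import math

def pac_sample_size(hypothesis_count, epsilon, delta):
    """Examples needed so that, with probability >= 1 - delta, a consistent
    learner over a finite hypothesis class H errs on at most an epsilon
    fraction of future cases: m >= (ln|H| + ln(1/delta)) / epsilon."""
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)

# e.g. |H| = 2**20 hypotheses, 1% error, 99% confidence:
print(pac_sample_size(2**20, epsilon=0.01, delta=0.01))  # -> 1847 examples
```

The point is that the required sample size grows only polynomially in 1/\epsilon and 1/\delta, so you can actually budget data for the guarantee you want.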
we can at least influence or steer the distribution by selecting for agent types that are more safe/altruistic
“Altruism” is already a social behavior, requiring the agent to have a theory of mind and to care about the minds it believes it observes in its environment. It also assumes that we can build in some mechanism for learning what the hypothesized minds want, learning how they (i.e., human beings) think, and separating the map (of other minds) from the territory (of actual people).
Note that “don’t disturb this system over there (e.g., a human being), because you need to receive data from it untainted by your own causal intervention in any way” is a constraint that I, at least, personally do not know how to state in computational terms.
I think you are overhyping the PAC model. It is certainly an important foundation for probabilistic guarantees in machine learning, but it has some serious limitations when you want to use it to constrain something like an AGI:
1. It only deals with supervised learning.
2. Even apparently simple classes like deterministic finite automata are not efficiently PAC-learnable (under standard cryptographic assumptions), yet in practice humans seem to pick them up fairly easily.
3. It doesn’t deal with the temporal aspects of learning.
However, there are modifications of the PAC model that ameliorate some of these problems, such as learning with membership queries (which addresses item 2); see the sketch below.
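For concreteness, here is a minimal sketch of what the membership-query setting changes, with illustrative names of my own: instead of only receiving labeled examples drawn from a fixed distribution, the learner actively asks a teacher about inputs of its choosing, which is the extra power that makes DFAs learnable (Angluin’s L* algorithm builds on exactly this kind of oracle).

```python
# Toy target: the regular language of bit-strings with an even number of 1s.
# A passive PAC learner only sees (x, label) pairs drawn from a distribution;
# a membership-query learner may ask about any string it chooses.
def target(s: str) -> bool:
    return s.count("1") % 2 == 0

def membership_oracle(concept):
    """Wrap a target concept as a membership-query oracle."""
    return lambda x: concept(x)

ask = membership_oracle(target)
print(ask("1011"))  # False: three 1s
print(ask("1001"))  # True: two 1s
```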
It’s also perhaps a bit optimistic to say that PAC-style bounds on a possibly very complex system like an AGI would be “quite doable”. We don’t even know, for example, whether DNF formulas are learnable in polynomial time in the distribution-free setting.
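To ground that: a DNF hypothesis is just a disjunction of conjunctive terms over boolean variables, and “learning DNF” means recovering such a formula from labeled examples; whether that can be done in polynomial time, distribution-free, is the open problem. A minimal sketch of the representation (mine, purely illustrative):

```python
# Each term maps a variable index to its required value;
# the formula below encodes (x0 AND NOT x2) OR x1.
dnf = [{0: True, 2: False}, {1: True}]

def eval_dnf(terms, x):
    """A DNF formula is satisfied if any one term is fully satisfied."""
    return any(all(x[i] == v for i, v in t.items()) for t in terms)

print(eval_dnf(dnf, [True, False, False]))  # True via first term
print(eval_dnf(dnf, [False, True, True]))   # True via second term
print(eval_dnf(dnf, [False, False, True]))  # False: no term satisfied
```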
I would definitely call it an open research problem to provide PAC-style bounds for more complicated hypothesis spaces and learning settings. But that doesn’t mean it’s impossible or un-doable, just that it’s an open research problem. I want a limitative theorem proved before I go calling things impossible.