k-nearest-neighbors seems to be a reasonable method of interpolation, but what about extrapolation? I’m having trouble seeing how nonparametric methods can deal with regions far away from existing data points.
With very wide predictive distributions, if they are Bayesian nonparametric methods. See the 95% credible intervals (shaded pink) in Figure 2 on page 4, and in Figure 3 on page 5, of Mark Ebden’s Gaussian Processes for Regression: A Quick Introduction.
(Carl Edward Rasmussen at Cambridge and Arman Melkumyan at the University of Sydney maintain sites with more links about Gaussian processes and Bayesian nonparametric regression. Also see Bayesian neural networks, which can justifiably extrapolate sharper predictive distributions than Gaussian process priors can.)
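To make “very wide predictive distributions” concrete, here is a minimal sketch of Gaussian-process regression with a squared-exponential kernel (my own toy example, not taken from Ebden’s tutorial; the lengthscale, noise level, and training points are arbitrary illustrative choices). The 95% interval stays tight near the training inputs and relaxes back to the prior as you move away from them:

```python
# Minimal GP regression sketch: posterior mean and variance under a
# squared-exponential (RBF) kernel, showing how the credible interval
# widens far from the data.
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """k(x, x') = variance * exp(-(x - x')^2 / (2 * lengthscale^2))."""
    sq_dists = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dists / lengthscale ** 2)

def gp_predict(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean and variance of a zero-mean GP at the test inputs."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    K_inv = np.linalg.inv(K)
    mean = K_s.T @ K_inv @ y_train
    var = np.diag(K_ss - K_s.T @ K_inv @ K_s)
    return mean, var

# Training data lives in [0, 3]; we predict out to x = 8 (extrapolation).
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.sin(x_train)
x_test = np.linspace(0.0, 8.0, 9)
mean, var = gp_predict(x_train, y_train, x_test)
for x, m, v in zip(x_test, mean, var):
    lo, hi = m - 1.96 * np.sqrt(v), m + 1.96 * np.sqrt(v)
    print(f"x={x:4.1f}  mean={m:+.2f}  95% interval = ({lo:+.2f}, {hi:+.2f})")
# Near the training inputs the interval is narrow; by x = 8 the mean has
# reverted to the prior (0) and the interval spans about +/- 2 prior
# standard deviations.
```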
See also Modeling human function learning with Gaussian processes, by Tom Griffiths, Chris Lucas, Joseph Jay Williams, and Michael Kalish, in NIPS 21:

[...] we look at two quantitative tests of Gaussian processes as an account of human function learning: reproducing the order of difficulty of learning functions of different types, and extrapolation performance. [...]

Predicting and explaining people’s capacity for generalization – from stimulus-response pairs to judgments about a functional relationship between variables – is the second key component of our account. This capacity is assessed in the way in which people extrapolate, making judgments about stimuli they have not encountered before. [...] Both people and the model extrapolate near optimally on the linear function, and reasonably accurate extrapolation also occurs for the exponential and quadratic function. However, there is a bias towards a linear slope in the extrapolation of the exponential and quadratic functions [...]
The first author, Tom Griffiths, is the director of the Computational Cognitive Science Lab at UC Berkeley, and Lucas and Williams are graduate students there. The work of the Computational Cognitive Science Lab is very close to the mission of Less Wrong:

The basic goal of our research is understanding the computational and statistical foundations of human inductive inference, and using this understanding to develop both better accounts of human behavior and better automated systems [...]

For inductive problems, this usually means developing models based on the principles of probability theory, and exploring how ideas from artificial intelligence, machine learning, and statistics (particularly Bayesian statistics) connect to human cognition. We test these models through experiments with human subjects [...]

Probabilistic models provide a way to explore many of the questions that are at the heart of cognitive science. [...]
Griffiths’s page recommends the foundations section of the lab publication list.
There are always “nearest” neighbors. You might wish for more data than you have, but you must make do with what you have.
If the data is actually linear, or anything remotely resembling linear, then on distant points a linear model will do much better than a nearest-neighbor estimator, whereas on nearby points a nearest-neighbor estimator will do as well as a linear model given enough data. So on distant points nearest-neighbor only works if the curve is a particular shape (constant), while on nearby points it works so long as the curve has anything resembling local neighborhoods.
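A minimal sketch of that comparison (mine, not from the comment; the true slope, noise level, and k are arbitrary illustrative choices):

```python
# Fit a linear model and a k-nearest-neighbor regressor to linear data on
# [0, 10], then compare their predictions at a distant point, x = 100.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 10, 50)
y_train = 2.0 * x_train + 1.0 + rng.normal(0, 0.5, size=x_train.shape)

# Linear model: ordinary least squares on the design matrix (1, x).
X = np.column_stack([np.ones_like(x_train), x_train])
intercept, slope = np.linalg.lstsq(X, y_train, rcond=None)[0]

def knn_predict(x, k=5):
    """1-D k-nearest-neighbor regression: average the k closest training targets."""
    idx = np.argsort(np.abs(x_train - x))[:k]
    return y_train[idx].mean()

x_far = 100.0  # far outside the training range
print("true value:      ", 2.0 * x_far + 1.0)          # 201
print("linear model:    ", intercept + slope * x_far)  # close to 201
print("nearest neighbor:", knn_predict(x_far))         # stays near ~20, the edge of the data
```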
Well, yes. Nonparametric methods use similarity of neighbors. To predict that which has never been seen before—which is not, on its surface, like things seen before—you need modular and causal models of what’s going on behind the scenes. At that point it’s parametric or bust.
Your use of the terms parametric vs. nonparametric doesn’t seem to be the one used by people working in nonparametric Bayesian statistics, where the distinction is more like whether your statistical model has a fixed, finite number of parameters or no such bound. Methods such as the Dirichlet process and its many variants (hierarchical DP, HDP-HMM, etc.) go beyond simple modeling of surface similarities using similarity of neighbours.
See, for example, this list of publications coauthored by Michael Jordan:
Bayesian Nonparametrics http://www.cs.berkeley.edu/~jordan/bnp.html
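As a concrete illustration of that “no bound on the number of parameters” sense (my sketch, not from the linked publications): a simulation of the Chinese restaurant process, the predictive rule underlying the Dirichlet process, in which the number of mixture components is not fixed in advance but keeps growing, roughly like alpha * log n, as more data arrive.

```python
# Chinese-restaurant-process simulation: each new customer joins an existing
# table with probability proportional to its occupancy, or opens a new table
# (a new component) with probability proportional to alpha.
import numpy as np

def crp_table_counts(n, alpha=1.0, seed=0):
    """Simulate table occupancies for n customers under CRP(alpha)."""
    rng = np.random.default_rng(seed)
    counts = []  # customers seated at each existing table
    for _ in range(n):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        table = rng.choice(len(probs), p=probs)
        if table == len(counts):
            counts.append(1)      # open a new table
        else:
            counts[table] += 1
    return counts

for n in (10, 100, 1000, 10000):
    print(f"{n:6d} customers -> {len(crp_table_counts(n))} components")
```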
Parametric methods aren’t any better at extrapolation. They are arguably worse, in that they make strong unjustified assumptions in regions with no data. The rule is “don’t extrapolate if you can possibly avoid it”. (And you avoid it by collecting relevant data.)
Parametric extrapolation actually works quite well in some cases. I’ll cite a few examples that I’m familiar with:

- prediction of black holes by the general theory of relativity
- Moore’s Law
- predicting the number of operations needed to factor an integer of a given size (see the sketch below)
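For the last item, a rough sketch of what that extrapolation looks like (mine, not from the comment): the leading-order heuristic running time of the general number field sieve, exp(c * (ln n)^(1/3) * (ln ln n)^(2/3)) with c = (64/9)^(1/3), evaluated at key sizes well beyond anything that has actually been factored. The formula drops the o(1) term, so the outputs are order-of-magnitude estimates only.

```python
# Extrapolating the GNFS heuristic operation count to large moduli.
import math

def gnfs_operations(bits):
    """Leading-order GNFS estimate for factoring a `bits`-bit integer."""
    ln_n = bits * math.log(2.0)           # ln(2^bits)
    c = (64.0 / 9.0) ** (1.0 / 3.0)       # about 1.923
    return math.exp(c * ln_n ** (1.0 / 3.0) * math.log(ln_n) ** (2.0 / 3.0))

for bits in (512, 768, 1024, 2048):
    print(f"{bits:4d}-bit modulus: about 2^{math.log2(gnfs_operations(bits)):.0f} operations")
```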
I don’t see any examples of nonparametric extrapolation that have similar success.
A major problem in Friendly AI is how to extrapolate human morality into transhuman realms. I don’t know of any parametric approach to this problem that is free of serious difficulties, but “nonparametric” doesn’t really seem to help either. What does your advice “don’t extrapolate if you can possibly avoid it” imply in this case? Pursue a non-AI path instead?
I distinguish “extrapolation” in the sense of extending an empirical regularity (as in Moore’s law) from inferring a logical consequence of a well-supported theory (as in the black hole prediction). This is really a difference of degree, not kind, but for human science the distinction is a good abstraction. For FAI, I’d say the implication is that an FAI’s morality-predicting component should be a working model of human brains in action.
I’m in essential agreement with Wei here. Nonparametric extrapolation sounds like a contradiction to me (though I’m open to counterexamples).
The “nonparametric” part of the FAI process is where you capture a detailed picture of human psychology as a starting point for extrapolation, instead of trying to give the AI Four Great Moral Principles. Applying extrapolative processes like “reflect to obtain self-judgments” or “update for the AI’s superior knowledge” to this picture is not particularly nonparametric—in a sense it’s not an estimator at all, it’s a constructor. But yes, the “extrapolation” part is definitely not a nonparametric extrapolation; I’m not really sure what that would mean.
But every extrapolation process starts with gathering detailed data points, so it confused me that you focused on “nonparametric” as a response to Robin’s argument. If Robin is right, an FAI should discard most of the detailed picture of human psychology it captures during its extrapolation process as errors and end up with a few simple moral principles on its own.
Can you clarify which of the following positions you agree with?
1. An FAI will end up with a few simple moral principles on its own.
2. We might as well do the extrapolation ourselves and program the results into the FAI.
3. Robin’s argument is wrong or doesn’t apply to the kind of moral extrapolation an FAI would do. It will end up with a transhuman morality that’s no less complex than human morality.
(Presumably you don’t agree with 2. I put it in just for completeness.)
2, certainly disagree. 1 vs. 3, don’t know in advance. But an FAI should not discard its detailed psychology as “error”; an AI is not subject to most of the “error” that we are talking about here. It could, however, discard various conclusions as specifically erroneous after having actually judged the errors, which is not at all the sort of correction represented by using simple models or smoothed estimators.
I think connecting this to FAI is far-fetched. To talk technically about FAI you need to introduce more tools first.
I think it implies that a Friendly sysop should not dream up a transhuman society and then try to reshape humanity into that society, but rather let us evolve at our own pace, attending only to what is relevant at each point in time.