Shane, I think we agree on essential Bayesian principles—there’s structure that’s useful for generic prediction, which is sensitive only to the granularity of your sensory information; and then there’s structure that’s useful for decision-making. In principle, all structure worth thinking about is decision-making structure, but in practice we can usually factor out the predictive structure just as we factor out probabilities in decision-making.
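As a rough gloss in standard expected-utility notation (just a sketch of the factoring, not anything load-bearing in the argument):

```latex
% The world-model P(s | a) is the purely predictive structure;
% all sensitivity to terminal values is confined to U.
\[
  \mathrm{EU}(a) \;=\; \sum_{s} P(s \mid a)\, U(s)
\]
```

The point of the factoring is that P can be shared across agents with very different values, while U cannot.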
But I would further say that decision-making structure can be highly sensitive to terminal values in a way that contradicts the most natural predictive structure. Not always, but sometimes.
If I handed you a set of ingestible substances, the “poisons” would not be described by any of the most natural local categorizations. Now, this doesn’t make “poison” an unnatural, value-sensitive category, because you might be interested in the “poison” category for purely predictive purposes, and the boundary can be tested experimentally.
But it illustrates the general idea: the potential poison, in interacting with the complicated human machine, takes on a complicated boundary that doesn’t match the grain of any local boundaries you would draw around substances.
In the same way, if you regard human morality as a complicated machine (and don’t forget the runtime redefinition of terminal values when confronted with new borderline cases, à la Terri Schiavo), then the boundaries of human instrumental values are only going to be understandable by reference to the complicated idealized abstract dynamic of human morality, and not to any structure outside it. In the same way that poisons cause death, instrumental values cause rightness.
The boundaries we need won’t emerge just from trying to predict things that are not interactions with the idealized abstract dynamic of human morality.
Sure, an AI might learn to predict positive and negative reactions from human programmers. But that’s not the same as the idealized abstract dynamic we want. Humans have a positive reaction to things like cocaine, and to rationalized arguments containing flaws they don’t know about. Those also get humans to say “Yes” instead of “No”.
In general, categories formed just to predict human behavior are going to treat what we would regard as “invalid” alterations of the humans, like reprogramming them, as being among “the causes of saying-yes behavior”. Otherwise you’re going to make the wrong prediction!
There’s no predictive motive to idealize out the part that we would regard as morality, to distinguish “right” from “what a human says is right”, and thereby distinguish morality from “things that make humans say yes”, a category that includes “invalid” manipulations like drugs.
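If it helps, here is a toy sketch of the point in the last two paragraphs (my own illustration, with made-up features and a made-up human_says_yes stand-in, not anything Shane proposed): a learner scored purely on predicting “Yes”/“No” has every reason to keep the manipulation feature, and no reason to grow a separate concept of “right”.

```python
# Toy sketch: a purely predictive model of "does the human say yes?"
# has to include manipulations among its features, because they really
# do change the answer. Nothing in the predictive objective marks them
# as "invalid".
import random

random.seed(0)

def human_says_yes(argument_is_sound, manipulation_applied):
    """Crude stand-in for a human rater: sound arguments usually win
    approval, but so does manipulation (drugs, hidden flaws, etc.)."""
    p_yes = 0.1
    if argument_is_sound:
        p_yes += 0.6
    if manipulation_applied:
        p_yes += 0.8
    return random.random() < min(p_yes, 0.95)

# Generate observations of (features, human response).
data = []
for _ in range(20000):
    sound = random.random() < 0.5
    manip = random.random() < 0.5
    data.append((sound, manip, human_says_yes(sound, manip)))

# The best predictor of "yes" conditions on *both* features.
def rate(sound, manip):
    hits = [y for s, m, y in data if s == sound and m == manip]
    return sum(hits) / len(hits)

for sound in (False, True):
    for manip in (False, True):
        print(f"sound={sound!s:5} manipulated={manip!s:5} "
              f"P(yes) ~ {rate(sound, manip):.2f}")
# Dropping the manipulation feature would make the predictions worse,
# so a purely predictive learner keeps it -- it never needs "right"
# as a concept distinct from "what makes the human say yes".
```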
You’re not going to get something like CEV as a natural predictive category. The main reason to think about that particular idealized computation is if your terminal values care specifically about it.