Normative AI—Solve all of the philosophical problems ahead of time, and code the solutions into the AI.

Black-Box Metaphilosophical AI—Program the AI to use the minds of one or more human philosophers as a black box to help it solve philosophical problems, without the AI builders understanding what “doing philosophy” actually is.

White-Box Metaphilosophical AI—Understand the nature of philosophy well enough to specify “doing philosophy” as an algorithm and code it into the AI.
So after giving this issue some thought: I’m not sure to what extent a white-box metaphilosophical AI will actually be possible.
For instance, consider the Repugnant Conclusion. Derek Parfit considered some dilemmas in population ethics, put together possible solutions to them, and then noted that the solutions led to an outcome which again seemed unacceptable—but also unavoidable. Once his results had become known, a number of other thinkers started considering the problem and trying to find a way around those results.
Now, why was the Repugnant Conclusion considered unacceptable? For that matter, why were the dilemmas whose solutions led to the RC considered “dilemmas” in the first place? Not because any of them would have violated any logical rules of inference. Rather, we looked at them and thought “no, my morality says that that is wrong”, and then (engaging in motivated cognition) began looking for a consistent way to avoid having to accept the result. In effect, our minds contained dynamics which rejected the RC as a valid result, but that rejection came from our subconscious values, not from any classical reasoning rule that you could implement in an algorithm. Or rather, you could conceivably implement such a rule if you had a thorough understanding of our values, but that’s not much help if the algorithm is supposed to figure out our values in the first place.
You can generalize this problem to all kinds of philosophy. In decision theory, we already have an intuitive notion of what “winning” means, and are trying to formalize it in a way that fits that value. In epistemology, we have some standards about the kind of “truth” that we value, and are trying to come up with a system that obeys those standards. And so on.
The root problem is that classification and inference require values. As Watanabe (1974) writes:
According to the theorem of the Ugly Duckling, any pair of nonidentical objects share an equal number of predicates as any other pair of nonidentical objects, insofar as the number of predicates is finite [10], [12]. That is to say, from a logical point of view there is no such thing as a natural kind. In the case of pattern recognition, the new arrival shares the same number of predicates with any other paradigm of any class. This shows that pattern recognition is a logically indeterminate problem. The class-defining properties are generalizations of certain of the properties shared by the paradigms of the class. Which of the properties should be used for generalization is not logically defined. If it were logically determinable, then pattern recognition would have a definite answer in violation of the theorem of the Ugly Duckling.

This conclusion is somewhat disturbing because our empirical knowledge is based on natural kinds of objects. The source of the trouble lies in the fact that we were just counting the number of predicates in the foregoing, treating them as if they were all equally important. The fact is that some predicates are more important than some others. Objects are similar if they share a large number of important predicates.

Important in what scale? We have to conclude that a predicate is important if it leads to a classification that is useful for some purpose. From a logical point of view, a whale can be put together in the same box with a fish or with an elephant. However, for the purpose of building an elegant zoological theory, it is better to put it together with the elephant, and for classifying industries it is better to put it together with the fish. The property characterizing mammals is important for the purpose of theory building in biology, while the property of living in water is more important for the purpose of classification of industries.

The conclusion is that classification is a value-dependent task and pattern recognition is mechanically possible only if we smuggle into the machine the scale of importance of predicates. Alternatively, we can introduce into the machine the scale of distance or similarity between objects. This seems to be an innocuous set of auxiliary data, but in reality we are thereby telling the machine our value judgment, which is of an entirely extra-logical nature. The human mind has an innate scale of importance of predicates closely related to the sensory organs. This scale of importance seems to have been developed during the process of evolution in such a way as to help maintain and expand life [12], [14].
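To make the counting argument concrete, here is a minimal sketch of the Ugly Duckling theorem (my own illustration, not from Watanabe’s paper; the object names and helper functions are invented for the example). It enumerates every extensionally distinct predicate over a small finite set of objects, i.e. every subset, and counts how many predicates each pair of objects shares.

```python
# Sketch of the Ugly Duckling theorem: over a finite object set, treat every
# subset of the objects as one extensionally distinct predicate. Counting how
# many predicates two objects share then gives the same number for every pair,
# so bare predicate-counting cannot pick out any "natural" similarity.
from itertools import chain, combinations

objects = ["whale", "elephant", "herring", "tractor"]  # arbitrary example objects

def all_predicates(objs):
    """Every subset of the objects, i.e. every extensionally distinct predicate."""
    return chain.from_iterable(combinations(objs, r) for r in range(len(objs) + 1))

def shared_predicates(a, b, objs):
    """Count the predicates that are true of both a and b."""
    return sum(1 for p in all_predicates(objs) if a in p and b in p)

for a, b in combinations(objects, 2):
    print(a, b, shared_predicates(a, b, objects))
# Every pair prints 4 (= 2 ** (len(objects) - 2)): to call the whale more
# similar to the elephant than to the herring, some predicates have to be
# weighted as more important than others, and that weighting is a value judgment.
```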
“Progress” in philosophy essentially means “finding out more about the kinds of things that we value, and drawing the conclusions that our values say are correct and useful”. I am not sure how one could get an AI to make progress in philosophy if we didn’t already have a clear understanding of what our values were, so “white-box metaphilosophy” seems to just reduce back to a combination of “normative AI” and “black-box metaphilosophy”.
Coincidentally, I ended up reading Evolutionary Psychology: Controversies, Questions, Prospects, and Limitations today, and noticed that it makes a number of points that could be interpreted in a similar light: humans do not really have a “domain-general rationality”; instead, we have specialized learning and reasoning mechanisms, each of which serves a specific evolutionary purpose and is specialized for extracting information that was valuable in light of the evolutionary pressures that (used to) prevail. In other words, each of them carries out inferences designed to further some specific evolutionary value that contributed to our inclusive fitness.
The paper doesn’t spell out the obvious implication, since that isn’t its topic, but it seems pretty clear to me: since our various learning and reasoning systems are built to further specific values, our philosophy has also been generated by a combination of such value-laden systems, and we can’t expect an AI reasoner to develop a philosophy that we’d approve of unless its reasoning mechanisms also embody the same values.
That said, it does suggest a possible avenue of attack on the metaphilosophy issue: figure out exactly which learning mechanisms we have and which evolutionary purposes they served, and then use that data to construct learning mechanisms that carry out inferences similar to the ones humans make.
Quotes:
Hypotheses about motivational priorities are required to explain empirically discovered phenomena, yet they are not contained within domain-general rationality theories. A mechanism of domain-general rationality, in the case of jealousy, cannot explain why it should be “rational” for men to care about cues to paternity certainty or for women to care about emotional cues to resource diversion. Even assuming that men “rationally” figured out that other men having sex with their mates would lead to paternity uncertainty, why should men care about cuckoldry to begin with? In order to explain sex differences in motivational concerns, the “rationality” mechanism must be coupled with auxiliary hypotheses that specify the origins of the sex differences in motivational priorities. [...]
The problem of combinatorial explosion. Domain-general theories of rationality imply a deliberate calculation of ends and a sample space of means to achieve those ends. Performing the computations needed to sift through that sample space requires more time than is available for solving many adaptive problems, which must be solved in real time. Consider a man coming home from work early and discovering his wife in bed with another man. This circumstance typically leads to immediate jealousy, rage, violence, and sometimes murder (Buss, 2000; Daly & Wilson, 1988). Are men pausing to rationally deliberate over whether this act jeopardizes their paternity in future offspring and ultimate reproductive fitness, and then becoming enraged as a consequence of this rational deliberation? The predictability and rapidity of men’s jealousy in response to cues of threats to paternity points to a specialized psychological circuit rather than a response caused by deliberative domain-general rational thought. Dedicated psychological adaptations, because they are activated in response to cues to their corresponding adaptive problems, operate more efficiently and effectively for many adaptive problems. A domain-general mechanism “must evaluate all alternatives it can define. Permutations being what they are, alternatives increase exponentially as the problem complexity increases” (Cosmides & Tooby, 1994, p. 94). Consequently, combinatorial explosion paralyzes a truly domain-general mechanism (Frankenhuis & Ploeger, 2007). [...]
In sum, domain-general mechanisms such as “rationality” fail to provide plausible alternative explanations for psychological phenomena discovered by evolutionary psychologists. They are invoked post hoc, fail to generate novel empirical predictions, fail to specify underlying motivational priorities, suffer from paralyzing combinatorial explosion, and imply the detection of statistical regularities that cannot be, or are unlikely to be, learned or deduced ontogenetically. It is important to note that there is no single criterion for rationality that is independent of adaptive domain. [...]
The term learning is sometimes used as an explanation for an observed effect and is the simple claim that something in the organism changes as a consequence of environmental input. Invoking “learning” in this sense, without further specification, provides no additional explanatory value for the observed phenomenon but only regresses its cause back a level. Learning requires evolved psychological adaptations, housed in the brain, that enable learning to occur: “After all, 3-pound cauliflowers do not learn, but 3-pound brains do” (Tooby & Cosmides, 2005, p. 31). The key explanatory challenge is to identify the nature of the underlying learning adaptations that enable humans to change their behavior in functional ways as a consequence of particular forms of environmental input.

Although the field of psychology lacks a complete understanding of the nature of these learning adaptations, enough evidence exists to draw a few reasonable conclusions. Consider three concrete examples: (a) People learn to avoid having sex with their close genetic relatives (learned incest avoidance); (b) people learn to avoid eating foods that may contain toxins (learned food aversions); (c) people learn from their local peer group which actions lead to increases in status and prestige (learned prestige criteria). There are compelling theoretical arguments and empirical evidence that each of these forms of learning is best explained by evolved learning adaptations that have at least some specialized design features, rather than by a single all-purpose general learning adaptation (Johnston, 1996). Stated differently, evolved learning adaptations must have at least some content-specialized attributes, even if they share some components. [...]
These three forms of learning—incest avoidance, food aversion, and prestige criteria—require at least some content-specific specializations to function properly. Each operates on the basis of inputs from different sets of cues: coresidence during development, nausea paired with food ingestion, and group attention structure. Each has different functional output: avoidance of relatives as sexual partners, disgust at the sight and smell of specific foods, and emulation of those high in prestige. It is important to note that each form of learning solves a different adaptive problem.

There are four critical conclusions to draw from this admittedly brief and incomplete analysis. First, labeling something as “learned” does not, by itself, provide a satisfactory scientific explanation any more than labeling something as “evolved” does; it is simply the claim that environmental input is one component of the causal process by which change occurs in the organism in some way. Second, “learned” and “evolved” are not competing explanations; rather, learning requires evolved psychological mechanisms, without which learning could not occur. Third, evolved learning mechanisms are likely to be more numerous than traditional conceptions have held in psychology, which typically have been limited to a few highly general learning mechanisms such as classical and operant conditioning. Operant and classical conditioning are important, of course, but they contain many specialized adaptive design features rather than being domain general (Öhman & Mineka, 2003). And fourth, evolved learning mechanisms are at least somewhat specific in nature, containing particular design features that correspond to evolved solutions to qualitatively distinct adaptive problems.
I always suspected that natural kinds depended on an underdetermined choice of properties, but I had no idea there was or could be a theorem saying so. Thanks for pointing this out.
Does a similar point apply to Solomonoff Induction? How does the minimum length of the program necessary to generate a proposition vary when we vary the properties our descriptive language uses?
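For what it’s worth, a partial answer comes from the standard invariance theorem of Kolmogorov complexity (stated here from memory as background, not from the discussion above): changing the reference description language changes the minimum program length only by an additive constant that depends on the pair of languages, not on the proposition being described:

$$K_U(x) \;\le\; K_V(x) + c_{U,V}$$

where $K_U(x)$ is the length of the shortest program producing $x$ on universal machine $U$, and $c_{U,V}$ is roughly the length of an interpreter for $V$ written for $U$. So the choice of descriptive language does act like a prior over which regularities are cheap to express, but its influence on description lengths is bounded by that constant.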