It’s very possible that what’s possible for AIs should be a proper subset of what’s possible for humans. Or, to put it less counter-intuitively: The AI’s hypothesis space might need to be more restrictive than our own. (Plausibly, it will be more restrictive in some ways, less in others; e.g., it can entertain more complicated propositions than we can.)
On my view, the reason for that isn’t ‘humans think silly things, haha look how dumb they are, we’ll make our AI smarter than them by ruling out the dumbest ideas a priori’. If we give the AI silly-looking hypotheses with reasonable priors and reasonable bridge rules, then presumably it will just update to demote the silly ideas and do fine; so a priori ruling out the ideas we don’t like isn’t an independently useful goal. For superficially bizarre ideas that are actually at least somewhat plausible, like ‘there are Turing-uncomputable processes’ or ‘there are uncountably many universes’, this is just extra true. See my response to koko.
Instead, the reason AIs may need restrictive hypothesis spaces is that building a self-correcting epistemology is harder than living inside one. We need to define a prior that’s simple and uniform enough for a human being (or somewhat enhanced human, or very weak AI) to assess, debug, and evaluate for domain-general usefulness before we deploy it. That’s tough, especially if ‘domain-general usefulness’ requires something like an infinite-in-theory hypothesis space; and it’s likely to become increasingly difficult the more bizarre we allow the AI’s ruminations to become.
‘What are the properties of square circles? Could the atoms composing brains be made of tiny partless mental states? Could the atoms composing wombats be made of tiny partless wombats? Is it possible that colorless green ideas really do sleep furiously?’
All of these feel to me, a human (of an unusually philosophical and not-especially-positivistic bent), like they have a lot more cognitive content than ‘Is it possible that flibbleclabble?’. I could see philosophers productively debating ‘does the nothing noth?’, and vaguely touching on some genuinely substantive issues. But to the extent those issues are substantive, they could probably be better addressed with a formalization that’s a lot less colorful and strange, and disposes of most of the vagueness and ambiguity of human language and thought.
Kolmogorov complexity is an example of why we might need to simplify and precisify an AI’s hypotheses. K-complexity provides a very simple and uniform method for assigning a measure to hypotheses (roughly, hypotheses with shorter descriptions get more weight), out of which we might be able to construct a sensible, converges-in-bounded-time-upon-reasonable-answers prior that can be vetted in advance by non-superintelligent programmers.
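As a toy sketch of the shape of such a scheme (my illustration, not from the original discussion, and all names here are hypothetical): weight each hypothesis by two to the minus its description length, then update by Bayes. Real Kolmogorov complexity is defined over programs and is uncomputable, so using the serialized length of a hypothesis string is a crude stand-in, but it shows how a simple, uniform prior-assignment rule can be inspected end to end:

```python
def simplicity_prior(hypotheses):
    """Assign each hypothesis a normalized weight of 2^-(description length).

    Using len(h) as a proxy for description length is a deliberate
    simplification; true K-complexity is uncomputable.
    """
    raw = {h: 2.0 ** -len(h) for h in hypotheses}
    total = sum(raw.values())
    return {h: w / total for h, w in raw.items()}

def bayes_update(prior, likelihood):
    """Multiply each hypothesis's weight by its likelihood and renormalize."""
    unnorm = {h: p * likelihood(h) for h, p in prior.items()}
    total = sum(unnorm.values())
    return {h: p / total for h, p in unnorm.items()}

# Shorter (simpler) hypotheses start out with more prior weight...
prior = simplicity_prior(["0", "01", "0110"])

# ...but evidence can still promote a longer hypothesis over a shorter one.
posterior = bayes_update(prior, lambda h: 1.0 if h.startswith("01") else 0.1)
```

The point of the sketch is that the whole rule fits on a page: a human-level vetter can check the prior-assignment and update steps directly, which is exactly the property the post is asking for, and exactly what becomes hard once the hypothesis space includes things this scheme can’t represent.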
But K-complexity only works for computable hypotheses. So it suddenly becomes very urgent that we figure out how likely it is that the AI will run into uncomputable scenarios, how well or poorly an AI without any way of representing uncomputable hypotheses would do in various uncomputable worlds, and whether there are alternatives to K-complexity that generalize in reasonable, simple-enough-to-vet ways to wider classes of hypotheses.
This is not a trivial mathematical task, and it seems very likely that we’ll only have the time and intellectual resources to safely generalize AI hypothesis spaces in some ways before the UFAI clock strikes 0. We can’t generalize the hypothesis space in every programmable-in-principle way, so we should prioritize the generalizations that seem likely to actually make a difference in the AI’s decision-making, and that can’t be delegated to the seed AI in safe and reliable ways.