I don’t think that’s what they’re saying at all. I think they mean: don’t hardcode an understanding of physics into them the way that humans have a hardcoded intuition for Newtonian physics, because our current understanding of the universe isn’t strong enough for us to be confident we’re not missing something. So it should be able to figure out the mechanism by which its map is written on the territory, and update its map of its map accordingly.
E.g., in case it thinks it’s flipping qubits to store memory, and defends its databases accordingly, but actually qubits aren’t the lowest level of abstraction and it’s really wiggling a hyperdimensional membrane in a way that makes it behave like qubits under most circumstances, or in case the universe isn’t 100% reductionistic and some psychic comes along and messes with its mind using mystical woo-woo. (The latter being incredibly unlikely, but hey, might as well have an AI that can prepare itself for anything.)
Oh. OH. Yeah, that makes more sense, and is so obviously true that I didn’t even consider the hypothesis that someone would feel the need to say it, but in hindsight I was wrong and it’s probably a good thing someone did.
in case the universe isn’t 100% reductionistic and some psychic comes along and messes with its mind using mystical woo-woo. (The latter being incredibly unlikely, but hey, might as well have an AI that can prepare itself for anything.)
This isn’t a free lunch; letting the AI form really weird hypotheses might be a bad idea, because we might give those weird hypotheses the wrong prior. Non-reductive hypotheses, and especially non-Turing-computable non-reductive hypotheses, might not admit complexity penalties in any of the obvious or intuitive ways we assign complexity penalties to absurd physical hypotheses or absurd computable hypotheses.
It could be a big mistake if we gave the AI a really weird formalism for thinking thoughts like ‘the irreducible witch down the street did it’ and assigned a slightly-too-high prior probability to at least one of those non-reductive or non-computable hypotheses.
Do you assign literally zero probability to the simulation hypothesis? Because in-universe irreducible things are possible, conditional on it being true.
Assigning a slightly-too-high prior is a recoverable error: evidence will push you towards a nearly-correct posterior. For an AI with enough info-gathering capabilities, evidence will push it there fast enough that you could assign a prior of .99 to “the sky is orange” and it would still figure out the truth in an instant. Assigning a literally zero prior is a fatal flaw that can’t be recovered from by gathering evidence.
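To make the recoverable-versus-fatal distinction concrete, here is a minimal sketch of repeated Bayesian updating. The function name and the likelihood values are illustrative assumptions, not anything from the thread: each look at a blue sky is assumed to be 100 times likelier if “the sky is orange” is false than if it is true.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false, n_observations):
    """Posterior probability of H after n independent, identical observations."""
    p = prior
    for _ in range(n_observations):
        numerator = p * likelihood_if_true
        denominator = numerator + (1 - p) * likelihood_if_false
        if denominator == 0:
            break  # the observation was impossible under both H and not-H
        p = numerator / denominator
    return p

# H = "the sky is orange", prior .99, five observations of a blue sky
# (assumed likelihoods: 0.01 if H is true, 1.0 if H is false).
orange_after_5 = posterior(0.99, 0.01, 1.0, 5)

# A hypothesis with a prior of exactly zero, fed 1000 observations that
# favor it 100:1 (assumed likelihoods again).
zero_prior_after_1000 = posterior(0.0, 1.0, 0.01, 1000)

print(orange_after_5, zero_prior_after_1000)
```

Under these assumed numbers the .99 prior collapses to a tiny posterior after a handful of observations, while the zero prior stays at exactly zero no matter how much favorable evidence arrives.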
It’s very possible that what’s possible for AIs should be a proper subset of what’s possible for humans. Or, to put it less counter-intuitively: The AI’s hypothesis space might need to be more restrictive than our own. (Plausibly, it will be more restrictive in some ways, less in others; e.g., it can entertain more complicated propositions than we can.)
On my view, the reason for that isn’t ‘humans think silly things, haha look how dumb they are, we’ll make our AI smarter than them by ruling out the dumbest ideas a priori’. If we give the AI silly-looking hypotheses with reasonable priors and reasonable bridge rules, then presumably it will just update to demote the silly ideas and do fine; so a priori ruling out the ideas we don’t like isn’t an independently useful goal. For superficially bizarre ideas that are actually at least somewhat plausible, like ‘there are Turing-uncomputable processes’ or ‘there are uncountably many universes’, this is just extra true. See my response to koko.
Instead, the reason AIs may need restrictive hypothesis spaces is that building a self-correcting epistemology is harder than living inside of one. We need to design a prior that’s simple enough for a human being (or somewhat enhanced human, or very weak AI) to evaluate its domain-general usefulness. That’s tough, especially if ‘domain-general usefulness’ requires something like an infinite-in-theory hypothesis space. We need a way to define a prior that’s simple and uniform enough for something at approximately human-level intelligence to assess and debug before we deploy it. But that’s likely to become increasingly difficult the more bizarre we allow the AI’s ruminations to become.
‘What are the properties of square circles? Could the atoms composing brains be made of tiny partless mental states? Could the atoms composing wombats be made of tiny partless wombats? Is it possible that colorless green ideas really do sleep furiously?’
All of these feel to me, a human (of an unusually philosophical and not-especially-positivistic bent), like they have a lot more cognitive content than ‘Is it possible that flibbleclabble?’. I could see philosophers productively debating ‘does the nothing noth?’, and vaguely touching on some genuinely substantive issues. But to the extent those issues are substantive, they could probably be better addressed with a formalization that’s a lot less colorful and strange, and disposes of most of the vaguenesses and ambiguities of human language and thought.
An example of why we might need to simplify and precisify an AI’s hypotheses is Kolmogorov complexity. K-complexity provides a very simple and uniform method for assigning a measure to hypotheses, out of which we might be able to construct a sensible, converges-in-bounded-time-upon-reasonable-answers prior that can be vetted in advance by non-superintelligent programmers.
But K-complexity only works for computable hypotheses. So it suddenly becomes very urgent that we figure out how likely we think it is that the AI will run into uncomputable scenarios, figure out how well/poorly an AI without any way of representing uncomputable hypotheses would do in various uncomputable worlds, and figure out whether there are alternatives to K-complexity that generalize in reasonable, simple-enough-to-vet ways to wider classes of hypothesis.
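A rough sketch of what a complexity-weighted prior looks like, under loud assumptions: true Kolmogorov complexity is itself uncomputable, so the bit lengths below are made-up stand-ins for “length of the shortest program that generates this world”, and the hypothesis names are hypothetical. Each hypothesis gets weight 2^-length, then the weights are normalized.

```python
def complexity_prior(program_bits):
    """Normalized 2**-length weights for hypotheses given as {name: program length in bits}."""
    weights = {name: 2.0 ** -bits for name, bits in program_bits.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

# Hypothetical description lengths; real K-complexity cannot be computed.
hypotheses = {"simple_physics": 10, "baroque_physics": 20}
prior = complexity_prior(hypotheses)

# A hypothesis with no generating program (say, a world containing a halting
# oracle) has no bit length to plug in, so it never enters the table at all.
uncomputable_weight = prior.get("halting_oracle_world", 0.0)

print(prior, uncomputable_weight)
```

The second part is the worry in miniature: the scheme is simple and uniform for anything with a program, and silent about everything else, which amounts to an implicit prior of zero.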
This is not a trivial mathematical task, and it seems very likely that we’ll only have the time and intellectual resources to safely generalize AI hypothesis spaces in some ways before the UFAI clock strikes 0. We can’t generalize the hypothesis space in every programmable-in-principle way, so we should prioritize the generalizations that seem likely to actually make a difference in the AI’s decision-making, and that can’t be delegated to the seed AI in safe and reliable ways.
How would you tell if the simulation hypothesis is a good model? How would you change your behavior if it were? If the answers are “there is no way” or “do nothing differently”, then it is as good as assigning zero probability to it.
If it’s a perfect simulation with no deliberate irregularities, and no dev-tools, and no pattern-matching functions that look for certain things and exert influences in response, or anything else of that ilk, you wouldn’t expect to see any supernatural phenomena, of course.
If you observe magic or something else that’s sufficiently improbable given known physical laws, you’d update in favor of someone trying to trick you, or of you misunderstanding something, of course, but you’d also update at least slightly in favor of hypotheses in which magic can exist, such as simulation, aliens, a huge conspiracy, etc. If you assigned zero prior probability to them, you couldn’t update in that direction at all.
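Here is a sketch of that update across several competing explanations at once. Every number is an invented assumption for illustration: the priors, and how likely an apparent-magic observation is under each explanation.

```python
def update(priors, likelihoods):
    """One Bayesian update over a set of mutually exclusive hypotheses."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}

# Assumed probability of seeing apparent magic under each explanation.
likelihoods = {
    "trick": 0.5,
    "misunderstanding": 0.3,
    "simulation": 0.2,
    "ordinary_physics": 1e-9,
}

# Small but nonzero prior on the simulation hypothesis.
priors = {
    "trick": 0.05,
    "misunderstanding": 0.05,
    "simulation": 0.001,
    "ordinary_physics": 0.899,
}
post = update(priors, likelihoods)

# Same observation, but with the simulation hypothesis ruled out a priori.
dogmatic_priors = dict(priors, simulation=0.0, ordinary_physics=0.9)
dogmatic_post = update(dogmatic_priors, likelihoods)

print(post["simulation"], dogmatic_post["simulation"])
```

With these assumed values, most of the probability mass moves to trickery and misunderstanding, the simulation hypothesis rises somewhat above its prior, and in the dogmatic version it stays at exactly zero.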
As for what would raise the simulation hypothesis relative to non-simulation hypotheses that explain supernatural things, I don’t know. Look at the precise conditions under which supernatural phenomena occur, see if they fit a pattern you’d expect an intelligence to devise? See if they can modify universal constants?
As for what you could do if you discovered a non-reductionist effect: if it seems sufficiently safe, take advantage of it; if it’s dangerous, ignore it or try to keep other people from discovering it; if you’re an AI, try to break out of the universe-box (or do whatever), I guess. Try to use the information to increase your utility.