There’s a less cheerful possibility: the risks are real, but your work is totally irrelevant to reducing them.
That would indeed be less cheerful. (but still useful to know)
Imagine pondering the creation of an artificial life-force from combinations of mechanical parts; that sounds incredibly dangerous, and like a worthwhile area of study. One could spend a lot of time thinking in terms of life-force: how do we ensure that the life-force goo won’t eat everything in its path? Should we stop research into steam locomotives to avoid such a scenario?
Would you want to know if you are thinking in terms of an irrelevant abstraction? We humans have the capability for abstract thought; we love abstract thinking; some concepts are just abstraction porn, though, useful only for tickling our ‘grand insights feel good’ response.
If people had reasoned that way in the 18th century, they would have correctly predicted the risks of nanotech and maybe biotech. So I guess you should conclude that unfriendly AI risk is real, though far in the future… Anyway, how do you tell which concepts are “abstraction porn” and which aren’t?
How useful would that have been, though? I don’t think you can get a single useful insight about making safe nanotech or biotech from thinking in terms of an abstract ‘life force’. edit: also, one can end up predicting a lot of invalid stuff this way, like zombies...
Concepts that are not built bottom up are usually abstraction porn.
This monolithic “intelligence” concept is about as suspicious as a concept can be: it lets you have grand-feeling insights without being concerned with any hard details, like algorithmic complexity, existing problem-solving algorithms, the different aspects of intelligence (problem solving, world modelling, sensory processing), or the fact that the intelligence has to work in a decentralized manner due to speed-of-light lag (so its parts have to implement some sort of efficient protocol for cooperation… mankind has such a protocol; we call it morality). Ditto ‘utility’ as used on LW (not to be confused with a utility function as an actual mathematical function inside some current software).
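For a sense of scale on the speed-of-light point, here is a rough back-of-the-envelope calculation (the distances and clock rate are illustrative round numbers, nothing more):

```python
# Rough one-way light-speed delay between antipodal points on Earth's surface.
c_km_per_s = 299_792
half_circumference_km = 20_037
delay_s = half_circumference_km / c_km_per_s
print(f"{delay_s * 1e3:.0f} ms one way")        # roughly 67 ms
print(f"{delay_s * 3e9:.1e} ticks at 3 GHz")    # a couple hundred million local clock ticks
```

Anything spread over that kind of distance has to act on stale, partial information about its own remote parts, which is the decentralization point being made.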
edit: Actually, do you guys even use any concept built from the bottom up to think about AI?
How useful would it be to know that the AI will use, say, A* search, as opposed to meta-reasoning about what it is likely to be searching for? We know both from computer science and from our own minds that effective heuristics exist to approximately solve most problems. The precise bottom-up knowledge you refer to is akin to knowing that the travelling salesman problem has no exact polynomial-time algorithm (assuming P ≠ NP); the meta-knowledge that “good polynomial-time heuristics exist for most problems” is much more useful for predicting the future of AI.
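To make that contrast concrete, here is a minimal sketch (illustrative code only; the nearest-neighbour heuristic is just one example of a cheap approximation): brute force solves the travelling salesman problem exactly but takes factorial time, while the greedy heuristic runs in polynomial time and usually lands close to the optimum.

```python
import math
import random
from itertools import permutations

def tour_length(tour, pts):
    # Total length of the closed tour visiting pts in the given order.
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def exact_tsp(pts):
    # Brute force over all orderings: exact, but factorial time.
    return min(permutations(range(len(pts))), key=lambda t: tour_length(t, pts))

def nearest_neighbour(pts):
    # Greedy heuristic: O(n^2), no optimality guarantee, usually good in practice.
    unvisited = set(range(1, len(pts)))
    tour = [0]
    while unvisited:
        nxt = min(unvisited, key=lambda c: math.dist(pts[tour[-1]], pts[c]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(8)]
print(tour_length(exact_tsp(pts), pts))          # optimal tour length
print(tour_length(nearest_neighbour(pts), pts))  # heuristic tour length, typically close
```

The claim above is that this kind of meta-fact (cheap approximations exist) tells you more about future AI than knowing which exact search algorithm gets used.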
The issue is not merely that you don’t have ground-up definitions which respect the time constraints. The issue is that you don’t seem to have any ground-up definitions at all, i.e. not even for something like AIXI. The goals themselves lack any bottom-up definitions.
Worst of all, you build stuff from dubious concepts like that monolithic “intelligence”.
Say we want to make better microchips. We engineers have to build things from the bottom up, so we make some partially implemented intelligence to achieve that goal: we omit the definition of what exactly the ‘best microchip’ is, omit the real-world goals, and focus the search (heuristics are about where you search!) on designing smaller logic gates, then routing the chip, and perhaps figuring out manufacturing. All doable with the same methods, all to the point, with strongly superhuman performance on subhuman hardware.
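A toy version of what “focusing the search” means here (my own sketch, with a made-up six-gate netlist; real chip tools are vastly more sophisticated): the optimizer only ever explores gate orderings, because that is the entire space it is given.

```python
import random

# Toy 'placement' problem: order gates on a row so that connected gates
# end up close together (a stand-in for minimising wire length).
nets = [(0, 3), (1, 4), (2, 5), (0, 5), (3, 4), (1, 2)]  # hypothetical connections

def wire_length(order):
    pos = {gate: i for i, gate in enumerate(order)}
    return sum(abs(pos[a] - pos[b]) for a, b in nets)

def hill_climb(n_gates, steps=2000, seed=0):
    rng = random.Random(seed)
    order = list(range(n_gates))
    best = wire_length(order)
    for _ in range(steps):
        i, j = rng.sample(range(n_gates), 2)
        order[i], order[j] = order[j], order[i]      # propose a swap
        cost = wire_length(order)
        if cost <= best:
            best = cost                              # keep improvements
        else:
            order[i], order[j] = order[j], order[i]  # otherwise undo the swap
    return order, best

print(hill_climb(6))
```

“Take over the world” is not a point in that search space, which is the sense in which the partial, bottom-up system stays on task.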
You build an Oracle AI out of that monolithic “intelligence” concept and tell it: I want a better microchip. This monolithic intelligence figures out how to take over the world to do so. You then think: how do we prevent this monolithic intelligence from thinking about taking over the world? That looks like an incredibly difficult problem.
Or the orthogonality thesis. You think—are the goals of that monolithic intelligence arbitrary?
Meanwhile, if you try to build bottom-up, or at least from concepts with known bottom-up definitions, something like ‘number of paperclips in the universe’ is clearly much harder to define than f(x, n), where x is the output on the screen at step n and f is 1 if the operator responds with the reward button and 0 otherwise. (Note that the state in which the computer is unplugged has to be explicitly modelled, and it is not trivial to build bottom-up the concept of a mathematical function ‘disappearing’; it may actually be impossible.)
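Spelled out as code, the kind of bottom-up objective being contrasted with “number of paperclips in the universe” is something like the sketch below (the signature, including the explicit operator-response argument, is my own framing for illustration):

```python
def f(x: str, n: int, operator_pressed_reward: bool) -> int:
    """Reward at step n for screen output x: 1 if the operator responds with
    the reward button, 0 otherwise. Everything here is defined over the
    program's own inputs and outputs."""
    return 1 if operator_pressed_reward else 0

# There is no analogous paperclip_count(universe): we have no bottom-up
# definition of 'universe' or 'paperclip' to feed into such a function.
```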
The reason for that is that the AIs that are worrying are those with human-like levels of ability. Humans have shown skill at becoming intelligent in many different domains, and the ability to build machine intelligence in the domains where we have little skill. So whatever its design, an AGI (artificial general intelligence) will have a broad palette of abilities, and probably the ability to acquire others; hence the details of its design are less important than meta considerations. This is not the case for non-AGI AIs.
I think “how to convince philosophers that high intelligence will not automatically imply certain goals”, i.e. that they are being incorrectly meta.
Moore’s law is a better way of predicting the future than knowing the exact details of current research into microprocessors. Since we don’t have any idea how the first AGI will be built (assuming it can be built), why bother focusing on the current details when we’re pretty certain they won’t be relevant?
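The kind of extrapolation being appealed to is back-of-the-envelope arithmetic; the numbers below are rough illustrations, not data:

```python
# Moore's law as "transistor counts double roughly every two years",
# independent of the details of any particular microprocessor project.
transistors_now = 1.4e9      # rough order of magnitude for a circa-2012 CPU
years_ahead = 10
projected = transistors_now * 2 ** (years_ahead / 2)
print(f"{projected:.1e} transistors")   # ~4.5e10 after five more doublings
```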
The AIs that are worrying have to beat the (potentially much simpler) partial AIs, which can be of autistic-savant-like levels of ability and beyond in the fields very relevant to being actually powerful. You can’t focus just on human-level AGIs when you consider the risks. The AGIs have to be able to technologically outperform contemporary human civilization to a significant extent, which would not happen if both AGIs and humans are substantially bottlenecked on running essentially the same highly optimized (possibly self-optimized) non-general-purpose algorithms to solve domain-specific problems.
The meta considerations in question look identical to the least effective branches of philosophy.
I think so too, albeit in a different way: I do not think that high intelligence will automatically imply that the goals are within the class of “goals which we have no clue how to define mathematically but which are really easy to imagine”.
I have trouble parsing the logical structure of this argument. The fact that we don’t have any idea how the first AGI will be built doesn’t make reasoning that employs faulty concepts relevant. Furthermore, being certain that something is irrelevant without having studied it is a very Dunning-Kruger-prone form of thought.
Furthermore, I can’t see how in the world you can be certain that it is irrelevant that (for example) the AI has to work in a peer-to-peer topology with very substantial lag, efficiently (i.e. no needless simulation of other nodes of itself, significant ignorance of the contents of other nodes, local nodes lacking sight of the global picture, etc.), when it comes to how it will interact with other intelligences. We truly do not know that hyper-morality won’t fall out of this as a technological solution, considering that our own morality was produced as a solution for cooperation.
Also, I can’t see how it can be irrelevant that (as a good guess) AGI is ultimately a mathematical function that calculates outputs from inputs using elementary operations, and that a particular instance of the AGI is a machine computing this function. That’s a meta consideration built from the ground up rather than from the concept of a monolithic ‘intelligence’. ‘Symbol grounding’ may be a logical impossibility (and the feeling that symbols are grounded may well be a delusion that works via fallacies); in any case, we don’t see how it can be solved. Like free will: a lot of people feel very sure that they have something in their mind that’s clearly not compatible with reductionism. Well, I think there can be a lot of other things that we feel very sure we have which are not compatible with reductionism in less obvious ways.
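One way to write that framing down (the notation is mine, just to make the “function computed by a machine” reading explicit):

```python
from typing import Callable, Sequence

# The abstract object: a function from the history of inputs so far to the next output.
Agent = Callable[[Sequence[bytes]], bytes]

def run(agent: Agent, inputs: Sequence[bytes]) -> list:
    """A particular machine computing that function, step by step."""
    history, outputs = [], []
    for x in inputs:
        history.append(x)
        outputs.append(agent(history))
    return outputs

echo: Agent = lambda history: history[-1]   # a trivial 'agent' for illustration
print(run(echo, [b"a", b"b", b"c"]))
```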
edit: to summarize, my opinion is that everything here is far, far, far too speculative to warrant investigation. It’s like trying to prevent the Hindenburg disaster, the bombing of Dresden, the atomic bombing of Hiroshima and Nagasaki, and the risk of nuclear war by thinking of the flying carpet as the flying vehicle (because bird-morphizing is not cool).
Yes. The SIAI world view doesn’t seem to pay much attention to how morality necessarily evolved as the cooperative glue for the social super-organism meta-transition.
Well, my opinion is that this is far too dangerous (as compared with other risks to humanity) not to investigate. Philosophical tools are weak, but they’ve yet to prove weak enough that we should shelve the ongoing project.
It seems to me that (a) you are grossly overestimating the productivity of symbolic manipulation on a significant number of symbols with highly speculative meanings, and (b) you do not seem to dedicate due effort to investigating existing software, or to verifying the relevance of the symbols and improving it. Symbolic manipulation is only as relevant as the symbols being manipulated.
“Since we don’t have any idea how the first AGI will be built” contradicts “high intelligence will not automatically imply certain goals”.
If you don’t have any idea how AGI will be built, how can you be so confident about the distribution of its goals?
Ignorance widens the space of possible outcomes; it doesn’t narrow it.
I.e., it makes no sense to make arguments like “we know nothing about the mind of god, but he doesn’t like gay sex”.
If you are ignorant about the nature of superintelligence, then you don’t know whether or not it entails certain goals.
Ignorance does not allow you to hold confidence in the proposition that “high intelligence will not automatically imply certain goals”.
Adopting this argument from ignorance puts you in the unfortunate position of being like the uninformed layman attempting to convince particle physicists of the grave dangers of supercolliders destroying the earth.
For in fact there is knowledge to be had about intelligence and the nature of future AI, and recognized experts in the field (Norvig, Kurzweil, Hawkins, etc.) are not dismissing the SIAI position out of ignorance.
Just to make sure I understand you: does it make sense to be confident that the roll of a hundred-sided die won’t be 12?
EDIT: I meant to say “Yes, more or less, but the closer mapping of that analogy is…”
No, but the closer mapping of that analogy is: does it make sense to be confident that the “roll” of an unknown object will be 12 when you don’t even know that it’s a die?
OK, I think I understand you now. Thanks.
To answer your question as I understand it: it does not make sense for me to be confident that the result of some unspecified operation performed on some unknown object will be 12.
It does make sense to be confident that it won’t be 12. (I might, of course, be confident and wrong. It’s just unlikely.)
I consider the latter a more apposite analogy for the argument you challenge here. Being confident that an unspecified process (e.g., AGI) won’t value paperclips makes more sense than being confident that it will, in the same way that being confident that it won’t return “12” makes more sense than being confident that it will.
Perhaps we are using a different notion of ‘confidence’. The uses of that term that I am familiar with have a separate specific meaning apart from probability. Confidence to me implies meta-level knowledge about the potential error in one’s probability estimate, intervals, or something of that nature.
So in your analogy, I can’t be confident about any properties of an unspecified process. I can of course assign a probability estimate, purely from priors, to the proposition that this unspecified process won’t value paperclips, but that will not be a high-confidence estimate. The ‘process’ could be a paperclip factory for all I know.
If you then map this analogy back to the original SIAI frame, I assume that the ‘die’ maps loosely to AGI development, and the 12 to the AGI’s values being human values. And then no, it does not make sense to be confident that the roll won’t be 12, given that we supposedly don’t know what kind of die it is. It could very well be a ‘die’ with only 12s.
Priors are really just evidence you have already accumulated, so in reality one is never in a state of complete ignorance.
For example, I know that AGI will be created by humans (high confidence), that humans create things which they value or which help fulfill their values, and that humans, for reasons both anthropocentric and economic, are thus more likely to create AGI that shares human values.
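The “priors are just evidence already accumulated” point has a standard formal counterpart: updating on pieces of evidence one at a time, carrying each posterior forward as the next prior, gives the same answer as a single update on the pooled evidence. A quick sketch with made-up numbers (assuming the two pieces of evidence are conditionally independent given the hypothesis):

```python
def bayes(prior, likelihood_if_h, likelihood_if_not_h):
    # Posterior P(H | E) from a prior P(H) and the likelihoods P(E | H), P(E | ~H).
    num = prior * likelihood_if_h
    return num / (num + (1 - prior) * likelihood_if_not_h)

# Updating on two pieces of evidence one at a time...
p = bayes(0.5, 0.8, 0.3)      # after E1, this posterior becomes the new 'prior'
p = bayes(p,   0.9, 0.4)      # ...then update on E2
# ...gives the same answer as one update on the accumulated evidence:
q = bayes(0.5, 0.8 * 0.9, 0.3 * 0.4)
print(p, q)                   # both ~0.857
```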
I don’t think it’s useful to talk about whether we can have confidence in statements about the outcome of an AGI process while we still disagree about whether we can have confidence in statements about the outcome of rolling a hundred-sided die.
So, OK.
Given two statements, P1 (“my next roll of this hundred-sided die will not be 12”) and P2 (“my next roll of this hundred-sided die will be 12”), I consider it sensible to be confident of P1 but not P2; you don’t consider it sensible to be confident of either statement. This may be because of different uses of the term “confident”, or it might be something more substantive.
Would you agree that there’s a 99% chance of P1 being true, and a 99% chance of P2 being false, given a fair die toss?
If so, can you say more about the class of statements like P1, where I estimate a 99% chance of it being true but it’s inappropriate for me to be confident in it?
OK. I’ll attempt to illustrate confidence vs. probability as I understand it.
Let’s start with your example. Starting with the certain knowledge that there is an object which is a 100-sided die, you are correct to infer that P(roll(D) != 12 | D=100) = 99/100.
Further, you are correct (in this example) to have complete confidence in that estimate.
We can think of confidence as how closely one’s probability estimate would approach the true frequency if we iterated the experiment to infinity, or alternatively summed across the multiverse.
If we roll that die an infinite number of times (or sum across the multiverse), the observed frequency of (roll(D) != 12 | D=100) is more or less guaranteed to converge to the probability estimate of 99%. This is thus a high-confidence estimate.
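A quick simulation of the known-die case (just the frequency argument above, nothing more):

```python
import random

rng = random.Random(0)
rolls = [rng.randint(1, 100) for _ in range(1_000_000)]
print(sum(r != 12 for r in rolls) / len(rolls))   # ~0.99, matching the 99/100 estimate
```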
But this high confidence is conditional on your knowledge (and really, your confidence in this knowledge) that there is a die, and the die has 100 sides, and the die is fair, and so on.
Now if you remove all this knowledge, the situation changes dramatically.
Imagine that you know only that there is a die, but not how many sides the die has. You could still make some sort of an estimate. You could guesstimate using your brain’s internal heuristics, which wouldn’t be so terrible, or you could research dice and make a more informed prior about the unknown number of sides.
From that you might make an informed estimate of 98.7% for P(roll(D) != 12 | D = ???), but this will be a low confidence estimate. In fact once we roll this unknown die a large number of times, we can be fairly certain that the observed frequency will not converge to 98.7%.
So that is the difference between probability and confidence, at least in intuitive English. There are several more concrete algorithmic schemes for dealing with confidence or epistemic uncertainty, but that’s the general idea. (We could even take it up a whole new meta level by considering probability distributions over probability functions, one for each possible die type; this would be a more accurate model, but it is of course no more confident.)
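For the unknown-die case, the same distinction can be sketched with a made-up prior over die sizes (so the marginal estimate below is merely in the spirit of the 98.7% figure, not a reconstruction of it):

```python
import random

# Hypothetical prior over how many sides the unknown die has.
prior = {4: 0.1, 6: 0.4, 8: 0.1, 10: 0.1, 12: 0.1, 20: 0.2}

# Marginal point estimate of P(roll != 12), averaging over the prior.
estimate = sum(p * (1 - (1 / sides if sides >= 12 else 0))
               for sides, p in prior.items())
print(estimate)   # a single number, analogous to the 98.7% in the comment

# But the die actually in front of us is some particular die. Its long-run
# frequency converges to 1 - 1/sides (or to 1.0 if it has fewer than 12 sides),
# which will generally not equal the point estimate above.
rng = random.Random(1)
true_sides = rng.choices(list(prior), weights=prior.values())[0]
rolls = [rng.randint(1, true_sides) for _ in range(1_000_000)]
print(true_sides, sum(r != 12 for r in rolls) / len(rolls))
```

The single number is a legitimate probability estimate, but repeated rolls of the actual die converge to that particular die’s frequency, not to the marginal estimate; that gap is what the comment calls a low-confidence estimate.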
OK.
So, if I understand you correctly, and returning to my original question… given the statement “my next roll of this hundred-sided die will not be 12” (P1), and a bunch of background knowledge (K1) about how hundred-sided dice typically work, and a bunch of background knowledge (K2) relevant to how likely it is that my next roll of this hundred-sided die will be typical (for example, how likely this die is to be loaded), I could in principle be confident in P1.
However, since K2 is not complete, I cannot in practice be confident in P1.
The best I can do is make an informed estimate of the likelihood of P1, but this will be a low confidence estimate.
Have I correctly generalized your reasoning and applied it to the case I asked about?
Yeah, kind of.
However, your P1 statement already implies the most important parts of K1 and K2; just by inserting the adjective “hundred-sided” into P1, you load it with this knowledge. Beyond that, the K1 and K2 stuff is cumbersome background detail that most human brains will have (but which is, of course, also necessary for understanding ‘dice’).
By including “hundred-sided” in the analogy, you are importing a ton of implicit confidence about the true probability distribution in question. Your ‘analogy’ assumes you already know the answer with complete confidence.
That analogy would map to an argument (for AI risk) written out in labyrinthine, explicit, well-grounded detail, probably to the point of encoding complete, tested and proven working copies of the entire range of future AGI designs.
In other words, your probability estimate in the dice analogy is high confidence only because of your confidence in understanding how dice work, and in the object in question actually being a hundred-sided die.
We don’t have AGI yet, so we can’t understand it in the complete engineering sense in which we understand dice. Moreover, Stuart above claimed we don’t even understand how AGI will be built.
I disagree.
For example, I can confirm that something is a hundred-sided die by the expedient of counting its sides.
But if a known conman bets me $1000 that P1 is false, I will want to do more than count the sides of the die before I take that bet. (For example, I will want to roll it a few times to ensure it’s not loaded.)
That suggests that there are important facts in K2 other than the definition of a hundred-sided die (e.g., whether the die is fair, whether the speaker is a known conman, etc.) that factor into my judgment of P1.
And a bunch of other things, as above. Which is why I mentioned K1 and K2 in the first place.
Wait, what?
First, nowhere in here have I made a probability estimate. I’ve made a prediction about what will happen on the next roll of this die. You are inferring that I made that prediction on the basis of a probability estimate, and you admittedly have good reasons to infer that.
Second… are you now saying that I can be confident in P1? Because when I asked you that in the first place you answered no. I suspect I’ve misunderstood you somewhere.
Yes. I have explained (in some amount of detail) what I mean by confidence, such that it is distinct from probability, as it relates to prediction.
And yes, as a human you are in fact constrained (in practice) to making predictions based on internal probability estimates (based on my understanding of neuroscience).
Confidence, like probability, is not binary.
You can have fairly high confidence in the implied probability of P1 given K1 and K2, and likewise little confidence in a probability estimate of P1 in the case of a die with an unknown number of sides; this should be straightforward.
Yes, and the mistake is on my part: wow, that first comment was a partial brain fart. I was agreeing with you, and meant to say “yes, but…”. I’ll edit in a comment to that effect.
Ah!
I feel much better now.
I should go through this discussion again and re-evaluate what I think you’re saying based on that clarification before I try to reply.