In the case under discussion (an AI that can be trusted to give honest good advice without a hidden agenda and without unexpected undesirable side-effects) I don’t see how the imperfect understanding of humanity matters. Conversely, an AI which would take over resources I don’t want it to take over doesn’t fall into that category.
That aside though… OK, sure, if the difference to me between choosing to implement protocol A, and having protocol A implemented without my approval, is worth N happy lifetime years (or whatever unit you want to use), then I should choose to retain control and let people die and/or live in relative misery for it.
I don’t think that difference is worth that cost to me, though, or worth anything approaching it.
Is it worth that cost to you?
Let’s try putting some numbers on it.
It is the difference between someone who goes house hunting and, on finding a house that would suit them perfectly, voluntarily decides to move to it; and that same person being forcibly relocated to that same house, against their will, by some well-meaning authority.
Using the “three week delay” figure from earlier, a world population of 7 billion, and an average lifespan of 70 years, that gives us approximately 6 million deaths during those three weeks. Obviously my own personal satisfaction and longing for freedom wouldn’t be worth that. But there isn’t just me to consider—it is also the satisfaction of whatever fraction of those 7 billion people share my attitude towards controlling our own destiny.
If 50% of them shared that attitude, and each would be willing to give up 6 weeks of their life to have a share of ownership of The Big Decision (a decision far larger than just which house to live in), then it evens out.
Perhaps the first 10 minutes of the ‘do it slowly and carefully’ route (10 minutes = 2000 lives) should be to ask the AI to look up figures from existing human sources on what fraction of humanity has that attitude, and how strongly they hold it?
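As a quick sanity check of those figures, here is a rough sketch; it assumes a steady death rate of population divided by lifespan, and (for the break-even) that each death during the delay forfeits roughly a full 70-year lifespan. All inputs are the round numbers from this discussion, not real demographic data.

```python
# Rough sanity check of the delay figures above (round numbers only).
population = 7e9            # people
lifespan_years = 70         # assumed average lifespan
weeks_per_year = 52

deaths_per_year = population / lifespan_years              # ~100 million
deaths_in_3_weeks = deaths_per_year * 3 / weeks_per_year   # ~5.8 million
deaths_per_10_min = deaths_per_year / (365 * 24 * 6)       # ~1,900 ("2000 lives")

# Break-even claim: 50% of the population each giving up 6 life-weeks,
# versus the life lost by those who die during the 3-week delay
# (assuming each death forfeits roughly a full lifespan).
weeks_given_up = 0.5 * population * 6                                       # ~2.1e10
weeks_lost_to_delay = deaths_in_3_weeks * lifespan_years * weeks_per_year   # ~2.1e10

print(f"deaths in 3 weeks: {deaths_in_3_weeks:.2e}")
print(f"deaths per 10 minutes: {deaths_per_10_min:.0f}")
print(f"break-even: {weeks_given_up:.2e} vs {weeks_lost_to_delay:.2e} life-weeks")
```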
And perhaps we need to take retroactive satisfaction from the future population of humanity into account? What if at least 1% of the humanity from centuries to come gains some pride and satisfaction from thinking the world they live in is one that humanity chose? Or at least 1% would feel resentment if it were not the case?
OK, sure. Concreteness is good. I would say the first step to putting numbers on this is to actually agree on a unit that those numbers represent.
You seem to be asserting here that the proper unit is weeks of life (I infer that from “willing to give up 6 weeks of their life to have a share of ownership of The Big Decision”), but if so, I think your math is not quite right. For example, suppose implementing the Big Decision has a 50% chance of making the average human lifespan a thousand years, and I have a 1% chance of dying in the next six weeks, then by waiting six weeks I’m accepting a .01 chance of losing a .5 chance of 52000 life-weeks… that is, I’m risking an expected value of 260 life-weeks, not 6. Change those assumptions and the EV goes up and down accordingly.
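Spelling out that expected-value arithmetic (the 50%, 1% and thousand-year figures are purely illustrative, as above):

```python
# Illustrative EV of a six-week wait under the assumptions above.
p_i_die_during_wait = 0.01      # 1% chance of dying in the next six weeks
p_decision_pays_off = 0.5       # 50% chance the Big Decision yields 1000-year lifespans
lifespan_in_weeks = 1000 * 52   # 52,000 life-weeks

expected_loss = p_i_die_during_wait * p_decision_pays_off * lifespan_in_weeks
print(expected_loss)  # 260.0 life-weeks at risk, not 6
```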
So perhaps it makes sense to immediately have the AI implement temporary immortality… that is, nobody dies between now and when we make the decision? But then again, perhaps not? I mean, suspending death is a pretty big act of interference… what about all the people who would have preferred to choose not to die, rather than having their guaranteed continued survival unilaterally forced on them?
There are other concerns I have with your calculations here, but that's a relatively simple one, so I'll pause here and see if we can agree on a way to handle it before moving forward.
Interesting question.
Economists put a price on a life by looking at things like how much the person would expect to earn (net) during the remainder of their life, and how much money it takes for them to voluntarily accept a certain percentage chance of losing that amount. (Yes, that's a vast oversimplification.) But, in terms of net happiness, it doesn't matter that much which 7 billion bodies are experiencing happiness in any one time period. The natural cycle of life (replacing dead grannies with newborn babies) is more or less neutral, with the grieving caused by the granny dying being balanced by the joy the newborn brings to those same relatives. It matters to the particular individuals involved, but it isn't a massive net loss to the species, yes?
Now no doubt there are things a very very powerful AI (not necessarily the sort we initially have) could do to increase the number of QALYs being experienced per year by the human species. But I’d argue that it is the size of the population and how happy each member of the population is that affects the QALYs, not whether the particular individuals are being replaced frequently or infrequently (except as far as that affects how happy the members are, which depends upon their attitude towards death).
But, either way, unless the AI does something to change overnight how humans feel about death, increasing their life expectancy won’t immediately change how much most humans fear a 0.01% chance of dying (even if, rationally, perhaps it ought to).
(shrug) OK. This is why it helps to get clear on what unit we’re talking about.
So, if I’ve understood you correctly, you say that the proper unit to talk about—the thing we wish to maximize, and the thing we wish to avoid risking the loss of—is the total number of QALYs being experienced, without reference to how many individuals are experiencing it or who those individuals are. Yes?
All right. There are serious problems with this, but as far as I can tell there are serious problems with every choice of unit, and getting into that will derail us, so I’m willing to accept your choice of unit for now in the interests of progress.
So, the same basic question arises: doesn’t it follow that if the AI is capable of potentially creating N QALYs over the course of six weeks then the relevant opportunity cost of delay is N QALYs? In which case it seems to follow that before we can really decide if waiting six weeks is worth it, we need to know what the EV of N is. Right?
As a separate point, I think the fact that there isn't a consensus on what ought to be maximised is relevant.
Suppose the human species were to spread out onto 1,000,000 planets, and last for 1,000,000 years. What happens to just one planet of humans for one year is very small compared to that. Which means that anything that has even a 1% chance of making a 1% difference in the species-lifespan happiness experienced by our species is still 100,000,000 times more important than a year-long delay for our one planet. It would still be 100 times more important than a year off the lifespan of the entire species.
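Making that scale comparison explicit (same hypothetical numbers; a sketch, not a forecast):

```python
# Hypothetical scale comparison: species-wide stakes versus a local delay.
planets = 1e6
species_years = 1e6
total_planet_years = planets * species_years        # 1e12 planet-years

# A 1% chance of a 1% difference to the whole species' future:
stake = 0.01 * 0.01 * total_planet_years            # 1e8 planet-years (expected)

one_planet_for_one_year = 1                         # cost of a year-long delay here
one_year_off_whole_species = planets                # 1e6 planet-years

print(stake / one_planet_for_one_year)     # 1e8  -> 100,000,000 times larger
print(stake / one_year_off_whole_species)  # 1e2  -> 100 times larger
```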
Suppose I were the one who held the ring and, feeling the pressure of 200 lives being lost every minute, I told the AI to do whatever it thought best, or to do whatever maximised the QALYs for humanity, and thereby set the AI's core values and purpose. An AI that is benevolently inclined towards humanity, even a marginally housetrained one that knows we frown upon things like mass murder (even in a good cause), is not the same as a “safe” AI or one with perfect knowledge of humanity. It might develop better knowledge of humanity later, as it grows in power, but we're talking about a fledgling, just-created AI that's about to have its core purpose expounded to it.
If there's any chance that the holder of the ring will give the AI a sub-optimal purpose (maximise the wrong thing) or leave out sensible precautions, and those are mistakes that the ‘small step, cautious milestone’ approach might catch, then that's worth the delay.
But, more to the point, do we know there is a single optimal purpose for the AI to have? A single right or wrong thing to maximise? A single destiny for all species? A genetic (or computer code) template that all species will bioengineer themselves to, with no cultural differences? If there is room for diversity to have value, then perhaps there are multiple valid routes humanity might choose (some, perhaps, involving more sacrifice on humanity's part in exchange for preserving greater divergence from some single super-happy-fun-fun template, such as valuing freedom of choice). The AI could map our options, advise on which to take for various purposes, even predict which humanity would choose, but it can't both make the choice for us and have that option be the one we chose for ourselves.
And if humanity does choose to take a path that places value upon freedom of choice, and if there is a small chance that how The Big Decision was made might have even a small impact upon those millions of planets and millions of years, then that's a very big consequence of not taking a few weeks to move slowly and carefully.
Well, the hard part is formulating a definition for the Q in QALY good enough for an AI to understand without screwing it up.
Yes. To be fair, we also don’t have a great deal of clarity on what we really mean by L, either, but we seem content to treat “you know, lives of systems sufficiently like us” as an answer.
Throwing large numbers around doesn't really help. If the potential upside of letting this AI out of its sandbox is 1,000,000 planets × 10 billion lives/planet × 1,000,000 years × N quality = Ne22 QALY, then if there's as little as a .00000001% chance of the device that lets the AI out of its sandbox breaking within the next six weeks, then I calculate an EV of -Ne12 QALY from waiting six weeks. That's a lot of QALY to throw away.
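The same arithmetic as a sketch; N is left as a symbolic quality factor, and the probability is the “.00000001%” above written as a fraction:

```python
# EV lost to a six-week wait if the release mechanism can break,
# expressed as a multiple of N (the average quality weight per life-year).
planets = 1e6
lives_per_planet = 10e9
species_years = 1e6
upside_coefficient = planets * lives_per_planet * species_years  # 1e22 (i.e. Ne22 QALY)

p_mechanism_breaks = 1e-10   # ".00000001%" expressed as a fraction

ev_lost_coefficient = p_mechanism_breaks * upside_coefficient    # 1e12 (i.e. Ne12 QALY)
print(f"upside ~ N x {upside_coefficient:.0e} QALY, EV of waiting ~ -N x {ev_lost_coefficient:.0e} QALY")
```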
The problem with throwing around vast numbers in hypothetical outcomes is that suddenly vanishingly small percentages of those outcomes happening or failing to happen start to feel significant. Humans just aren’t very good at that sort of math.
That said, I agree completely that the other side of the coin of opportunity cost is that the risk of letting it out of its sandbox and being wrong is also huge, regardless of what we consider “wrong” to look like.
Which simply means that the moment I’m handed that ring, I’m in a position I suspect I would find crushing… no matter what I choose to do with it, a potentially vast amount of suffering results that might plausibly have been averted had I chosen differently.
That said, if I were as confident as you sound to me that the best thing to maximize is self-determination, I might find that responsibility less crushing. Ditto if I were as confident as you sound to me that the best thing to maximize is anything in particular, including paperclips.
I can’t imagine being as confident about anything of that sort as you sound to me, though.
The only thing I'm confident of is that I want to hand the decision over to a person or group of people wiser than myself, even if I have to make them in order for them to exist, and that in the meantime I want to avoid doing things that are irreversible (because of the chance that the wiser people might disagree and want those things not to have been done) and to take as few risks as possible of humanity being destroyed or enslaved in the meantime. Doing things swiftly is on the list, but lower down the order of my priorities. Somewhere in there too is not being needlessly cruel to a sentient being (the AI itself): I'd prefer to be a parental figure rather than a slaver or jailer.
Yes, that's far from being a clear-cut ‘boil your own’ set of instructions on how to cook up a friendly AI, and it is trying to maximise, minimise or optimise multiple things at once. Hopefully, though, it is at least food for thought, upon which someone else can build something more closely resembling a coherent plan.
Over three weeks, but yes: right, we need to know the expected value of N before we can decide whether waiting is worth it.
If the AI makes dramatic changes to society on a very short time scale (such as uploading everyone’s brains to a virtual reality, then making 1000 copies of everyone) then N would be very very large.
If the AI makes minimal immediate changes in the short term (such as, for example, eliminating all nuclear bombs and putting in place measures to prevent hostile AIs from being developed, i.e. acting as insurance against threats to the existence of the human species) then N might be zero.
What the expected value of N is depends on what you think the comparative likelihood of those two sorts of scenario is. But you can't assume, in the absence of knowledge, that the chances are 50:50.
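A minimal sketch of that point; the scenario weights and payoffs below are invented purely to show how much the answer swings with them:

```python
# Expected value of N (QALYs the AI would add during the three-week delay)
# under different beliefs about which sort of plan it would pursue.
def expected_n(p_dramatic, n_dramatic, n_minimal):
    return p_dramatic * n_dramatic + (1 - p_dramatic) * n_minimal

# Invented payoffs: a 'dramatic changes' plan worth 1e12 QALYs in three weeks,
# a 'minimal immediate changes' plan worth roughly nothing.
print(expected_n(p_dramatic=0.5, n_dramatic=1e12, n_minimal=0))   # 5e11 if you assume 50:50
print(expected_n(p_dramatic=0.01, n_dramatic=1e12, n_minimal=0))  # 1e10 if dramatic plans are unlikely
```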
And, like I said, you could use the first 10 minutes to find out what the AI predicts N would be. If you ask the AI, “If I gave you the go-ahead to do what you thought humanity would ask you to do, were it wiser but still human, what would your plan of action over the next three weeks be, and what improvement in the number of QALYs experienced by humans would you expect to see in that time? Give me the best answer you can fit into 10 minutes, without taking any actions external to your sandbox,” and the AI answers, “My plans are X, Y and Z, and I'd expect N to be of an order of magnitude between 10 and 100 QALYs,” then you are free to take the nice slow route with a clear conscience.
Sure, agreed that if I have high confidence that letting the AI out of its sandbox doesn’t have too much of an upside in the short term (for example, if I ask it and that’s what it tells me and I trust its answer), then the opportunity costs of leaving it in its sandbox are easy to ignore.
Also agreed that N can potentially be very very large.
In which case the opportunity costs of leaving it in its sandbox are hard to ignore.
How about QALYs?