Perhaps “Master, you now hold the ring, what do you wish me to turn the universe into?” isn’t a question you have to answer all at once.
Perhaps the right approach is to ask yourself “What is the smallest step I can take that has the lowest risk of not being a strict improvement over the current situation?”
For example, are we less human or compassionate now that we have Google available than we were before that point?
Suppose an AI researcher, a year before the Google search engine was made available on the internet, ended up with ‘the ring’. Suppose the researcher asked the AI to develop, for the researcher’s own private use, an internet search engine of the type that existing humans might create with 1,000 human-hours of work (with suitable restrictions upon the AI on how to do this, including “check with me before implementing any part of your plan that affects anything outside your own sandbox”), and then to put itself on hold to await further orders once the engine had been created. If the AI then did create something like Google without destroying the world, and then did put itself fully on hold (no self-modification, no activity outside the sandbox, nothing except waiting for a prompt), would that researcher then be in a better position to make their next request of the AI? Would that have been a strict improvement on the researcher’s previous position?
Imagine a series of milestones on the path to making a decision about what to do with the universe, and work backwards.
You want a human or group of humans who have their intelligence boosted to make the decision instead of you?
Ok, but you don’t want them to lose their compassion, empathy or humanity in the process. What are the options for boosting and what does the AI list as the pros and cons (likely effects) of each process?
What is the minimum significant boost with the highest safety factor? And which person or people would it make sense to boost that way? AI, what do you advise as my best 10 options on that, with pros and cons?
Still not sure? OK, I need a consensus of a few top non-boosted current AI researchers, wise people, smart people, etc. on the people to be boosted and the boosting process, before I ‘ok’ it. The members of the consensus group should be people who won’t go ape-shit, who can understand the problem, who’re good at discussing things in groups and reaching good decisions, who’ll be willing to cooperate (undeceived and uncoerced) if it is presented to them clearly, and probably a number of other criteria that perhaps you can suggest (such as: will they play politics and demand to be boosted themselves?). Who do you suggest? Options?
Basically, if the AI can be trusted to give honest good advice without hiding an agenda that’s different from your own expressed one, and if it can be trusted to not, in the process of giving that good advice, do anything external that you wouldn’t do (such as slaying a few humans as guinea pigs in the process of determining options for boosting), then that’s the approach I hope the researcher would take: delay the big decisions, in favour of taking cautious minimally risky small steps towards a better capacity to make the big decision correctly.
Mind you, those are two big “if”s.
You can get away with (in fact, strictly improve the algorithm by) using only the second of the two caution-optimisers there, so: “What is the step I can take that has the lowest risk of not being a strict improvement over the current situation?”
Naturally, when answering the question you will probably consider small steps—and in the unlikely event that a large step is safer, so much the better!
Assuming the person making the decision is perfect at estimating risk.
However, since the likelihood is that it won’t be me creating the first ever AI, but rather someone who may be reading this advice, I’d prefer to stipulate that they should go for small steps even if, in their opinion, there is some larger step that’s less risky.
The temptation exists for them to ask, as their first step, “AI of the ring, boost me to god-like wisdom and powers of thought”, but that has a number of drawbacks they may not think of. I’d rather my advice contain redundant precautions, as a safety feature.
“Of the steps of the smallest size that still advances things, which of those steps has the lowest risk?”
Another way to think about it is to take the steps (or give the AI orders) that can be effectively accomplished with the AI boosting itself by the smallest amount. Avoid, initially, making requests that the AI would need to massively boost itself to accomplish, if you can improve your decision-making position just through requests that the AI can handle at its current capacity.
Or merely aware of the same potential weakness that you are. I’d be overwhelmingly uncomfortable with someone developing a super-intelligence without an awareness of their human limitations at risk assessment. (Incidentally, ‘perfect’ risk assessment isn’t required. They make the most of whatever risk assessment ability they have either way.)
I consider this a rather inferior solution—particularly inasmuch as it pretends to be minimizing two things. Since steps will almost inevitably be differentiated by size, the assessment of lowest risk barely comes into play. An algorithm that almost never considers risk rather defeats the point.
If you must artificially circumvent the risk assessment algorithm—presumably to counter known biases—then perhaps make the “small steps” a question of satisficing rather than minimization.
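To make the contrast concrete, here is a toy Python sketch of the three step-selection rules under discussion: lexicographic minimisation (smallest step first, risk only as a tie-breaker), pure risk minimisation, and satisficing on step size before minimising risk. The candidate steps and their numbers are invented purely for illustration.

```python
# Toy comparison of the three step-selection rules under discussion.
# The candidate steps, their "sizes" and risk numbers are invented.
steps = [
    # (description, size, risk of not being a strict improvement)
    ("ask the AI for a written report only",     1, 0.010),
    ("have it build a sandboxed search engine",  3, 0.002),
    ("have it boost a human to god-like wisdom", 9, 0.200),
]

# (a) Minimise size, using risk only to break ties: risk almost never gets a say.
lexicographic = min(steps, key=lambda s: (s[1], s[2]))

# (b) Use only the second caution-optimiser: pick the lowest-risk step outright.
lowest_risk = min(steps, key=lambda s: s[2])

# (c) Satisficing: any step that is "small enough" qualifies, then minimise risk.
SMALL_ENOUGH = 5
satisficed = min((s for s in steps if s[1] <= SMALL_ENOUGH), key=lambda s: s[2])

print(lexicographic[0])  # the report: smallest step, regardless of risk
print(lowest_risk[0])    # the sandboxed search engine: lowest risk, regardless of size
print(satisficed[0])     # the sandboxed search engine: small enough AND lowest risk
```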
Good point.
How would you word that?
Perhaps.
Personally, I suspect that if I had (something I was sufficiently confident was) an AI that can be trusted to give honest good advice without a hidden agenda and without unexpected undesirable side-effects, the opportunity costs of moving that slowly would weigh heavily on my conscience. And if challenged for why I was allowing humanity to bear those costs while I moved slowly, I’m not sure what I would say… it’s not clear what the delays are gaining me in that case.
Conversely, if I had something I was insufficiently confident was a trustworthy AI, it’s not clear that the “cautious minimally risky small steps” you describe are actually cautious enough.
Freedom.
The difference between the AI making a decision for humanity about what humanity’s ideal future should be, and the AI speeding up humanity’s own rise in decision-making capability to the point where humanity can make that same decision (and even, perhaps, come to the same conclusion the AI would have reached weeks earlier, had it been told to do the work for us), is that the choice was made by us, not by an external force. That, to many people, is worth something (perhaps even worth the deaths that would occur in the weeks by which utopia was delayed).
It is also insurance against an AI that is benevolent but has an imperfect understanding of humanity. (The AI might be able to gain a better understanding of humanity by massively boosting its own capability, but perhaps you don’t want it to take over the internet and all attached computers—perhaps you’d prefer it to remain sitting in the experimental mainframe in some basement of IBM where it currently resides, at least initially.)
In the case under discussion (an AI that can be trusted to give honest good advice without a hidden agenda and without unexpected undesirable side-effects) I don’t see how the imperfect understanding of humanity matters. Conversely, an AI which would take over resources I don’t want it to take over doesn’t fall into that category.
That aside though… OK, sure, if the difference to me between choosing to implement protocol A, and having protocol A implemented without my approval, is worth N happy lifetime years (or whatever unit you want to use), then I should choose to retain control and let people die and/or live in relative misery for it.
I don’t think that difference is worth that cost to me, though, or worth anything approaching it.
Is it worth that cost to you?
Let’s try putting some numbers on it.
It is the difference between someone who goes house hunting and then, on finding a house that would suit them perfectly, voluntarily decides to move to it; and that same person being forcibly relocated to that same new house, against their will, by some well-meaning authority.
Using the “three week delay” figure from earlier, a world population of 7 billion, and an average lifespan of 70 years, that gives us approximately 6 million deaths during those three weeks. Obviously my own personal satisfaction and longing for freedom wouldn’t be worth that. But there isn’t just me to consider—it is also the satisfaction of whatever fraction of those 7 billion people share my attitude towards controlling our own destiny.
If 50% of them shared that attitude, and would be willing to give up 6 weeks of their life to have a share of ownership of The Big Decision (a decision far larger than just which house to live in) then it evens out.
Perhaps the first 10 minutes of the ‘do it slowly and carefully’ route (10 minutes = 2000 lives) should be to ask the AI to look up figures from existing human sources on what fraction of humanity has that attitude, and how strongly they hold it?
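A quick sanity check of those round figures, sketched in Python (the population, lifespan and delay numbers are simply the ones quoted above, not real demographic data):

```python
# Back-of-the-envelope check of the figures quoted above.
world_population = 7_000_000_000
average_lifespan_years = 70

deaths_per_year = world_population / average_lifespan_years   # ~100 million per year
deaths_per_minute = deaths_per_year / (365.25 * 24 * 60)      # ~190, i.e. roughly 200 per minute

print(round(deaths_per_year * 3 / 52))   # ~5.8 million deaths during a three-week delay
print(round(deaths_per_minute * 10))     # ~1,900 deaths in ten minutes, the "2000 lives" figure
```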
And perhaps we need to take the retroactive satisfaction of humanity’s future population into account? What if at least 1% of humanity in the centuries to come gains some pride and satisfaction from thinking that the world they live in is one that humanity chose? Or if at least 1% would feel resentment were that not the case?
OK, sure. Concreteness is good. I would say the first step to putting numbers on this is to actually agree on a unit that those numbers represent.
You seem to be asserting here that the proper unit is weeks of life (I infer that from “willing to give up 6 weeks of their life to have a share of ownership of The Big Decision”), but if so, I think your math is not quite right. For example, suppose implementing the Big Decision has a 50% chance of making the average human lifespan a thousand years, and I have a 1% chance of dying in the next six weeks, then by waiting six weeks I’m accepting a .01 chance of losing a .5 chance of 52000 life-weeks… that is, I’m risking an expected value of 260 life-weeks, not 6. Change those assumptions and the EV goes up and down accordingly.
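Written out explicitly, that expected-value calculation looks like this (a sketch using the same illustrative numbers from the paragraph above):

```python
# Expected life-weeks risked by a six-week wait, under the illustrative assumptions above.
p_die_during_wait = 0.01              # 1% chance of dying in the next six weeks
p_decision_pays_off = 0.5             # 50% chance the Big Decision yields 1000-year lifespans
lifespan_if_it_pays_off = 1000 * 52   # 52,000 life-weeks

expected_loss = p_die_during_wait * p_decision_pays_off * lifespan_if_it_pays_off
print(expected_loss)                  # 260.0 -- the 260 life-weeks at risk, not 6
```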
So perhaps it makes sense to immediately have the AI implement temporary immortality… that is, nobody dies between now and when we make the decision? But then again, perhaps not? I mean, suspending death is a pretty big act of interference… what about all the people who would have preferred to choose not to die, rather than having their guaranteed continued survival unilaterally forced on them?
There’s other concerns I have with your calculations here, but that’s a relatively simple one so I’ll pause here and see if we can agree on a way to handle this one before moving forward.
Interesting question.
Economists put a price on a life by looking at things like how much the person would expect to earn (net) during the remainder of their life, and how much money it takes for them to voluntarily accept a certain percentage chance of losing that amount of money. (Yes, that’s a vast simplification and inaccuracy). But, in terms of net happiness, it doesn’t matter that much which 7 billion bodies are experiencing happiness in any one time period. The natural cycle of life (replacing dead grannies with newborn babies) is more or less neutral, with the grieving caused by the granny dying being balanced by the joy the newborn brings to those same relatives. It matters to the particular individuals involved, but it isn’t a net massive loss to the species, yes?
Now no doubt there are things a very very powerful AI (not necessarily the sort we initially have) could do to increase the number of QALYs being experienced per year by the human species. But I’d argue that it is the size of the population and how happy each member of the population is that affects the QALYs, not whether the particular individuals are being replaced frequently or infrequently (except as far as that affects how happy the members are, which depends upon their attitude towards death).
But, either way, unless the AI does something to change overnight how humans feel about death, increasing their life expectancy won’t immediately change how much most humans fear a 0.01% chance of dying (even if, rationally, perhaps it ought to).
(shrug) OK. This is why it helps to get clear on what unit we’re talking about.
So, if I’ve understood you correctly, you say that the proper unit to talk about—the thing we wish to maximize, and the thing we wish to avoid risking the loss of—is the total number of QALYs being experienced, without reference to how many individuals are experiencing it or who those individuals are. Yes?
All right. There are serious problems with this, but as far as I can tell there are serious problems with every choice of unit, and getting into that will derail us, so I’m willing to accept your choice of unit for now in the interests of progress.
So, the same basic question arises: doesn’t it follow that if the AI is capable of potentially creating N QALYs over the course of six weeks then the relevant opportunity cost of delay is N QALYs? In which case it seems to follow that before we can really decide if waiting six weeks is worth it, we need to know what the EV of N is. Right?
As a separate point, I think the fact that there isn’t a consensus on what ought to be maximised is relevant.
Suppose the human species were to spread out onto 1,000,000 planets, and last for 1,000,000 years. What happens to just one planet of humans for one year is very small compared to that. Which means that anything that has even a 1% chance of making a 1% difference to the species-lifespan happiness experienced by our species is still 100,000,000 times more important than a year-long delay for our one planet. It would still be 100 times more important than a year off the lifespan of the entire species.
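The scale comparison in that paragraph, as a short sketch (units are planet-years of human experience; the planet and year counts are the hypothetical ones above):

```python
# How a 1% chance of a 1% difference to the whole future compares with short delays.
planets = 1_000_000
years = 1_000_000
total_planet_years = planets * years                      # 1e12 planet-years in total

expected_difference = 0.01 * 0.01 * total_planet_years    # 1e8 planet-years in expectation

print(expected_difference / 1)         # 100,000,000x a one-year delay for a single planet
print(expected_difference / planets)   # 100x a whole year off the entire species' lifespan
```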
Suppose I were the one who held the ring and, feeling the pressure of 200 lives being lost every minute, I told the AI to do whatever it thought best, or to do whatever maximised the QALYs for humanity, and thereby set the AI’s core values and purpose. An AI being benevolently inclined towards humanity, even a marginally housetrained one that knows we frown upon things like mass murder (despite that being in a good cause), is not the same as a “safe” AI or one with perfect knowledge of humanity. It might develop better knowledge of humanity later, as it grows in power, but we’re talking about a fledgling, just-created AI that’s about to have its core purpose expounded to it.
If there’s any chance that the holder of the ring is going to give the AI a sub-optimal purpose (maximise the wrong thing) or leave off sensible precautions, mistakes that the ‘small step, cautious milestone’ approach might catch, then that’s worth the delay.
But, more to the point, do we know there is a single optimal purpose for the AI to have? A single right or wrong thing to maximise? A single destiny for all species? A genetic (or computer code) template that all species will bioengineer themselves towards, with no cultural differences? If there is room to place a value on diversity, then perhaps there are multiple valid routes humanity might choose (some, perhaps, involving more sacrifice on humanity’s part in exchange for preserving greater divergence from some single super-happy-fun-fun template, such as valuing freedom of choice). The AI could map our options, advise on which to take for various purposes, even predict which humanity would choose, but it can’t both make the choice for us and have that option be the option that we chose for ourselves.
And if humanity does choose to take a path that places value upon freedom of choice, and if there is a small chance that how The Big Decision was made might have even a small impact upon those millions of planets and millions of years, then that’s a very big consequence riding on whether we take a few weeks to move slowly and carefully.
Well, it’s formulating a definition for the Q in QALY good enough for an AI to understand without screwing it up that’s the hard part.
Yes. To be fair, we also don’t have a great deal of clarity on what we really mean by L, either, but we seem content to treat “you know, lives of systems sufficiently like us” as an answer.
Throwing large numbers around doesn’t really help. If the potential upside of letting this AI out of its sandbox is 1,000,000 planets × 10 billion lives/planet × 1,000,000 years × N quality = Ne22 QALY, then if there’s as little as a .00000001% chance of the device that lets the AI out of its sandbox breaking within the next six weeks, then I calculate an EV of -Ne12 QALY from waiting six weeks. That’s a lot of QALY to throw away.
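Spelled out, that calculation runs roughly as follows (same hypothetical numbers; N is left as an unspecified quality factor, set to 1 here):

```python
# EV of waiting six weeks if a tiny chance exists of losing the upside entirely.
N = 1                                                     # unspecified quality factor
upside_qaly = N * 1_000_000 * 10_000_000_000 * 1_000_000  # N * 1e22 QALY

p_upside_lost_during_wait = 0.00000001 / 100              # ".00000001%" = 1e-10

print(f"{p_upside_lost_during_wait * upside_qaly:.0e}")   # 1e+12, i.e. about N*1e12 QALY forgone
```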
The problem with throwing around vast numbers in hypothetical outcomes is that suddenly vanishingly small percentages of those outcomes happening or failing to happen start to feel significant. Humans just aren’t very good at that sort of math.
That said, I agree completely that the other side of the coin of opportunity cost is that the risk of letting it out of its sandbox and being wrong is also huge, regardless of what we consider “wrong” to look like.
Which simply means that the moment I’m handed that ring, I’m in a position I suspect I would find crushing… no matter what I choose to do with it, a potentially vast amount of suffering results that might plausibly have been averted had I chosen differently.
That said, if I were as confident as you sound to me that the best thing to maximize is self-determination, I might find that responsibility less crushing. Ditto if I were as confident as you sound to me that the best thing to maximize is anything in particular, including paperclips.
I can’t imagine being as confident about anything of that sort as you sound to me, though.
The only thing I’m confident of is that I want to hand the decision over to a person or group of people wiser than myself, even if I have to make them in order for them to exist, and that in the meantime I want to avoid doing things that are irreversible (because of the chance that the wiser people might disagree and want those things not to have been done) and to take as few risks as possible of humanity being destroyed or enslaved in the meantime. Doing things swiftly is on the list, but lower down my order of priorities. Somewhere in there too is not being needlessly cruel to a sentient being (the AI itself) - I’d prefer to be a parental figure rather than a slaver or jailer.
Yes, that’s far from being a clear-cut ‘boil your own’ set of instructions on how to cook up a friendly AI, and it is trying to maximise, minimise or optimise multiple things at once. Hopefully, though, it is at least food for thought, upon which someone else can build something more closely resembling a coherent plan.
Over three weeks, but yes: right.
If the AI makes dramatic changes to society on a very short time scale (such as uploading everyone’s brains to a virtual reality, then making 1000 copies of everyone) then N would be very very large.
If the AI makes minimal immediate changes in the short term (such as, for example, eliminating all nuclear bombs and putting in place measures to prevent hostile AIs from being developed—i.e. acting as insurance against threats to the existence of the human species) then N might be zero.
What the expected value of N is depends on what you think the comparative likelihood of those two sorts of scenarios is. But you can’t assume, in the absence of knowledge, that the chances are 50:50.
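As a toy illustration of that dependence: the expected value of N is just a probability-weighted mixture of the scenarios, and it swings entirely on the weights you assume (the probabilities and N values below are invented for illustration only).

```python
# EV of N as a mixture over the two kinds of scenario sketched above.
# (probability, N in QALYs over three weeks) -- purely illustrative numbers.
scenarios = {
    "dramatic short-term changes (uploads, mass copies)": (0.05, 10**13),
    "minimal immediate changes (existential insurance only)": (0.95, 0),
}

expected_N = sum(p * n for p, n in scenarios.values())
print(f"{expected_N:.1e}")   # 5.0e+11 -- dominated by whatever weight the dramatic scenario gets
```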
And, like I said, you could use the first 10 minutes to find out what the AI predicts N would be. If you ask the AI “If I gave you the go ahead to do what you thought humanity would ask you to do, were it wiser but still human, give me the best answer you can fit into 10 minutes without taking any actions external to your sandbox to the questions: what would your plan of action over the next three weeks be, and what improvement in number of QALYs experienced by humans would you expect to see happen in that time?” and the AI answers “My plans are X, Y and Z and I’d expect N to be of an order of magnitude between 10 and 100 QALYs.” then you are free to take the nice slow route with a clear conscience.
Sure, agreed that if I have high confidence that letting the AI out of its sandbox doesn’t have too much of an upside in the short term (for example, if I ask it and that’s what it tells me and I trust its answer), then the opportunity costs of leaving it in its sandbox are easy to ignore.
Also agreed that N can potentially be very very large.
In which case the opportunity costs of leaving it in its sandbox are hard to ignore.
How about QALYs?