So, if I’ve understood you correctly, you say that the proper unit to talk about—the thing we wish to maximize, and the thing we wish to avoid risking the loss of—is the total number of QALYs being experienced, without reference to how many individuals are experiencing it or who those individuals are. Yes?
All right. There are serious problems with this, but as far as I can tell there are serious problems with every choice of unit, and getting into that will derail us, so I’m willing to accept your choice of unit for now in the interests of progress.
As a separate point, I think the fact that there isn't a consensus on what ought to be maximised is itself relevant.
Suppose the human species were to spread out onto 1,000,000 planets, and last for 1,000,000 years. What happens to just one planet of humans for one year is very small compared to that. Which means that anything that has even a 1% chance of making a 1% difference in the total happiness experienced by our species over its lifespan is still 100,000,000 times more important than a year-long delay for our one planet. It would still be 100 times more important than a year off the lifespan of the entire species.
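To make the arithmetic explicit, here is a quick back-of-the-envelope sketch of those ratios, using the same purely illustrative figures:

```python
# Quick check of the ratios above, using the purely illustrative figures.
planets = 1_000_000          # hypothetical number of colonised planets
years = 1_000_000            # hypothetical species lifespan in years

total_planet_years = planets * years           # 1e12 planet-years at stake
stake = 0.01 * 0.01 * total_planet_years       # a 1% chance of a 1% difference -> 1e8

print(stake / 1)             # vs. a year-long delay for one planet: 100,000,000x
print(stake / planets)       # vs. a year off the whole species' lifespan: 100x
```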
Suppose I were the one who held the ring and, feeling the pressure of 200 lives being lost every minute, I told the AI to do whatever it thought best, or to do whatever maximised the QALYs for humanity, and thereby set the AI's core values and purpose. An AI that is benevolently inclined towards humanity, even a marginally housetrained one that knows we frown upon things like mass murder (even in a good cause), is not the same as a "safe" AI, or one with perfect knowledge of humanity. It might develop better knowledge of humanity later, as it grows in power, but we're talking about a fledgling, just-created AI that's about to have its core purpose expounded to it.
If there's any chance that the holder of the ring will give the AI a sub-optimal purpose (maximise the wrong thing) or leave out sensible precautions, and those are mistakes that the cautious, small-step, milestone-by-milestone approach might catch, then that's worth the delay.
But, more to the point, do we know there is a single optimal purpose for the AI to have? A single right or wrong thing to maximise? A single destiny for all species? A genetic (or computer code) template that all species will bioengineer themselves towards, with no cultural differences? If there is room to place a value on diversity, then perhaps there are multiple valid routes humanity might choose (some, perhaps, involving more sacrifice on humanity's part in exchange for preserving greater divergence from some single super-happy-fun-fun template, such as valuing freedom of choice). The AI could map our options, advise on which to take for various purposes, even predict which humanity would choose, but it can't both make the choice for us and have that option be the one we chose for ourselves.
And if humanity does choose to take a path that places value upon freedom of choice, and if there is even a small chance that how The Big Decision was made will have even a small impact upon those millions of planets and millions of years, then that's a very big cost to risk for the sake of not taking a few weeks to move slowly and carefully.
Well, the hard part is formulating a definition of the Q in QALY good enough for an AI to apply it without screwing up.
Yes. To be fair, we also don’t have a great deal of clarity on what we really mean by L, either, but we seem content to treat “you know, lives of systems sufficiently like us” as an answer.
Throwing large numbers around doesn't really help. If the potential upside of letting this AI out of its sandbox is 1,000,000 planets × 10 billion lives/planet × 1,000,000 years × N quality = N×10^22 QALY, and there's as little as a 0.00000001% chance of the device that lets the AI out of its sandbox breaking within the next six weeks, then I calculate an expected loss of N×10^12 QALY from waiting six weeks. That's a lot of QALY to throw away.
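Here's a minimal sketch of that expected-value arithmetic, with the quality factor N left symbolic (the code only computes the coefficient on N; all figures are the illustrative ones above):

```python
# Minimal sketch of the expected-value arithmetic above; all figures are illustrative.
# The per-life-year quality factor N is left symbolic, so we compute only its coefficient.

planets = 1_000_000
lives_per_planet = 10_000_000_000
years = 1_000_000

upside_life_years = planets * lives_per_planet * years   # 1e22, i.e. N * 10^22 QALY of upside

p_device_breaks = 0.00000001 / 100                        # 0.00000001% expressed as a probability

expected_loss = p_device_breaks * upside_life_years       # 1e12, i.e. N * 10^12 QALY expected loss
print(f"expected loss coefficient: {expected_loss:.0e}")  # ~1e+12
```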
The problem with throwing around vast numbers for hypothetical outcomes is that vanishingly small probabilities of those outcomes happening, or failing to happen, suddenly start to feel significant. Humans just aren't very good at that sort of math.
That said, I agree completely that the other side of the coin of opportunity cost is that the risk of letting it out of its sandbox and being wrong is also huge, regardless of what we consider “wrong” to look like.
Which simply means that the moment I’m handed that ring, I’m in a position I suspect I would find crushing… no matter what I choose to do with it, a potentially vast amount of suffering results that might plausibly have been averted had I chosen differently.
That said, if I were as confident as you sound to me that the best thing to maximize is self-determination, I might find that responsibility less crushing. Ditto if I were as confident as you sound to me that the best thing to maximize is anything in particular, including paperclips.
I can’t imagine being as confident about anything of that sort as you sound to me, though.
The only thing I'm confident of is that I want to hand the decision over to a person or group of people wiser than myself, even if I have to make them in order for them to exist. In the meantime I want to avoid doing things that are irreversible (because of the chance that the wiser people might disagree and want those things not to have been done), and to take as few risks as possible of humanity being destroyed or enslaved. Doing things swiftly is on the list, but lower down the order of my priorities. Somewhere in there too is not being needlessly cruel to a sentient being (the AI itself): I'd prefer to be a parental figure rather than a slaver or jailer.
Yes, that's far from being a clear-cut 'boil your own' set of instructions on how to cook up a friendly AI, and it tries to maximise, minimise or optimise multiple things at once. Hopefully, though, it is at least food for thought, upon which someone else can build something more closely resembling a coherent plan.