Note that you are talking about a very low probability of a potentially very high impact scenario. This is going to push the envelope on your system for determining attitudes towards uncertain events.
One would assume that human-to-human torture is low-enough risk with low-enough impact that it doesn’t trigger money usage concerns. Then you could estimate how much more unlikely a malicious AI would be and how much worse it could be in comparison. It could be argued that the badness grows faster than the likelihood shrinks, which makes it more relevant. However, you could try to quantify that with some Fermi estimation. If one intelligence could have a total misery impact of a Nazi concentration camp (a highish torture unit), what would the per-person chance of such people existing need to be before you would cough up cash? You can then try to raise the unit of terribleness. If every human could choose to trigger global thermonuclear war (which I think only the top brass of a couple of countries can do easily at the moment), what rate of moral fallibility would be unacceptable? There is also the idea that if you are asking what the probability is in order to match a spread to it, you can instead fix the spread first and then ask what the probability would need to be for that spread to still be warranted. Such as: “Will the probability be high enough that giving MIRI $1 is warranted?” This is a yes-or-no question, and the answer is likely to give a lower bound on the needed probability*impact, or whatever the relevant quantity about the future is.
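To make that inversion concrete, here is a minimal sketch of fixing the spread and solving for the break-even probability under a simple linear expected-value rule. Every number and name in it is a made-up placeholder of mine, not an estimate anyone should endorse:

```python
# A minimal sketch of the "fix the spread, solve for the probability" move.
# All figures here are hypothetical placeholders, not endorsed estimates.

def break_even_probability(donation, marginal_risk_reduction, harm):
    """Smallest scenario probability at which donating is worth it, assuming
    a simple linear rule: warranted iff p * marginal_risk_reduction * harm >= donation."""
    return donation / (marginal_risk_reduction * harm)

# Example: a $1 donation that you guess shaves one billionth off the risk,
# weighed against a harm you price at $10^15 (a stand-in for "hellish outcome").
p_min = break_even_probability(donation=1.0,
                               marginal_risk_reduction=1e-9,
                               harm=1e15)
print(f"Donation is warranted if you assign probability >= {p_min:.2e}")
# -> 1.00e-06, i.e. the yes/no question becomes "is the scenario more likely
#    than about one in a million?", a lower bound rather than a point estimate.
```

The point is only that fixing the spread turns the question into a threshold on probability, which may be easier to answer yes or no than producing a point estimate.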
Do note that because you are applying a very rough description of your ethics in a very extreme situation, it would be good to be confident that the description really applies. For example, because you prefer non-suffering over life, you ought to also prefer death over suffering. The implication of this is that killing off people who are miserable improves a situation, i.e. that euthanasia should be exercised wherever applicable. It also means that maximising whatever is the opposite of suffering should be more important than life extension. This leads to some form of “short rock star life” over “long grey life”. It differs a little depending on whether that opposite is virtuousness, pleasure or goal attainment. If you are asymmetric about torture and anti-torture, then you need to decide your attitude between a future with roughly equal expected amounts of torture and anti-torture and a dull nothing-happens future (no torture or anti-torture), i.e. does a chance of heaven mitigate a chance of hell? Then there is also the issue of whether you care about the depth of the torture, i.e. how many circles the hell has. If you could halve the chance of hell but make it twice as bad when it happens, is this a neutral move or an improvement? Given that you anti-value torture even more than death, it is also worth asking whether there is a maximum amount of anti-value that can be created. You can’t be deader than dead, but it might be possible to make really heinous forms of torture and specifically boost the consciousness level of the tortured to make them especially aware of the horribleness of it all (i.e. is there a limit to anti-sedation?). There might also be the possibility of anti-wireheading (where the torture doesn’t involve doing anything bad to them, but redefines pain signals so that everything the tortured experiences is interpreted as pain; i.e. instead of doing something to their bodies that they hate, make them hate their bodies, or the fact that they are human).
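Whether the halve-the-chance, double-the-badness swap is neutral depends entirely on how you weight depth. A small sketch with hypothetical utilities (the exponent parameter is my own framing of “caring extra about deeper hells”):

```python
# Hypothetical numbers only: under a linear valuation the swap is exactly
# neutral; under a convex disutility in severity it is a loss.

def expected_disutility(p, severity, exponent=1.0):
    """Expected badness of an outcome with probability p and given severity;
    exponent > 1 means deeper hells count super-linearly."""
    return p * severity ** exponent

before = expected_disutility(p=0.10, severity=1.0)
after  = expected_disutility(p=0.05, severity=2.0)
print(before, after)            # 0.1 vs 0.1 -> neutral under linear weighting

before_cvx = expected_disutility(p=0.10, severity=1.0, exponent=1.5)
after_cvx  = expected_disutility(p=0.05, severity=2.0, exponent=1.5)
print(before_cvx, after_cvx)    # 0.1 vs ~0.141 -> worse if depth matters extra
```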
I do have to note that because the situation is about extreme conditions, there are a couple of differences from a typical assessment situation. In a normal assessment we can imagine pretty much all of the possibilities, but we lack information on which of them might be the case. In this scenario, however, we are not imaginatively complete and might not reach that state by the end of the analysis. That is, there is a significant portion of relevant mechanisms that we fail to imagine. This also creates a need to “imagine in a fair sampling way”. Like machines that search only finite proof lengths, where the order and expression length of propositions might alter the outcome, this kind of situation might be sensitive to which parts of it you understand. If you wanted to reach a particular yes-or-no outcome (motivated cognition), you could explore only the kinds of ways that outcome might happen. To avoid that, you would need to know that the ratio of “imaginative attention” reflects the ratio of probabilities (see the sketch below). However, an AI scenario involves the utilisation of yet-undiscovered techs, and the probability-affecting properties of those techs can’t be summed neatly without knowing their details. In the same way that it’s hard to evaluate basic research versus applied research in economic terms, it’s hard to evaluate how well ethics versus political stability guarantees that the world will stay a place that everybody likes. Usually when one attempts research one doesn’t produce anti-research (i.e. making experiments doesn’t burn libraries), but messing with ethics has a real chance of producing monsters.
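One way to see the “imaginative attention” problem is as a sampling-bias problem: if the scenarios that come to mind are drawn by vividness rather than by probability, a naive average over them is biased, and correcting it requires exactly the frequency information you don’t have. A toy model with invented numbers:

```python
import random

# Toy model (all numbers invented) of attention not matching true frequencies.
TRUE_FREQ  = {"mundane": 0.9, "exotic": 0.1}   # how common each class really is
P_DISASTER = {"mundane": 0.01, "exotic": 0.30} # chance the class leads to disaster
ATTENTION  = {"mundane": 0.5, "exotic": 0.5}   # how often we happen to imagine each

true_p = sum(TRUE_FREQ[c] * P_DISASTER[c] for c in TRUE_FREQ)

random.seed(0)
imagined = random.choices(list(ATTENTION), weights=ATTENTION.values(), k=100_000)

# Naive estimate: average over whatever scenarios came to mind.
naive = sum(P_DISASTER[c] for c in imagined) / len(imagined)

# Reweighted estimate: correct each imagined scenario by true_freq / attention,
# which is only possible if you already know the true class frequencies.
reweighted = sum(P_DISASTER[c] * TRUE_FREQ[c] / ATTENTION[c]
                 for c in imagined) / len(imagined)

print(f"true {true_p:.3f}, naive {naive:.3f}, reweighted {reweighted:.3f}")
# -> roughly: true 0.039, naive 0.155, reweighted 0.039
```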
So you might not actually be looking for information to make an informed decision, but for understanding to make a conscious decision. In this area simply not a lot of information is available.