E.g. if they can take steps towards safe cognitive enhancement
I didn’t think that the scenario assumed the bunch of humans in a box had access to enough of an industrial/technological base to do cognitive enhancement. It seems like we’re in danger of getting bogged down in details about the “people in box” scenario, which I don’t think was meant to be a realistic scenario. Maybe we should just go back to talking about your actual AI control proposals?
So I don’t yet see a natural scenario where A and B are forced to bargain but the “default” is for B to be able to secure a proportional fraction of the future.
Here’s one: Suppose A, B, C each share 1⁄3 of the universe. If A and B join forces they can destroy C and take C’s resources, otherwise it’s a stalemate. (To make the problem easier assume C can’t join with anyone else.) Another one is A and B each have secure rights, but they need to join together to maximize negentropy.
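To spell out the arithmetic of that first example, here is a minimal sketch (Python, purely illustrative; the enumeration of candidate splits is just one way to picture the situation, not part of the example as stated). The stalemate already secures each of A and B its 1⁄3, so the bargaining is only over how to divide C’s share on top of that baseline.

```python
# A toy rendering of the first A/B/C example above (illustrative numbers and
# enumeration only).
from fractions import Fraction

A = B = C = Fraction(1, 3)   # initial shares of the universe
stalemate = (A, B)           # the "default": if A and B don't cooperate, each keeps its 1/3
spoils = C                   # what A and B can take from C only by joining forces

# Any agreed division of the spoils leaves both A and B above the stalemate point,
# so they are forced to bargain over the surplus even though each side's
# proportional fraction is already secure.
for k in range(0, 5):
    a_gain = Fraction(k, 12)          # A's share of the spoils: 0, 1/12, ..., 1/3
    b_gain = spoils - a_gain
    print(f"A: {A + a_gain}, B: {B + b_gain}  (vs. stalemate {stalemate[0]}, {stalemate[1]})")
```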
I’m curious what you see as the disanalogy between these cases
I’m not sure what analogy you’re proposing between the two cases. Can you explain more?
I don’t see how it can revise our view of the value of AI control by more than say a factor of 2.
I didn’t understand this claim when I first read it on your blog. Can you be more formal/explicit about what two numbers you’re comparing, that yields less than a factor of 2?
Maybe we should just go back to talking about your actual AI control proposals
I’m happy to drop it; we seem to go around in circles on this point as well. I thought this example might be easier to agree about, but I no longer think that.
I’m not sure what analogy you’re proposing between the two cases. Can you explain more?
Certain destructive technologies will lead to bad outcomes unless we have strong coordination mechanisms (to prevent anyone from using such technologies). Certain philosophical errors might lead to bad outcomes unless we have strong coordination mechanisms (to prevent anyone from implementing philosophically unsophisticated solutions). The mechanisms that could cope with destructive technologies could also cope with philosophical problems.
Can you be more formal/explicit about what two numbers you’re comparing, that yields less than a factor of 2?
You argue: there are likely to exist philosophical problems which must be solved before reaching a certain level of technological sophistication, or else there will be serious negative consequences.
I reply: your argument has at most a modest effect on the value of AI control work of the kind I advocate.
Your claim does suggest that my AI control work is less valuable. If there are hard philosophical problems (or destructive physical technologies), then we may be doomed unless we coordinate well, whether or not we solve AI control.
Here is a very crude quantitative model, to make it clear what I am talking about.
Let P1 be the probability of coordinating before the development of AI that would be catastrophic without AI control, and let P2 be the probability of coordinating before the next destructive technology / killer philosophical hurdle after that.
If there are no destructive technologies or philosophical hurdles, then the value of solving AI control is (1 - P1). If there are destructive technologies or philosophical hurdles, then the value of solving AI control is (P2 - P1). I am arguing that (P2 - P1) >= 0.5 * (1 - P1).
This comes down to the claim that P(get house in order after AI but before catastrophe | not gotten house in order prior to AI) is at least 1⁄2.
If we both believe this claim, then it seems like the disagreement between us about philosophy could at best account for a factor of 2 difference in our estimates of how valuable AI control research is, where value is measured in terms of “fraction of the universe.” (If we instead measure value in dollars, your argument could potentially decrease that value significantly, since it might suggest that other interventions could do more good, and hence that dollars are more valuable in terms of “fraction of the universe.”)
Realistically it would account for much less though, since we can both agree that there are likely to be destructive technologies, and so all we are really doing is adjusting the timing of the next hurdle that requires coordination.
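To make the arithmetic concrete, here is a minimal sketch of the crude model with made-up numbers (the function name and the specific probabilities are illustrative, not estimates either of us has given). It just writes P2 as P1 plus the conditional probability of getting our house in order after AI, applied to the remaining probability mass, which is why the (P2 - P1) >= 0.5 * (1 - P1) claim is the same as saying that conditional probability is at least 1⁄2.

```python
# A minimal sketch of the crude quantitative model above, with made-up numbers.
from fractions import Fraction

def control_value_with_later_hurdle(p1, p_coordinate_given_not_before_ai):
    """Value of solving AI control (as a fraction of the universe) when a later
    destructive technology / philosophical hurdle exists.

    p1: probability of coordinating before the development of AI that would be
        catastrophic without AI control.
    p_coordinate_given_not_before_ai: P(get house in order after AI but before
        the next hurdle | not gotten house in order prior to AI).
    """
    p2 = p1 + (1 - p1) * p_coordinate_given_not_before_ai
    return p2 - p1

p1 = Fraction(1, 5)                                            # illustrative only
no_hurdle_value = 1 - p1                                       # value if there is no later hurdle: 4/5
with_hurdle_value = control_value_with_later_hurdle(p1, Fraction(1, 2))   # 2/5
print(no_hurdle_value, with_hurdle_value)                      # prints "4/5 2/5": exactly the factor-of-2 bound
```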
Suppose A, B, C each share 1⁄3 of the universe. If A and B join forces they can destroy C and take C’s resources, otherwise it’s a stalemate. (To make the problem easier assume C can’t join with anyone else.) Another one is A and B each have secure rights, but they need to join together to maximize negentropy.
I’m not sure it’s worth arguing about this. I think that (1) these examples do only a little to increase my expectation of losses from an insufficiently sophisticated understanding of bargaining (I’m happy to argue about it if it ends up being important), and (2) it seems like the main difference is that I am looking for arguments that particular problems are costly enough that it is worthwhile to work on them, while you are looking for an argument that there won’t be any costly problems. (This is related to the discussion above.)
Unlike destructive technologies, philosophical hurdles are only a problem for aligned AIs. With destructive technologies, both aligned and unaligned AIs (at least the ones that don’t terminally value destruction) would want to coordinate to prevent them, and they only have to figure out how. But with philosophical problems, unaligned AIs instead want to exploit them to gain advantages over aligned AIs. For example, if aligned AIs have to spend a lot of time thinking about how to merge or self-improve safely (due to deferring to slow humans), unaligned AIs won’t want to join some kind of global pact to all wait for the humans to decide, but will instead move forward amongst themselves as quickly as they can. This seems like a crucial disanalogy between destructive technologies and philosophical hurdles.
This comes down to the claim that P(get house in order after AI but before catastrophe | not gotten house in order prior to AI) is at least 1⁄2.
This seems really high. In your Medium article you only argued that (paraphrasing) AI could be as helpful for improving coordination as for creating destructive technology. I don’t see how you get from that to this conclusion.
Unaligned AIs don’t necessarily have efficient idealized values. Waiting for (simulated) humans to decide is analogous to computing a complicated pivotal fact about an unaligned AI’s values. It’s not clear that “naturally occurring” unaligned AIs have simpler idealized/extrapolated values than aligned AIs with upload-based value definitions. Some unaligned AIs may actually be on the losing side; recall the encrypted-values AI example.