It’s important to remember the scale we’re talking about here. A $1B project (even when considered over its lifetime) in such an explosive field, with such prominent backers, would be interpreted as nothing other than a power-grab unless it included a lot of talk about openness (it will still be interpreted as one, but as a less threatening one). Read the interview with Musk and Altman and note how they’re talking about sharing data and collaborations. This will include some noticeable short-term benefits for the contributors, and pushing for safety, either by including someone from our circles or by adopting a more safety-focused mission statement, would impede efforts at gathering such a strong coalition.
It’s easy to moan over civilizational inadequacy and moodily conclude that the above shows how (as a species) we’re so obsessed with appropriateness and politics that we will avoid our one opportunity to save ourselves. Sure, do some of that, and then think about the actual effects for a few minutes:
If the Value Alignment research program is solvable in the way we all hope it is (complete with a human-universal CEV and stable reasoning under self-modification and about other instances of our algorithm), then having lots of implementations running around will be basically the same as distributing the code over lots of computers. If the only problem is that human values won’t quite converge, this gives us a physical implementation of the merging algorithm: everyone just doing their own thing and (acausally?) trading with each other.
If we can’t quite solve everything that we’re hoping for, this does change the strategic picture somewhat. Mainly it seems to push us away from a lot of quick fixes that will likely seem tempting as we approach the explosion. We can’t have a sovereign just run the world like some kind of OS that keeps everyone separate, and we’ll also be much less likely to make the mistake of creating CelestAI from Friendship is Optimal: something that optimizes most of our goals but has some undesired lock-ins. There are a bunch of variations here, but we seem locked out of strategies that try to secure some minimum level of the cosmic endowment while possibly failing to get a substantial constant fraction of our potential, because they secure it at the cost of important values or freedoms.
Whether this is a bad thing or not really depends on how one evaluates two types of risk: (1) the risk of undesired lock-ins from an almost perfect superintelligence getting too much relative power, and (2) the risk of bad multi-polar traps. Much of (2) seems solvable by robust cooperation, which we seem to be making good progress on. What keeps spooking me are risks due to consciousness: either mistakenly endowing algorithms with it and thereby creating suffering, or evolving to the point that we lose it. These aren’t as easily solved by robust cooperation, especially if we don’t notice them until it’s too late. The real strategic problem right now is that there isn’t really anyone we can trust to be unbiased in analyzing the relative dangers of (1) and (2), especially because they pattern-match so well with the ideological split between left and right.
It’s important to remember the scale we’re talking about here. A $1B project (...) in such an explosive field
I was sure this sentence was going to complete with something along the lines of “is not such a big deal”. Silicon Valley is awash with cash. Mark Zuckerberg paid $22B for a company with 70 employees. Apple has $200B sitting in the bank.
Not necessarily. In a multi-polar scenario consisting entirely of Unfriendly AIs, getting them to cooperate with each other doesn’t help us.
Yes, robust cooperation is not worth much to us if it’s cooperation between the paperclip maximizer and the pencilhead minimizer. But if there are a hundred shards that make up human values, and tens of thousands of people running AIs trying to maximize the values they see fit, it’s actually not unreasonable to assume that the outcome, while not exactly what we hoped for, is comparable to incomplete solutions that err on the side of (1) instead.
After having written this I notice that I’m confused and conflating two things: (a) incomplete solutions in the sense of there not being enough time to do what should be done, and (b) incomplete solutions in the sense of it being actually (provably?) impossible to implement what we currently consider essential parts of the solution. Has anyone got thoughts on (a) vs (b)?
If value alignment is sufficiently harder than general intelligence, then we should expect that given a large population of strong AIs created at roughly the same time, none of them should be remotely close to Friendly.