What reason is there to expect Bob is at all likely to succeed? Many people have tried to build AGI over the years and none has succeeded. Aligning the AI would be even harder. Does Bob have a solution to the alignment problem? If so, that seems like the dominant consideration: is there a way to make Bob’s solution to the alignment problem available to others? If Bob doesn’t have a solution to the alignment problem, then why expect him to be able to steer the AGI?
Has Alice considered that this whole setup might be a ruse? As in, there is no credible plan to build AGI or align it, the whole thing is basically hype, and she’s being memed into possibly killing herself over a total lie? Perhaps scaring people is part of Bob’s political strategy for maintaining control and taking out people who could block him in some way?
What about the decision theory of extortion? Classically, you shouldn’t negotiate with terrorists, because being the type of person who pays off terrorists is what gives terrorists an incentive to threaten you in the first place. Maybe Alice gets tortured less overall by not being the type of person to fold this easily to such a non-credible threat? I mean, if someone could be controlled that easily by such a small probability of torture, couldn’t any threatening party determine a lot of their actions, making things worse for them overall?
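To make the incentive argument concrete, here’s a toy expected-harm comparison (a minimal sketch in Python; every probability and cost here is a made-up assumption, not a claim about Alice’s actual situation). The point is just that an agent known to cave to threats attracts far more threats, so can be worse off in expectation than an agent known to refuse, even though refusing risks the occasional threat being carried out.

```python
# Toy model of the extortion incentive argument. All numbers are illustrative
# assumptions chosen to show the shape of the argument, nothing more.

P_THREAT_IF_CAVER = 0.9      # threateners target agents known to pay up
P_THREAT_IF_REFUSER = 0.1    # little incentive to threaten someone who never folds
P_FOLLOW_THROUGH = 0.01      # assumed chance a given threat is actually carried out

COST_OF_CAVING = 10          # harm absorbed each time you comply with a threat
COST_OF_TORTURE = 1000       # harm if a threat is actually carried out

def expected_harm(p_threat: float, caves: bool) -> float:
    """Expected harm per round of potential extortion."""
    if caves:
        # Cavers comply whenever threatened, so they pay the compliance cost each time.
        return p_threat * COST_OF_CAVING
    # Refusers are threatened less often, and only a small fraction of threats
    # against them are ever carried out.
    return p_threat * P_FOLLOW_THROUGH * COST_OF_TORTURE

print("known caver:  ", expected_harm(P_THREAT_IF_CAVER, caves=True))    # 9.0
print("known refuser:", expected_harm(P_THREAT_IF_REFUSER, caves=False)) # 1.0
```

With these made-up numbers the known caver absorbs several times the expected harm of the known refuser; the qualitative point doesn’t depend on the exact figures, only on threats being much more likely against people known to fold.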
There are unsolved ethical issues regarding the balance of pain and pleasure. There are optimized negative experiences and optimized positive experiences. Without AGI it’s generally easier for people to create negative experiences than positive ones, but with AGI it’s possible to do both, because the AGI would be so powerful. See Carl Shulman’s critique of negative utilitarianism. If in some possible worlds there are aligned AGIs that create positive experiences for Alice, those could outweigh the negative experiences created by other AGIs.
To get into weirder theoretical territory: even under the assumption that AGIs can create negative experiences much more efficiently than positive ones, reducing the total amount of negative experience involves having influence over which AGI is created. Having control of AGI in some possible worlds gives you negotiating power with which you can convince other AGIs (perhaps even in other branches of the multiverse) not to torture you. If you kill yourself you don’t get much influence over the eventual AGI that is created, so you don’t get a seat at the negotiating table, so to speak. You already exist in some possible worlds (multiverse branches, etc., depending on your physics and philosophy assumptions), so reducing the degree to which you’re tortured to zero is infeasible, but reducing it is still possible.
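For what it’s worth, the same kind of toy arithmetic applies to that last claim (again a minimal sketch with made-up numbers, not a serious model of multiverse bargaining): even if the probability of a torture-capable AGI can’t be pushed to zero, retaining some influence over which AGI gets built reduces the expected badness compared to forfeiting that influence.

```python
# Toy sketch: influence can't zero out the bad worlds, but it can shrink their
# expected badness. All numbers are illustrative assumptions.

P_TORTURE_WORLD = 0.01        # assumed probability of a torture-capable misaligned AGI
BASELINE_DISVALUE = -1000.0   # disvalue of that outcome with no bargaining power
REDUCTION_WITH_INFLUENCE = 0.3  # assumed fraction by which bargaining power softens it

def expected_disvalue(retains_influence: bool) -> float:
    reduction = REDUCTION_WITH_INFLUENCE if retains_influence else 0.0
    return P_TORTURE_WORLD * BASELINE_DISVALUE * (1 - reduction)

print("retains influence over which AGI is built:", expected_disvalue(True))   # -7.0
print("forfeits that influence:                  ", expected_disvalue(False))  # -10.0
```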
At some level there’s a decision people have to make about whether life is good or bad. Life is good in some ways and bad in others, and it’s hard to make abstract arguments about the balance; at some point people have to decide whether they’re in favor of or against life. This is a philosophy problem that goes beyond AGI and that people have been contemplating for a long time.
Maybe this is actually a mental health problem? I mean, I’m tempted to say that people who think superhuman AGI is likely to be created in the next 20 years are already crazy, though that belief is a popular opinion around here. But most of those people think alignment is unlikely, so intentional torture scenarios are correspondingly unlikely too. If this is a mental health problem, then the usual methods, such as therapy, meditation, and therapeutic drug regimens for depression and so on, might be helpful. Even very risky methods of therapy that could induce psychosis (e.g. certain drugs) are far less risky than killing yourself.