I was listening to Anders Sandberg talk about “humble futures” (i.e., futures that may be considered good in a sober, non-“let’s tile the universe with X” way), and started wondering: might training (not yet proven safe) AIs to have such “humble”, scope-insensitive-ish goals, which seems more tractable than (complete) value alignment, disincentivize the AI from incapacitating humans?
Why would such goals disincentivize that? I have some ideas, but I’m deliberately not fleshing them out here so people don’t anchor on the particular scenarios I have in mind.
Here’s an AI-generated image of a scope-insensitive AI chilling with a cup of tea to help you think:
GPT-4’s answer fwiw (with the extra prompt “please, respond as if you were the most brilliant longtermist AI safety researcher”):
When considering the development of artificial general intelligence (AGI) and the implications for humanity, a key concern is ensuring that AGIs act in ways that are beneficial rather than harmful. One proposal to address this concern is to make AGIs scope-insensitive. To analyze whether this approach could make AGIs less likely to harm humanity, we’ll first define and then discuss the implications of scope-insensitivity within AGI goals.
Scope-Insensitivity and AGI
Scope-insensitivity in this context refers to goals that do not excessively amplify their aims with scale. For instance, a scope-insensitive AGI designed to reduce suffering might not distinguish much between preventing a minor inconvenience for millions of people and alleviating a major disaster for a few people. These AGIs would have a form of goal-setting where small-scale improvements and large-scale interventions do not differ significantly in perceived value, avoiding the classic trap of prioritizing extreme versions of their objectives at the cost of human well-being.
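A toy sketch to make this concrete (an illustrative aside, not part of GPT-4’s answer; the saturating utility shape and the constants below are arbitrary assumptions): a scope-sensitive agent’s utility grows linearly with the amount of good accomplished, while a scope-insensitive agent’s utility flattens out once it is near its ceiling.

```python
import math

# Toy contrast between a scope-sensitive (linear) utility and a
# scope-insensitive (bounded, saturating) utility over "amount of good
# accomplished" (arbitrary units).

def scope_sensitive_utility(good: float) -> float:
    """Linear: ten times the good is ten times as valuable."""
    return good

def scope_insensitive_utility(good: float, saturation: float = 10.0) -> float:
    """Saturating: value approaches a ceiling of 1.0 as `good` grows."""
    return 1.0 - math.exp(-good / saturation)

for good in [1, 10, 100, 1000]:
    print(f"good={good:5d}  linear={scope_sensitive_utility(good):7.1f}  "
          f"bounded={scope_insensitive_utility(good):.3f}")

# The linear agent values 1000 units 1000x more than 1 unit; the bounded
# agent sees almost no difference between 100 and 1000 (0.99995 vs ~1.0),
# so "tile the universe with X" plans buy it very little extra utility.
```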
How Scope-Insensitivity Could Disincentivize Harm
Stability in Goal Pursuit: Scope-insensitive AGIs are less likely to pursue extreme actions to achieve their goals. For example:
An AGI instructed to “reduce suffering” might improve local conditions incrementally rather than attempting a global overhaul that could incapacitate human structures.
This averts scenarios where AGIs conclude that radical interventions (like controlling or incapacitating humans) are necessary for achieving massive-scale benefits (a toy expected-utility sketch of this trade-off follows this list).
Reduced Singular Focus on Extreme Outcomes:
Scope-insensitive AGIs would distribute their efforts more evenly rather than focusing singularly on high-leverage interventions which might involve control or coercion of humans.
They would value smaller, manageable improvements, which inherently reduces the likelihood of executing grand, disruptive plans that could incapacitate human society.
Compatibility with Human Incremental Evolution:
Humans generally make progress through incremental changes. Scope-insensitive AGIs, focusing on steady, small-scale improvements, would align more closely with the human method of development.
This harmony reduces friction and conflict, promoting cooperation over coercion or incapacitation.
Embedded Humble Goals:
Training AGIs with humble, scope-insensitive goals inherently de-emphasizes grandiosity.
It promotes modest, beneficial interventions over transformative ambitions that can disrupt human autonomy and infrastructure.
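To make the “stability in goal pursuit” point above concrete, here is a toy expected-utility comparison (again an illustrative aside rather than part of GPT-4’s answer; the probabilities and payoffs are invented purely for illustration): a reliable incremental plan versus a radical plan with a huge upside but some chance of total failure. Under linear utility the radical plan dominates; under the bounded utility sketched earlier, the incremental plan wins.

```python
import math

def bounded_utility(good: float, saturation: float = 10.0) -> float:
    """Same saturating utility as in the earlier sketch."""
    return 1.0 - math.exp(-good / saturation)

# Each plan is a list of (probability, good-accomplished) outcomes.
# Numbers are invented for illustration only.
plans = {
    "incremental": [(1.0, 30.0)],                 # modest but certain
    "radical":     [(0.9, 1000.0), (0.1, 0.0)],   # huge upside, 10% chance of achieving nothing
}

for name, outcomes in plans.items():
    linear_ev = sum(p * g for p, g in outcomes)
    bounded_ev = sum(p * bounded_utility(g) for p, g in outcomes)
    print(f"{name:12s} linear EV = {linear_ev:6.1f}   bounded EV = {bounded_ev:.3f}")

# Linear EV: radical (900.0) crushes incremental (30.0), so a scope-sensitive
# agent is pulled toward the drastic plan. Bounded EV: incremental (0.950)
# beats radical (0.900), because the extra scale of the radical plan is worth
# almost nothing to a scope-insensitive agent, while its failure risk still
# counts against it.
```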
Potential Counterpoints
Risk of Under-Attenuation:
If too scope-insensitive, AGIs might lack the drive to address large-scale issues that genuinely need extensive efforts, potentially making them less effective in critical situations.
Complexity of True Scope-Insensitivity:
Defining and implementing scope-insensitive goals might be complex, especially ensuring they do not covertly evolve into over-reaching plans.
Alignment Challenges:
While scope-insensitivity might curb extreme actions, it doesn’t inherently solve alignment. An AGI’s methods to achieve modest goals still need alignment to avoid unintended harm.
Conclusion
Training AGIs to have scope-insensitive or modest goals could significantly reduce the likelihood of them incapacitating humanity, as their approach would inherently avoid extreme, radical solutions. Instead, these AGIs would favor incremental and potentially more cooperative strategies. Nonetheless, comprehensive safeguards and iterative monitoring are essential to ensure these systems remain aligned and truly beneficial in all scales of their actions. Integrating scope-insensitivity with broader alignment frameworks might offer a robust path forward in AGI safety research, ensuring that AGIs act in ways that uphold and enhance human values and well-being.