Jim Buhler
Interesting, thanks!
I guess one could object that in your “even more clueless sniper” example, applying the POI between Hit and Not Hit is just as arbitrary as applying it between, e.g., Hit, Hit on his right, and Hit on his left. This is what Greaves (2016) -- and maybe others? -- called the “problem of multiple partitions”. In my original scenario, people might argue that there isn’t such a problem and that there is only one sensible way to apply POI. So it’d be ok to apply it in my case and not in yours.
I don’t know what to make of this objection, though. I’m not sure it makes sense. It feels a bit arbitrary to say “we can apply POI but only when there is one way of applying it that clearly seems more sensible”. Maybe this problem of multiple partitions is a reason to reject POI altogether (in situations of what Greaves calls “complex cluelessness”, at least, like in my sniper example).
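To make the multiple-partitions worry concrete, here’s the arithmetic it turns on (a toy sketch, using just the partition labels above; the numbers are simply the uniform probabilities POI assigns):

```python
# Toy illustration of the multiple-partitions problem: POI just assigns 1/n
# to each of the n outcomes in whatever partition you happen to pick.

# Partition 1: {Hit, Not Hit}
p_hit_partition_1 = 1 / 2

# Partition 2: {Hit, Hit on his right, Hit on his left}
p_hit_partition_2 = 1 / 3

# Same evidence, same event ("Hit"), but "indifference" gives different
# probabilities depending on the partition.
print(p_hit_partition_1, p_hit_partition_2)  # 0.5 0.333...
```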
Yeah, I guess I meant something like “aim as if there were no external factors other than gravity”.
> Do you agree with AnthonyC’s view that the bullet’s perturbations are well-modeled by a random walk? If so, maybe I’ll simulate it if I have time and report back—but it only makes sense to do that if you agree that the random walk model is appropriate in the first place.
Oh yeah, good question. I’m not sure because random walk models are chaotic and seem to model situations of what Greaves (2016) calls “simple cluelessness”. Here, we’re in a case she would call “complex”. There are systematic reasons to believe the bullet will go right (e.g., the Earth’s rotation, say) and systematic reasons to believe it will go left (e.g., the wind that we see blowing left). The problem is not that it is random/chaotic, but that we are incapable of weighing up the evidence for left vs the evidence for right, incapable to the point where we cannot update away from a radically agnostic prior on whether the bullet will hit the target or the kid.
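For concreteness, here’s a minimal sketch (Python, made-up parameters, not calibrated to any real ballistics) of what the pure random-walk picture amounts to; note that it only models noise around the aim point, not the systematic left-vs-right evidence I’m talking about:

```python
import numpy as np

# Minimal sketch of the random-walk model of bullet perturbations.
# All parameters (shot count, step count, step size) are made up for
# illustration; nothing here is calibrated to real ballistics.
rng = np.random.default_rng(0)

n_shots = 10_000   # simulated shots
n_steps = 1_000    # small independent lateral "kicks" along the trajectory
step_sd = 0.01     # standard deviation of each kick, in meters

# Each shot's final deviation is the sum of many small independent kicks,
# horizontal (dx) and vertical (dy) treated separately.
dx = rng.normal(0.0, step_sd, size=(n_shots, n_steps)).sum(axis=1)
dy = rng.normal(0.0, step_sd, size=(n_shots, n_steps)).sum(axis=1)

# Under this model the impacts form a Gaussian cluster around the aim point:
# pure noise, with no systematic push to the left or right.
r = np.hypot(dx, dy)
print("median miss distance (m):", np.median(r))
```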
> My answer: because strictly monotonic[1] probability distribution prior to accounting for external factors
Ok so that’s defo what I think assuming no external factors, yes. But if I know that there are external factors, I know the bullet will deviate for sure. I don’t know where, but I know it will. And it might luckily deviate a bit back and forth and come back exactly where I aimed, but I don’t get how I can rationally believe that’s any more likely than it doing something else and landing 10 centimeters further to the right. And I feel like what everyone in the comments so far is saying is basically “Well, POI!”, taking it for granted / as self-evident, but afaict, no one has actually justified why we should use POI rather than simply remain radically agnostic on whether the bullet is more likely to hit the target than the kid. I feel like your intuition pump, for example, is implicitly assuming POI and is sort of justifying POI with POI.
Interesting, thanks. My intuition is that if you draw a circle of, say, a dozen (?) meters around the target, there’s no spot within that circle that is more or less likely to be hit than any other, and it’s only outside the circle that you start having something like a normal distribution. I really don’t see why I should think the spot 35 centimeters to the target’s right is any more (or less) likely than the one 42 centimeters to his left. Can you think of any good reason why I should think that? (Not saying my intuition is better than yours. I just want to get where I’m wrong, if I am.)
I’m just interested in the POI thing, yeah.
> At some sufficiently far distance, it is essentially landing in a random spot in a normal distribution around the intended target
Say I tell you the bullet landed either 35 centimeters to the target’s right or 42 centimeters to his left, and ask you to bet on which one you think it is. Are you indifferent/agnostic, or do you favor 35 very (very very very very) slightly? (If the former, you reject the POI. If the latter, you embrace it. Or at least that’s my understanding. If you don’t find it more likely that the bullet hits a spot a bit closer to the target, then you don’t agree with the superior that aiming at the target makes you more likely to hit him rather than the child, all else equal.)
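For what it’s worth, here’s the bet spelled out numerically under the two pictures in this thread (a toy sketch; sigma is an arbitrary made-up spread):

```python
from math import exp

# Toy comparison for the 35 cm vs 42 cm bet. sigma is made up; for a Gaussian
# centered on the aim point, the direction of the comparison doesn't depend
# on its value.
sigma = 1.0  # meters

def unnormalized_gaussian(x, sigma):
    # 1-D normal density around the aim point; normalization dropped because
    # only the ratio of the two densities matters for the bet
    return exp(-(x ** 2) / (2 * sigma ** 2))

# Gaussian-around-the-aim-point view: 35 cm is (very) slightly favored.
print(unnormalized_gaussian(0.35, sigma) / unnormalized_gaussian(0.42, sigma))  # a bit above 1

# Uniform-within-a-circle view (the dozen-meter-circle intuition above):
# both spots get the same density, so the ratio is exactly 1 and you're
# indifferent.
```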
The Clueless Sniper and the Principle of Indifference
Without an objective standard of “winning” to turn to, this leaves us searching for new principles that could guide us in the face of indeterminacy. But that’s all for another post.
First time ever I am left hanging by a LW post. Genuinely.
Thanks! I guess myopia is a specific example of one form of scope-insensitivity (which has to do with longterm thinking, according to this at least), yes.
> This is plausibly a beneficial alignment property, but like every plausibly beneficial alignment property, we don’t yet know how to instill them in a system via ML training.
I didn’t follow discussions around myopia and didn’t have this context (e.g., I thought maybe people didn’t find myopia promising at all to begin with or something) so thanks a lot. That’s very helpful.
GPT-4’s answer fwiw (with the extra prompt “please, respond as if you were the most brilliant longtermist AI safety researcher”):
When considering the development of artificial general intelligence (AGI) and the implications for humanity, a key concern is ensuring that AGIs act in ways that are beneficial rather than harmful. One proposal to address this concern is to make AGIs scope-insensitive. To analyze whether this approach could make AGIs less likely to harm humanity, we’ll first define and then discuss the implications of scope-insensitivity within AGI goals.

Scope-Insensitivity and AGI
Scope-insensitivity in this context refers to goals that do not excessively amplify their aims with scale. For instance, a scope-insensitive AGI designed to reduce suffering might not distinguish much between preventing a minor inconvenience for millions of people and alleviating a major disaster for a few people. These AGIs would have a form of goal-setting where small-scale improvements and large-scale interventions do not differ significantly in perceived value, avoiding the classic trap of prioritizing extreme versions of their objectives at the cost of human well-being.
How Scope-Insensitivity Could Disincentivize Harm
1. Stability in Goal Pursuit: Scope-insensitive AGIs are less likely to pursue extreme actions to achieve their goals. For example:
   - An AGI instructed to “reduce suffering” might improve local conditions incrementally rather than attempting a global overhaul that could incapacitate human structures.
   - This averts scenarios where AGIs conclude that radical interventions (like controlling or incapacitating humans) are necessary for achieving massive-scale benefits.
2. Reduced Singular Focus on Extreme Outcomes:
   - Scope-insensitive AGIs would distribute their efforts more evenly rather than focusing singularly on high-leverage interventions which might involve control or coercion of humans.
   - They would value smaller, manageable improvements, which inherently reduces the likelihood of executing grand, disruptive plans that could incapacitate human society.
3. Compatibility with Human Incremental Evolution:
   - Humans generally make progress through incremental changes. Scope-insensitive AGIs, focusing on steady, small-scale improvements, would align more closely with the human method of development.
   - This harmony reduces friction and conflict, promoting cooperation over coercion or incapacitation.
4. Embedded Humble Goals:
   - Training AGIs with humble, scope-insensitive goals inherently de-emphasizes grandiosity.
   - It promotes modest, beneficial interventions over transformative ambitions that can disrupt human autonomy and infrastructure.
Potential Counterpoints
1. Risk of Under-Attenuation: If too scope-insensitive, AGIs might lack the drive to address large-scale issues that genuinely need extensive efforts, potentially making them less effective in critical situations.
2. Complexity of True Scope-Insensitivity: Defining and implementing scope-insensitive goals might be complex, especially ensuring they do not covertly evolve into over-reaching plans.
3. Alignment Challenges: While scope-insensitivity might curb extreme actions, it doesn’t inherently solve alignment. An AGI’s methods to achieve modest goals still need alignment to avoid unintended harm.
Conclusion
Training AGIs to have scope-insensitive or modest goals could significantly reduce the likelihood of them incapacitating humanity, as their approach would inherently avoid extreme, radical solutions. Instead, these AGIs would favor incremental and potentially more cooperative strategies. Nonetheless, comprehensive safeguards and iterative monitoring are essential to ensure these systems remain aligned and truly beneficial in all scales of their actions. Integrating scope-insensitivity with broader alignment frameworks might offer a robust path forward in AGI safety research, ensuring that AGIs act in ways that uphold and enhance human values and well-being.
[Question] Would a scope-insensitive AGI be less likely to incapacitate humanity?
Some relevant resources I found:
- On the Value of Advancing Progress
- How useful is “progress”?
- On Progress and Prosperity
Interesting points, thanks!
> The problem is that the public correctly perceives that economic growth and technological progress make the average life better, so it is hard to get political support for any measures to slow them down.
I mean, if we think these things are actually bad overall (which I’m not convinced of, but maybe), we could at least avoid doing things that directly or indirectly promote or generate more economic growth, for example. There is some very low-hanging fruit here.
Thanks, that’s fair! Such a threshold exists if and only if you assume:
- non-zero AI research (which is the scenario we’re interested in here I guess),
- technological progress correlates with AI progress (which as you say is not guaranteed but that still seems very likely to me),
- maybe a few other crucial things I implicitly assume without realizing.
[Question] How bad would AI progress need to be for us to think general technological progress is also bad?
I don’t know and this is outside the scope of this post I guess. There are a few organizations like the Center on Long-Term Risk studying cooperation and conflict between ASIs, however.
Interesting, thanks! This is relevant to question #2 in the post! Not sure everyone should act as if they were the first, considering the downsides of interciv conflicts, but yeah, that’s a good point.
Yup! (a related comment fwiw).
Well, not if you account for other factors that might luckily compensate exactly for the Coriolis effect (e.g., the wind). But yeah, assuming a Gaussian distribution whose peak is “target hit” (rather than “kid hit” or “rock over there hit”) just because that’s where you happen to be aiming (ignoring the Coriolis effect, the wind, and all) seems very suspiciously convenient.
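A quick sketch of what I mean (made-up numbers): add one systematic lateral offset of unknown size and sign, and the impact distribution is still a Gaussian, but its peak need not sit where you aimed.

```python
import numpy as np

# Sketch: Gaussian scatter around the aim point, plus one systematic lateral
# offset (wind + Coriolis) whose size and sign we can't pin down.
# All numbers are made up for illustration.
rng = np.random.default_rng(1)

n_shots = 50_000
scatter_sd = 0.3                     # random scatter around the aim point (m)
systematic = rng.uniform(-1.0, 1.0)  # unknown-but-fixed lateral offset (m)

x = systematic + rng.normal(0.0, scatter_sd, n_shots)

# The impacts are still Gaussian, but the peak sits near `systematic`, not at
# the aim point; "the top is 'target hit'" only holds if the systematic
# factors happen to cancel out.
print("approximate peak of the impact distribution (m):", round(float(x.mean()), 2))
```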