“Most paths lead to bad outcomes” is not quite right. For most (let’s say human-developed, though that’s not a crux) plan specification languages, most syntactically valid plans in that language would not substantially mutate the world state when executed.
I’ll begin by noting that over the course of writing this post, the brittleness of treacherous plans became significantly less central.
However, I’m still reasonably convinced that the intuition is sound. If a plan is adversarial to humans, the plan’s executor will face adverse optimization pressure from humans, and adverse optimization pressure complicates error correction.
Consider the case of a sniper with a gun that is loaded with 50% blanks and 50% lethal bullets (such that the ordering of the blanks and lethals is unknown to the sniper). Let’s say his goal is to kill a person on the enemy team.
If the sniper is shooting at an enemy team equipped with counter-snipers, he is highly unlikely to succeed (below 50%): he effectively gets one real attempt, since a first-shot blank gives the counter-snipers their opening, and even a lethal round still has to find its mark. In fact, he is quite likely to die.
Without the counter-snipers, the fact that his gun is loaded with 50% blanks becomes much less material. He could always just take another shot.
I claim that our world resembles the world with counter-snipers. The counter-snipers in the real world are humans who do not want to be permanently disempowered.
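A minimal Monte Carlo sketch of this asymmetry (the one-shot-then-eliminated rule, the 20-shot cap, and the names here are simplifying assumptions for illustration, not part of the analogy itself):

```python
import random

# Toy model: each trigger pull fires a blank with probability 0.5.
# With counter-snipers, assume a blank first shot reveals the sniper's
# position and he is eliminated before he can fire again.
# Without counter-snipers, he can simply keep pulling the trigger.

def sniper_succeeds(counter_snipers: bool, max_shots: int = 20) -> bool:
    for _ in range(max_shots):
        if random.random() < 0.5:   # live round
            return True
        if counter_snipers:
            return False            # first blank gives the counter-snipers their opening
    return False

def success_rate(counter_snipers: bool, trials: int = 100_000) -> float:
    return sum(sniper_succeeds(counter_snipers) for _ in range(trials)) / trials

print(f"with counter-snipers:    ~{success_rate(True):.2f}")   # ~0.50, and only an upper bound
print(f"without counter-snipers: ~{success_rate(False):.2f}")  # ~1.00
```

Even the ~0.50 figure overstates his chances, since a live round still has to hit; the point is just that adversaries convert recoverable noise (a blank, a flawed plan step) into unrecoverable failure, which is the error-correction asymmetry described above.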
If a plan is adversarial to humans, the plan’s executor will face adverse optimization pressure from humans, and adverse optimization pressure complicates error correction.
I can see that working when the entity is at the human level of intelligence or less. Maybe I misunderstand the setup, and this is indeed the case. I can’t imagine that it would work on a superintelligence...
Is your claim that the noise-borne asymmetric pressure away from treacherous plans disappears in above-human intelligences? I could see it becoming less material as intelligence increases, but the intuition should still hold in principle.
I am not confidently claiming anything; I’m not really an expert… But yeah, I like the way you phrased it. The more disparity there is in intelligence, the less the extra noise matters. I do not have a good model of it, though. It just feels like more and more disparate dangerous paths appear in this case, overwhelming the noise.
Fair enough! For what it’s worth, I think the reconstruction is probably the more load-bearing part of the proposal.