I'd definitely be open to reading your draft. That said, I can't guarantee that I'll have much to comment on in review.
As for applications to AI safety, I don’t know. I can definitely buy that you could specify a notion of boundary (or membrane) that is specific enough to define your skin as separating your organs from the outside world.
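For concreteness, here's one way to make that specific (a Markov-blanket gloss of my own, not necessarily the definition your draft uses): treat the skin as a blanket state B such that, conditioned on it, the internal state I and the external state E are independent:

```latex
% Boundary as a Markov blanket: given the blanket/skin state B,
% the inside and the outside are conditionally independent.
p(I, E \mid B) = p(I \mid B)\, p(E \mid B)
```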
I’m not sure whether this could be used to prevent the AI from e.g. stabbing you to death. Certainly it could prevent it from immediately stabbing you to death using its immediately available actions, but there are a lot of methods that could achieve the same end. I’m not sure whether it could prevent it from building a Rube Goldberg machine that ends up stabbing you to death. And I strongly doubt it could be used to make the AI proactively double-check whether the tough guy who comes up and promises to “get rid of” you is actually going to stab you to death.
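To make that gap concrete, here's a toy sketch (the names and the action model are made up by me, not any real proposal): a one-step boundary filter catches the direct attack but passes every step of the indirect plan.

```python
# Toy sketch: veto any action whose *immediate* effect crosses the skin,
# and note why this misses multi-step plans.

# Hypothetical action model: each action declares its immediate physical effect.
ACTIONS = {
    "stab_human":      {"crosses_skin": True},   # direct violation
    "place_lever":     {"crosses_skin": False},  # innocuous in isolation
    "attach_blade":    {"crosses_skin": False},  # innocuous in isolation
    "release_trigger": {"crosses_skin": False},  # the *machine* stabs later
}

def immediate_filter(action: str) -> bool:
    """Allow an action iff its direct effect doesn't cross the boundary."""
    return not ACTIONS[action]["crosses_skin"]

# The direct attack is caught...
assert not immediate_filter("stab_human")

# ...but every step of the Rube Goldberg plan passes the one-step check,
# even though the composed plan ends with the boundary being violated.
plan = ["place_lever", "attach_blade", "release_trigger"]
assert all(immediate_filter(step) for step in plan)
```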
(Or like, you could probably make it avoid stabbing you to death in all these cases by giving it negative utility for your boundary being violated, but in that case it might also interfere with you getting vaccinated or getting plastic surgery.)
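As a toy version of that parenthetical (made-up numbers throughout, just to show the failure mode): a flat penalty on boundary crossings can't tell a stabbing from a vaccination.

```python
# Score outcomes as task reward minus a large flat penalty whenever
# the boundary is crossed, regardless of how or why.

BOUNDARY_PENALTY = 1_000.0  # assumed constant, large enough to dominate rewards

def utility(task_reward: float, boundary_crossed: bool) -> float:
    return task_reward - (BOUNDARY_PENALTY if boundary_crossed else 0.0)

# The penalty rules out stabbing however it's arranged...
print(utility(task_reward=5.0, boundary_crossed=True))   # -995.0
# ...but a vaccination (a needle also crosses the skin) is scored identically,
# so the agent would interfere with that too.
print(utility(task_reward=10.0, boundary_crossed=True))  # -990.0
```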
But I strongly doubt this could be used to prevent it from making propaganda to trick you. At least I don’t see any way that could be done. Arguably honesty represents some sort of boundary/membrane, but it’s a much fuzzier and much weaker one, and it seems like it would be harder to learn decisively.
could probably make it avoid stabbing you to death in all these cases by giving it negative utility for your boundary being violated
yeah
but in that case it might also interfere with you getting vaccinated or getting plastic surgery
I think this can be solved with some notion of consent
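As a minimal sketch of what I mean (hypothetical, not a worked-out proposal), building on the toy utility above: only charge the penalty for non-consensual crossings.

```python
# Same flat penalty as before, but waived when the crossing is consensual.

BOUNDARY_PENALTY = 1_000.0  # same assumed constant as above

def utility(task_reward: float, boundary_crossed: bool, consented: bool) -> float:
    violation = boundary_crossed and not consented
    return task_reward - (BOUNDARY_PENALTY if violation else 0.0)

print(utility(5.0, boundary_crossed=True, consented=False))  # -995.0: stabbing still penalized
print(utility(8.0, boundary_crossed=True, consented=True))   # 8.0: vaccination now fine
```

Of course this just moves the hard part into defining "consented", but it at least separates the two failure modes.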
Maybe ideally, though, «boundaries» don’t have to be respected, they just have to be defended by the organism inside. Maybe everyone has their own AGI to defend their «boundary».
But I strongly doubt this could be used to prevent it from making propaganda to trick you.
I think this is fundamentally different from stabbing someone. You have the power to resist propaganda. This is where I disagree with a lot of people on LW: manipulation is not intrinsically unavoidable.[1]
There’s an exception if you can literally scan someone’s brain, predict it forward in time, and reverse-engineer the outputs you want. But setting aside actions with very high information costs like that, I think it’s basically not possible (in the same way that decrypting an encrypted message without the key is theoretically possible but not practical).
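To put rough numbers on the encryption analogy (assumed figures, just for scale):

```python
# Back-of-envelope: brute-forcing a 128-bit key at an optimistic
# trillion guesses per second.

keyspace = 2 ** 128           # possible keys
guesses_per_sec = 1e12        # generous hardware assumption
seconds_per_year = 3.156e7

years = keyspace / guesses_per_sec / seconds_per_year
print(f"~{years:.1e} years")  # ~1.1e+19 years: finite in theory, impossible in practice
```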
Maybe ideally, though, «boundaries» don’t have to be respected, they just have to be defended by the organism inside. Maybe everyone has their own AGI to defend their «boundary».
I think this requires relatively low inequality.
I think this is fundamentally different from stabbing someone. You have the power to resist propaganda. This is where I disagree with a lot of people on LW: manipulation is not intrinsically unavoidable.
I disagree but it might not be super relevant to resolve here? Idk.
I think this requires relatively low inequality.
Yeah, but this has been the case for all of history and probably always will be.
But the key point is that «boundaries» are intersubjectively definable; it’s not just everyone’s intrinsic selves smearing into each other.
Looking forward to your comments on the draft I sent you :)