Protecting agent boundaries

Chris Lakin 25 Jan 2024 4:13 UTC

11 points

Computer Security & Cryptography, Boundaries / Membranes [technical], AI, Security Mindset, Cyborgism, Tool AI, Intelligence Amplification

If the preservation of an agent’s boundary is necessary for that agent’s safety, how can that boundary/membrane be protected?

How agent boundaries get violated

In order to protect boundaries, we must first understand how they get violated.

Let’s say there’s a cat, and it gets stabbed by a sword. That’s a boundary violation (a.k.a. membrane piercing). In order for that to have happened, three conditions must have been met:

1. There was a sword.
2. The cat and the sword collided.
3. The cat wasn’t strong enough to resist penetration from the sword.

More generally, in order for any existing membrane to be pierced, three conditions must have all been met:

1. There was a potential threat. (E.g., a sword, or a person with a sword.)
2. The moral patient and the threat collided.
3. The moral patient failed to adequately defend itself. (Had the cat been better at self-defense, for example if its skin were thicker or if it could dodge, it would not have been stabbed.)
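
The conjunction of the three conditions can be sketched in code. This is only an illustration, not part of the original argument: the `Encounter` type, its field names, and `membrane_pierced` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Encounter:
    """One hypothetical situation involving a moral patient (illustrative names only)."""
    threat_exists: bool   # condition 1: there is a potential threat
    collision: bool       # condition 2: the threat and the moral patient collide
    defense_holds: bool   # negation of condition 3: the patient defends itself adequately


def membrane_pierced(e: Encounter) -> bool:
    """A boundary violation requires all three conditions to hold at once."""
    return e.threat_exists and e.collision and not e.defense_holds


# The stabbed cat: a sword exists, it hits the cat, and the cat cannot resist it.
stabbed_cat = Encounter(threat_exists=True, collision=True, defense_holds=False)
# Same situation, but with thicker skin the defense holds.
armored_cat = Encounter(threat_exists=True, collision=True, defense_holds=True)

print(membrane_pierced(stabbed_cat))  # True
print(membrane_pierced(armored_cat))  # False
```

Falsifying any single condition is enough to make `membrane_pierced` return `False`, which is why each condition suggests its own kind of protection.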

Protecting agent boundaries

Each of these three conditions then implies ways of preventing boundary violations (a.k.a. membrane piercing):

1. There was a potential threat.

→ Minimize potential threats

2. There was a collision.

→ Minimize dangerous collisions
- → Predict and prevent collisions before they occur.
- → Prevent collisions by putting distance between threats and moral patients.
- → Prevent premeditated collisions by pre-committing to retribution.

3. The victim failed to defend itself.

→ Empower the membranes of humans and other moral patients to be better at self-defense.
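
One way to see why each condition yields its own class of intervention is to treat the three conditions probabilistically. This is a sketch rather than something the post itself states: the symbols T (a potential threat exists), C (the threat and the moral patient collide), and F (the moral patient's defense fails) are introduced here for illustration. By the chain rule of probability:

```latex
P(\text{violation}) = P(T \cap C \cap F) = P(T)\, P(C \mid T)\, P(F \mid T, C)
```

Under this reading, minimizing potential threats lowers P(T), minimizing dangerous collisions lowers P(C | T), and empowering membranes lowers P(F | T, C). Because the factors multiply, driving any one of them toward zero drives the probability of a violation toward zero.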

How human societies already try to solve this problem

As a helpful analogy, here are some examples of how modern human societies try to solve this problem:

Minimize potential threats

Restrict access to weapons (e.g., nukes, bioweapons, etc.)
Minimize potential perpetrators (e.g., some fictional societies predict and eliminate potential psychopaths).

Minimize dangerous collisions

Protect high-risk individuals, e.g. put them in witness protection.
Prevent collisions before they occur, e.g. predictive policing, traffic lights.
Police crimes after they occur, which deters premeditated collisions through the threat of retribution.

Empower membranes to be better at self-defense

Infosec defense: Use good security practices and strong encryption.
Biological defense: Develop and use beneficial vaccines.
Manipulation defense: Reduce unhelpful cognitive biases and emotional insecurities.

How this applies to AI safety

Minimize potential AI threats

(this is obvious/boring so I’m omitting it)

Minimize dangerous AI collisions

(this is obvious/boring so I’m omitting it)

Empower membranes to be better at self-defense

Empower the membranes of humans and other moral patients to be more resilient to collisions with threats. Examples:

Manipulation defense: You have an AI assistant that filters potentially-adversarial information for you.
Crime defense: Police have AI assistants that help them predict, deduce, investigate, and prevent crime.
Physical threat defense: (If nanotech works out) You have an AI assistant that shields you from physical threats.
Biological defense: Faster, better vaccines, personal antibody printers, etc.
Cybersecurity defense: Good security practices and strong encryption. Software encryption can be arbitrarily strong.
- cf. writing about this from Foresight Institute: (1), (2), (3)…
Legal defense: personal AI assistants for e.g. interfacing with contracts and the legal system.
Bargaining: personal AI assistants for negotiation.
Human intelligence enhancement
Cyborgism
Mark Miller and Allison Duettmann (Foresight Institute) outline more ideas in the form of “Active Shields” here: 7. DEFEND AGAINST PHYSICAL THREATS | Multipolar Active Shields. Cf. Engines of Creation by Eric Drexler.
Related: We have to Upgrade – Jed McCaleb
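
The claim in the cybersecurity item above, that software encryption can be made arbitrarily strong, can be illustrated with some key-space arithmetic. The sketch below assumes that exhaustive key search is the attacker's best option and uses a hypothetical guessing rate (`ASSUMED_GUESSES_PER_SECOND`); neither assumption comes from the post, and the sketch ignores cryptanalytic breaks, implementation bugs, and side channels.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def expected_bruteforce_years(key_bits: int, guesses_per_second: float) -> float:
    """Expected years to find a key by exhaustive search.

    On average the attacker must try half of the 2**key_bits possible keys.
    """
    expected_guesses = 2 ** (key_bits - 1)
    return expected_guesses / guesses_per_second / SECONDS_PER_YEAR


# Hypothetical attacker speed, chosen only for illustration.
ASSUMED_GUESSES_PER_SECOND = 1e12

for bits in (64, 128, 256):
    years = expected_bruteforce_years(bits, ASSUMED_GUESSES_PER_SECOND)
    print(f"{bits}-bit key: about {years:.3e} years")
```

Each additional key bit doubles the attacker's expected work, while the defender only has to handle a slightly longer key. That asymmetry is the sense in which this kind of defense can be scaled up without handing the attacker a matching advantage.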

the gears to ascension 25 Jan 2024 4:42 UTC
3 points
0
solid, but I still think you’re missing structure that makes this approach less effective than it seems on its face:

in full generality, what’s a “threat”?

in full generality, what’s a “dangerous” collision?

I worry that the current failure mode of attempting to empower in order to defend is that the defense is actually used to strike inside another’s boundary, as has been the case for ~all weapons
- Chris Lakin 25 Jan 2024 5:03 UTC
  1 point
  0
  > in full generality, what’s a “threat”?
  > in full generality, what’s a “dangerous” collision?
  Hm I’m not immediately sure how to define these
- Chris Lakin 25 Jan 2024 5:02 UTC
  1 point
  0
  > is that the defense is actually used to strike inside another’s boundary, as has been the case for ~all weapons
  Yeah, I am worried about this.
  This is notably not the case for infosec and encryption, where defensive capability doesn’t imply offensive capability. However, I’m unsure if this is also true for any physical interventions. (e.g.: Vaccines? No, bioweapons… Nanotech? No…)
  That said, physical interventions do seem to be defense-dominant when there is coordination among a sufficiently large portion of society/power.
  - the gears to ascension 25 Jan 2024 18:11 UTC
    2 points
    0
    I don’t think I’m convinced physical interactions are defense dominant. The easiest-to-formally-certify defense is to enclose something in a hunk of impenetrable matter, and that can only be certified up to a given impact energy level. Above that energy level, the defense will simply be stripped away. Only MAD seems able to be game theoretically durable, and certifying that a MAD situation will endure requires proving it through a simulation of the opposition.
VojtaKovarik 25 Jan 2024 19:58 UTC
1 point
0
Might be obvious, but it seems worth noting anyway: Ensuring that our boundaries are respected is, at least with a straightforward understanding of “boundaries”, not sufficient for being safe.
For example:
- If I take away all food from your local supermarkets (etc etc), you will die of starvation—but I haven’t done anything with your boundaries.
- On a higher level, you can wipe out humanity without messing with our boundaries, by blocking out the sun.
- Chris Lakin 25 Jan 2024 21:16 UTC
  2 points
  0
  Yes, see Agent membranes/boundaries and formalizing “safety” and davidad’s comment.
  (Also, I’m not necessarily agreeing that your examples are not violations of boundaries. The first one isn’t a violation of the end person’s boundary, although it probably is a violation of the farmer’s. The second one could be.)