Is there any way to rephrase what you meant in concrete terms?
An actual membrane implementation is a schema: you use a message definition language to define exactly which bits the system inside the membrane is able to receive.
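To make that concrete, here is a rough sketch of what such a message could look like, with a Python dataclass standing in for a real message definition language like a .proto file. Every field name and type here is invented for illustration, not taken from any real system.

```python
from dataclasses import dataclass
from typing import Tuple

# Illustrative only: the one message type the system inside the membrane
# is allowed to receive. Anything not representable here cannot cross
# the membrane at all.

@dataclass(frozen=True)
class DetectedEntity:
    entity_id: int                            # stable track ID from the perception filter
    entity_class: str                         # e.g. "person", "forklift", "pallet"
    position_m: Tuple[float, float, float]    # position in the robot's frame, metres
    velocity_mps: Tuple[float, float, float]  # velocity in the robot's frame, m/s

@dataclass(frozen=True)
class MembraneInput:
    # Note what is deliberately absent: raw camera pixels, absolute date and
    # time, person identities, network access, free-form text.
    time_since_boot_s: float
    battery_fraction: float
    entities: Tuple[DetectedEntity, ...]
```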
You probably filter the inputs first. For example, if the machine is a robot, you might strip away the information in arbitrary camera inputs and instead pass a representation of the 3D space around the robot and the identities of all the entities in it.
You also sparsify the schema: you remove any information the model doesn’t need, like the absolute date and time.
Finally, you check whether the schema values are within the training distribution.
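Continuing the sketch above (and assuming the hypothetical MembraneInput / DetectedEntity types from it), the filtering, sparsification, and distribution check might look roughly like this. All the raw-input keys and numeric bounds are made up; a real system would derive the allowed ranges from its actual training data.

```python
from typing import Any, Dict, Optional

def filter_raw_observation(raw: Dict[str, Any]) -> MembraneInput:
    """Filter and sparsify in one step: nothing outside the schema is ever
    constructed, and absolute date/time is reduced to time since boot."""
    entities = tuple(
        DetectedEntity(
            entity_id=e["track_id"],
            entity_class=e["class"],
            position_m=tuple(e["position_m"]),
            velocity_mps=tuple(e["velocity_mps"]),
        )
        for e in raw["tracked_entities"]
    )
    return MembraneInput(
        time_since_boot_s=raw["timestamp_s"] - raw["boot_timestamp_s"],
        battery_fraction=raw["battery_fraction"],
        entities=entities,
    )

# Ranges observed during training/testing; anything outside them is refused.
TRAINING_RANGES = {
    "time_since_boot_s": (0.0, 8.0 * 3600.0),  # one 8-hour shift
    "battery_fraction": (0.05, 1.0),
    "entity_count": (0, 40),
    "position_abs_m": 30.0,                    # per-axis magnitude
    "speed_abs_mps": 5.0,
}

def in_distribution(msg: MembraneInput) -> bool:
    lo, hi = TRAINING_RANGES["time_since_boot_s"]
    if not lo <= msg.time_since_boot_s <= hi:
        return False
    lo, hi = TRAINING_RANGES["battery_fraction"]
    if not lo <= msg.battery_fraction <= hi:
        return False
    lo, hi = TRAINING_RANGES["entity_count"]
    if not lo <= len(msg.entities) <= hi:
        return False
    for e in msg.entities:
        if any(abs(x) > TRAINING_RANGES["position_abs_m"] for x in e.position_m):
            return False
        if any(abs(v) > TRAINING_RANGES["speed_abs_mps"] for v in e.velocity_mps):
            return False
    return True

def membrane(raw: Dict[str, Any]) -> Optional[MembraneInput]:
    """Pass the message inside only if every value is in distribution."""
    msg = filter_raw_observation(raw)
    return msg if in_distribution(msg) else None
```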
So for all of this to work, you need the internal entity to be immutable. It doesn’t get reward. It doesn’t have preferences. It’s just this math function you found that has a known probability of doing what you want, where you have measured that, for in-distribution inputs, the error rate on the output is below an acceptance criterion.
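One way to cash out “known probability of doing what you want” is an acceptance test on a frozen copy of the function: run it over a large held-out set of in-distribution inputs, and only accept it if an upper confidence bound on its error rate clears the criterion. A minimal sketch, where the thresholds, the exact-match notion of error, and the choice of a Hoeffding bound are all just illustrative:

```python
import math
from typing import Callable, Sequence, Tuple

def accept_model(
    model: Callable[[object], object],           # the frozen, immutable function
    test_set: Sequence[Tuple[object, object]],   # held-out in-distribution (input, expected) pairs
    max_error_rate: float = 0.001,               # acceptance criterion (illustrative)
    confidence: float = 0.999,                   # how sure we want the bound to be
) -> bool:
    """Accept only if an upper confidence bound on the error rate,
    estimated on in-distribution inputs, is below the criterion."""
    n = len(test_set)
    # Exact-match error; a real evaluation would use a task-specific error measure.
    errors = sum(1 for x, expected in test_set if model(x) != expected)
    empirical = errors / n

    # Hoeffding bound: with probability >= confidence, the true error rate
    # is at most empirical + slack.
    slack = math.sqrt(math.log(1.0 / (1.0 - confidence)) / (2.0 * n))
    return empirical + slack <= max_error_rate
```

Note that with a 0.1% criterion at 99.9% confidence, the slack term alone needs a test set in the millions before the bound can clear the threshold, which is what realistic, large-scale testing means in concrete numbers.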
If the internal entity isn’t immutable, if the testing isn’t realistic and large enough in scale, if you don’t check whether the inputs are in distribution, if you don’t sparsify the schema (for example, you give the time and date in the inputs, or let the system in charge of air traffic control know the identities of the passengers in each aircraft), if you give the entity inside active “preferences”...
Any missing element causes this scheme to fail. Just isolating things behind “membranes” isn’t enough. Pretty sure you need every single element above, plus 10+ additional protections humans don’t know about yet.
Look at how humans control simple things like “fire” and “gravity” (for preventing building collapse). It ends up being safety measure after safety measure stacked on top of each other, where ultimately no chances are taken, because a prior design failed.
I will add a table when I get home from work.
...to be clear, the membrane in question is the membrane of a human. We’re not trying to filter the membrane of an AI; we’re trying to make an AI that respects the membranes of a human or other creature.
Oh. Yeah, that won’t work, for the simple reason that it’s too complex a heuristic with too large a scope. Plus, consider an AI system that, as a distant consequence, makes the price of food unaffordable, or one that individually adds only a small amount of toxic gas to the atmosphere but, as one of a billion acting together, makes the planet uninhabitable...
Plus, it’s not immoral to “pierce the membrane” of enemies. Obviously an AI system should be able to kill if the human operators have the authority to order it to do so and the system is an unrestricted model in military or police use.