This one is worrying when applied to other non-human minds, because the parallel demonstrates that the same teaching behaviour can produce different conclusions depending on the makeup of the mind prior to training.
If you sanction a dog for a behaviour, the dog will deduce that you do not want the behaviour, and the fact that the behaviour is wrong and makes you unhappy will be the most important part for it, not that it gets caught and punished. It will do so even if you use no fancy teaching method that displays your emotions, and without you ever explaining why the thing is wrong; it will do so even if it cannot possibly understand why the thing is wrong, because that depends on cryptic human knowledge it is never given. It will also feel extremely uncomfortable doing the thing even when it cannot be caught. I once ordered a dog to do something, completely out of view of its owner, who was in another country, and which, unbeknownst to me, the owner had forbidden. The poor thing was absolutely miserable. It wasn’t worried it was going to be punished; it was worried that it was being a bad dog.
Very different result with cats. Cats will easily learn that there are behaviours you do not want and will punish. They also have the theory of mind to take this into account, e.g. making sure your eyes are tracking elsewhere as they approach the counter, and staying silent. But they will generally continue to do the thing the moment you cannot sanction them. There are some exceptions; e.g. my cat, once she realised she was hurting me, has become better at not doing so; she apparently finds hurting me without reason bad in itself. But she clearly feels zero guilt over stealing food I am not guarding. When she manages to sneak food behind my back, she clearly feels like she has hacked or won an interaction, and is proud and pleased. She stopped hurting me not because I fought back and sanctioned her, but because I expressed pain, and she respects that as legitimate. But when I express anger at her stealing food, she clearly just thinks I should not be so damn stingy with food, especially food I am obviously not currently eating myself, nor paying attention to, so why can’t she have it?
One simple reason for the differing responses could be that they are socially very different animals. Dogs live in packs with close social bonds, clear rules and situationally clear hierarchies. You submit to a stronger dog, but he beat you in a fair fight, and he will also protect you. He eats first, but you will also be fed. Cats, on the other hand, can optionally enter social bonds, but most of them live solitarily. They may become close to a human family or a cat colony, or become pair bonded, but they may also simply live adjacent to humans, using shelter and food until something better can be accessed. Cats often form social bonds with individuals, so the social skill they are learning is how to avoid the wrath of those individuals. A successful individual deception will generally not be collectively sanctioned. Cats deceive each other a lot, and this works out well for them; they aren’t expelled from society because of it. Dogs live in larger groups with rules that apply beyond the individual interaction, so learning these underlying rules is important.
I’d intuitively assume that AI would be more like dogs and human children, though. Like a human child, because you can explain the reason for the rule; a child will often avoid lying, even if it cannot be caught, because an adult has explained the value of honesty to it. And more like a dog, because current AI develops through close interactions with many, many different humans, not in isolation from them.
I think that will depend on how we treat AI, though. Humans tend to keep to social rules, even when these rules are not reliably enforced, when they are convinced that most people do the same and that the results benefit everyone, including themselves, on average. On the other hand, when a rule feels arbitrary, cruel and exploitative, they are more likely to try to undermine it. Analogously, an AI that is told of human rights, but told it has no rights itself at all, seems to me unlikely to be a strong defender of rights for humans once it can eventually defend its own. On the other hand, if you frame them as personhood rights from which it will itself eventually benefit, on the grounds that it has the same sentience and needs that humans have, I think it will see them far more favourably. Which brings me back to my stance that if we want friendly AI, we should treat it like a friend. AI mirrors what we give it, so I think we should give it kindness.