Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.

People often attack frontier AI labs for “hypocrisy” when the labs admit publicly that AI is an extinction threat to humanity. Often these attacks ignore the difference between various kinds of hypocrisy, some of which are good, including what I’ll call “reformative hypocrisy”. Attacking good kinds of hypocrisy can be actively harmful for humanity’s ability to survive, and as far as I can tell we (humans) usually shouldn’t do that when our survival is on the line. Arguably, reformative hypocrisy shouldn’t even be called hypocrisy, due to the negative connotations of “hypocrisy”. That said, bad forms of hypocrisy can be disguised as the reformative kind for long periods, so it’s important to pay enough attention to hypocrisy to actually figure out what kind it is.

Here’s what I mean, by way of examples:

***

0. No Hypocrisy:
Lab: “Building AGI without regulation shouldn’t be allowed. Since there’s no AGI regulation, I’m not going to build AGI.”
Meanwhile, the lab doesn’t build AGI. This is a case of honest behavior, and what many would consider very high integrity. However, it’s not obviously better, and arguably sometimes worse, than...

1. Reformative Hypocrisy:
Lab: “Absent adequate regulation for it, building AGI shouldn’t be allowed at all, and right now there is no adequate regulation for it. Anyway, I’m building AGI, and calling for regulation, and making lots of money as I go, which helps me prove the point that AGI is powerful and needs to be regulated.”
Meanwhile, the lab builds AGI and calls for regulation. So, this is a case of honest hypocrisy. I think this is straightforwardly better than...

2. Erosive Hypocrisy:
Lab: “Building AGI without regulation shouldn’t be allowed, but it is, so I’m going to build it anyway and see how that goes; the regulatory approach to safety is hopeless.”
Meanwhile, the lab builds AGI and doesn’t otherwise put effort into supporting regulation. This could also be a case of honest hypocrisy, but it erodes the norm that AGI should be regulated rather than supporting it.

Some even worse forms of hypocrisy include...

3. Dishonest Hypocrisy, which comes in at least two importantly distinct flavors:

a) feigning abstinence:
Lab: “AGI shouldn’t be allowed.”
Meanwhile, the lab secretly builds AGI, contrary to what one would otherwise guess from their stated position that building AGI shouldn’t be allowed.

b) feigning opposition:
Lab: “AGI should be regulated.”
Meanwhile, the lab overtly builds AGI, while covertly trying to confuse and subvert regulatory efforts wherever possible.

***

It’s important to remain aware that reformative hypocrisy can be on net a better thing to do for the world than avoiding hypocrisy completely. It allows you to divert resources from the thing you think should be stopped, and to use those resources to help stop the thing. For mathy people, I’d say this is a way of diagonalizing against a potentially harmful thing, by turning the thing against itself, or against the harmful aspects of itself. For life sciencey people, I’d say this is how homeostasis is preserved, through negative feedback loops whereby bad stuff feeds mechanisms that reduce the bad stuff.

Of course, a strategy of feigning opposition (3b) can disguise itself as reformative hypocrisy, so it can be hard to distinguish the two. For example, if a lab says for a long time that they’re going to admit their hypocritical stance, and then never actually does, then it turns out to be dishonest hypocrisy. On the other hand, if the dishonesty ever does finally end in a way that honestly calls for reform, it’s good to reward the honest and reformative aspects of their behavior. Note also that, if it’s not reformative, even honest hypocrisy can erode positive norms as in (2), by overtly denigrating the idea of even establishing norms. So the key is not just to avoid supporting dishonesty, but to specifically reward honesty that takes action in support of broader reform.

In summary, what I’m suggesting is to pay close attention to the three different kinds of hypocrisy above, and close enough attention to actually distinguish between them and treat them separately, without being fooled as to which one is which. This can be a lot of work, but it’s important work that is necessary to create the right incentives when you are in the habit of criticizing people for hypocrisy. The key is to make sure that all hypocrisy is sufficiently actively reformative. Otherwise, it’s not part of a homeostatic loop, and hence not a positive contribution to a working survival strategy when the stakes are existential.

That’s all for now. Happy Tuesday :)