My Mental Model of Infohazards
In this post, I give my rules of thumb for dealing with infohazards, which, as far as I can tell, diverge wildly from the accepted views of the larger alignment community. I wrote this post less as a means of propagating my own views, and more because I’m curious as to why my views are so uncommon.
My Infohazard Rules of Thumb
Two people can keep a secret if one of them is dead (seems pretty uncontroversial)
Nothing can be all that dangerous if literally everyone knows how it works (this is probably the crux where I differ wildly from the larger alignment community, if I had to guess)
By the first rule, the only stable numbers of people that can know a secret are zero, one, and literally everyone. (Seems like a pretty straightforward inference, although there’s a difference between “permanently stable” and “stable for a long time” that I would guess is probably more salient to other people than it is to me.)
By the second rule, having a secret known to n > 1 people but not literally everyone is basically just a power play that doesn’t make anyone safer. It’s just a question of people trying to secure a first-mover advantage so that they have more bananas and get to sit higher on the tree and boss the other monkeys around once we inevitably reach the stable state where the secret is known to everyone.
Therefore, by the first four rules, if you know a supposed infohazard, and you know or strongly suspect that it is known to at least one other person, then you are morally obligated to broadcast the infohazard after some sort of responsible disclosure process with your local government.
Anyone following the above rules of thumb will thus act in the opposite manner to anyone who buys into the traditional infohazard framework, making them incredibly difficult to coordinate with. Let’s call people who follow the traditional infohazard model “infohoarders,” and people who follow the mental models I am describing here “infopandoras.”
How can a community work best if it is a mixture of infohoarders and infopandoras? Clearly the infohoarders will want to keep all infohazards away from any infopandora; just as clearly, infopandoras will not declare themselves to be such if they want to attain any position of influence in a community dominated by infohoarders. This seems like a recipe for some seriously unproductive dynamics when trying to solve coordination problems, which are frequent in life and generally the hardest problems for humans to solve.
It also seems possible to me that infohoarders will naturally and inevitably dominate any community that values knowledge for its own sake. Infohoarders are like black holes for infohazards, always growing larger and more hazardous over time. Infopandoras are something else, maybe like a supernova, creating one huge blast of infohazard which then settles down to become a new galaxy or what have you after the chaos subsides.
Acknowledgments
I kind of expect this post to be wildly unpopular, just because of how sharply the models presented here diverge from the accepted wisdom in the alignment community. So I struggled with the question of whether to cite the people who influenced this post. Ultimately, since I identify as an infopandora, I thought I would just bite the bullet and cite my main influence, which is @jessicata. I’ve never met her and she doesn’t know I exist, but a lot of this mental framework is derived from reading her public writings and just kind of pondering the related questions for a few years.
- Stupid Question: Why am I getting consistently downvoted? (30 Nov 2023 0:21 UTC; 28 points)
- Should you publish solutions to corrigibility? (30 Jan 2025 11:52 UTC; 13 points)
A dissenting voice on info-hazards. I appreciate the bulleted list starting from premises and building towards conclusions. Unfortunately, I don’t think all the reasoning holds up to close scrutiny. For example, the conclusion that “infohoarders are like black holes for infohazards” conflicts with the premise that “two people can keep a secret if one of them is dead”. The post would have been stronger if it had stopped before getting into community dynamics.
Still, this post moved and clarified my thinking. My sketch at a better argument for a similar conclusion is below:
Definitions:
hard-info-hazard: information that reliably causes catastrophe, no mitigation possible.
soft-info-hazard: information that risks catastrophe, but can be mitigated.
Premises:
Two people can keep a secret if one of them is dead.
If there are hard-info-hazards, then we are already extinct; we just don’t know it yet.
You, by yourself, are not smart enough to tell if an info-hazard is hard or soft.
Authorities with the power to mitigate info-hazards are not aligned with your values.
Possible strategies on discovering an infohazard:
Tell nobody.
Tell everybody.
Follow a responsible disclosure process.
Expected Value calculations left as an exercise for the reader, but responsible disclosure seems favored. The main exception is if we are in Civilizational Hospice where we know we are going extinct in the next decade anyway and are just trying to live our last few years in peace.
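For readers who want a starting point for the exercise, here is a minimal sketch of the expected-value comparison between the three strategies. Every probability and payoff below is an invented placeholder, not a claim about the real numbers; the point is only to show the shape of the calculation.

```python
# Toy expected-value comparison of the three disclosure strategies above.
# All probabilities and utilities are placeholder assumptions for illustration.

# Utility scale (arbitrary units): 0 = catastrophe, 100 = hazard mitigated.
P_INDEPENDENT_REDISCOVERY = 0.9  # assumption: someone else finds it eventually
P_MITIGATION_IF_DISCLOSED = 0.6  # assumption: authorities act despite misalignment


def ev_tell_nobody():
    # Secret is rediscovered by others with no mitigation in place.
    return (1 - P_INDEPENDENT_REDISCOVERY) * 100


def ev_tell_everybody():
    # Everyone knows at once; mitigation races misuse (coin-flip assumption).
    return 0.5 * 100


def ev_responsible_disclosure():
    # Authorities get a head start on mitigation before broad release.
    return P_MITIGATION_IF_DISCLOSED * 100


strategies = {
    "tell nobody": ev_tell_nobody(),
    "tell everybody": ev_tell_everybody(),
    "responsible disclosure": ev_responsible_disclosure(),
}
best = max(strategies, key=strategies.get)
print(best)
```

With these particular placeholder numbers, responsible disclosure comes out ahead; swapping in Civilizational Hospice assumptions (near-certain catastrophe regardless of strategy) flattens the differences, matching the exception noted above.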