Unfortunately, my only motivation for staying quiet here is altruism. For others in similar circumstances, how could they be incentivized by the world at large to avoid disclosing unambiguously harmful ideas?
I don’t think we have a general solution to this problem, but it would be great if we could find one, because I believe it to be equivalent to solving the coordination problems we face in avoiding unsafe superintelligent AI (assuming you believe superintelligent AI is unsafe by default, as I do). You might find some insight into mechanisms people have considered by looking at posts here about AI policy aimed at avoiding and preventing the development of unsafe AI.
I think I have a general solution. It requires altruism, but at the social rather than the individual level.
This concept of a reverse patent office is based on the idea that a toxic meme, once it emerges, would have been more harmful had it emerged earlier, so any delay is good.
The reverse patent office accepts encrypted submissions from idea generators, and observes new ideas in the world.
When the reverse patent office observes a harmful idea in existence in the world, and assesses it as worth ‘having been delayed’ based on pre-existing criteria, submitters who previously conceived of that toxic meme submit their decryption instructions. Payout is based on the amount of time that has passed since initial receipt of the encrypted submission, using a function that compounds with time (thus avoiding any incentive to ‘harvest’ a harmful idea by disclosing it after submission).
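A minimal sketch of how such a commit-then-reveal payout might work, assuming illustrative numbers (the base award, interest rate, and function names below are my own placeholders, not part of the proposal as stated):

```python
import hashlib
from datetime import datetime, timezone

# Illustrative parameters only; the proposal does not specify actual values.
BASE_PAYOUT = 10_000      # hypothetical base award
ANNUAL_RATE = 0.05        # hypothetical compounding rate per year


def commit(ciphertext: bytes) -> dict:
    """Register an encrypted submission; the office stores only a digest and a timestamp."""
    return {
        "digest": hashlib.sha256(ciphertext).hexdigest(),
        "submitted_at": datetime.now(timezone.utc),
    }


def payout(record: dict, observed_at: datetime) -> float:
    """Award compounds with the delay between submission and the idea's appearance,
    so waiting longer always pays more and there is no incentive to 'harvest' early."""
    years = (observed_at - record["submitted_at"]).total_seconds() / (365.25 * 24 * 3600)
    return BASE_PAYOUT * (1 + ANNUAL_RATE) ** years
```

Because the award grows monotonically with elapsed time, a submitter never gains by releasing the idea earlier than it would otherwise have appeared.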
Does this solution work? Who would fund such an organization? What would be its criteria?
I think a serious complication arises in this scenario:
I discover a dangerous idea at age 20, and get a reverse patent on it as described here.
At age 60 I learn I am terminally ill and don’t believe there is any mechanism by which my existence can be carried forward (e.g. cryonics). I am given 1 year to live.
I make the dangerous idea public and collect a large sum for having kept quiet for 40 years, so I can enjoy my last year of life, even if the world doesn’t continue much beyond that because it’s destroyed by my dangerous idea.
Maybe some sort of prohibition on collecting the payout, if it can be proven that you chose to publicize the idea yourself, would be a good idea.
I argue that in your scenario, the reverse patent is unambiguously a good thing, as it bought us 40 years.
The same ugly incentives apply to pollution: ‘I can get money now, or leave an unpolluted world to posterity.’ You don’t need too many people to think that way to get some severe harm.
Thank you! I was hoping that someone was aware of some clever solution to this problem.
I believe that AI is at least as inherently unsafe as HI, ‘Human Intelligence’. I do think that our track record with managing the dangers of HI is pretty good, in that we are still here, which gives me hope for AI safety.
I wonder how humans would react to a superintelligent AI stating ‘I have developed an idea harmful to humans, and I am incentivized to publicize that idea. I don’t want to do harm to humans, can you please take a look at my incentives and tell me if my read of them is correct? I’ll stick with inaction until the analysis is complete.’
Is that a best-case scenario for a friendly superintelligence?