Thank you for writing this post; I had been struggling with these considerations a while back. I investigated going full paranoid mode but in the end mostly decided against it.
I agree that theoretical insights into agency and intelligence have a real chance of leading to capability gains. I also agree that the government-spy threat model is unlikely. I would add, however, that if, say, MIRI builds a safe AGI prototype, perhaps based on different principles than the systems used by adversaries, it might make sense for an (AI-assisted) adversary to trawl through your old blog posts.
Byrnes has already mentioned the distinction between pioneers and median researchers. Another aspect your threat models don't capture is research that builds on your research. Your work may end up in a very long chain of theoretical research, only a minority of which you contributed. Or the spirit, if not the letter, of your ideas may percolate through the research community. Additionally, the alignment field will almost certainly grow much larger, raising the status of both John and the field in general. Over longer timescales I expect this percolation to be quite strong.
Even if approximately nobody reads or knows of your work, the insights may very well become massively signal-boosted by other alignment researchers (again, I expect the community to explode in size within a decade) and thereby end up in a flashy demo.
All in all, these and other considerations lead me to the conclusion that this danger is very real. That is, there is a significant minority of possible worlds in which early alignment researchers tragically contribute to DOOM.
However, I still think that on the whole most alignment researchers should work in the open. Any solution to alignment will most likely come from a group of people, albeit a small one. Working privately massively hampers collaboration; it makes the community look weird and makes it much harder to recruit good people. Also, for most researchers it is difficult to support themselves financially if they can't show their work. Since by far the most likely doom scenario is a company or government simply building AGI without sufficient safeguards, either because there is no alignment solution or because they are unaware of it or ignore it, I conclude that the best policy in expected value is to work mostly publicly*.
*Of course, if there is a clear path to capability gains, keeping it secret might be best.
EDIT: Cochran has a comical suggestion:
Georgy Flerov was a young nuclear physicist in the Soviet Union who (in 1943) sent a letter to Stalin advocating an atomic bomb project. It is not clear that Stalin read that letter, but one of Flerov’s arguments was particularly interesting: he pointed out the abrupt and complete silence on the subject of nuclear fission in the scientific literature of the US, UK, and Germany – previously an extremely hot topic.
Stopping publications on atomic energy (which happened in April 1940) was a voluntary effort by American and British physicists. But that cessation was itself a signal that something strategically important was going on.
Imagine another important discovery with important strategic implications: how would you maximize your advantage?
Probably this is only practically possible if your side alone has made the discovery. If the US and the UK had continued publishing watered-down nuclear research, the paper stoppage in Germany would still have given away the game. But suppose, for the moment, that you have a monopoly on the information. Suddenly stopping closely related publications obviously doesn’t work. What do you do?
You have to continue publications, but they must stop being useful. You have to have the same names at the top (an abrupt personnel switch would also be a giveaway) but the useful content must slide to zero. You could employ people that A. can sound like the previous real authors and B. are good at faking boring trash. Or, possibly, hire people who are genuinely mediocre and don’t have to fake it.
Maybe you can distract your rivals with a different, totally fake but extremely exciting semiplausible breakthrough.
Or – an accidental example of a very effective approach to suppression. Once upon a time, around 1940, some researchers began to suspect that duodenal ulcers were caused by a spiral bacterium. Some physicians were even using early antibiotics against them, which seemed to work. Others thought what they were seeing might be postmortem contamination. A famous pathologist offered to settle the issue.
He looked, didn’t see anything, and the hypothesis was buried for 40 years.
But he was wrong: he had used the wrong stains.
So, a new (?) intelligence tactic for hiding strategic breakthroughs: the magisterial review article.