Vanessa Kosoy comments on Needed: AI infohazard policy

Vanessa Kosoy 22 Sep 2020 9:40 UTC
2 points
Hmm, so in this model we assume that (i) the research output of the rest of the world is known, (ii) we are deciding about one result only, and (iii) the thresholds are unknown. In this case you are right that we need to compare our alignment : capability ratio to the rest of the world’s alignment : capability ratio.

Now consider two variations. First, assume that, instead of just one result overall, you produce a single result every year. Most of the results in the sequence have an alignment : capability ratio way above that of the rest of the world, but then there is a year in which the ratio is only barely above it. In this case, you are better off not publishing the irregular result, even though the naive ratio criterion says to publish. We can reconcile this with the previous model by including your own research in the reference, but that creates a somewhat confusing self-reference.
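
To make the yearly case concrete, here is a toy calculation. All the numbers are made up for illustration, and treating the "reference" as the simple sum of the world's output and one of your typical results is my assumption about how the reconciliation would be set up, not something pinned down above.

```python
def ratio(alignment, capability):
    """Alignment : capability ratio of a body of research."""
    return alignment / capability

# Hypothetical (alignment, capability) contributions; every number is invented.
world = (1.0, 4.0)      # rest of the world, per year
typical = (3.0, 1.0)    # one of your usual results, far above the world's ratio
irregular = (0.3, 1.0)  # the odd year, only barely above the world's ratio

# Naive criterion: compare the result against the rest of the world only.
naive_reference = ratio(*world)

# Self-inclusive criterion (assumed form): the reference also contains your own typical output.
combined_reference = ratio(world[0] + typical[0], world[1] + typical[1])

publish_naive = ratio(*irregular) > naive_reference
publish_self_inclusive = ratio(*irregular) > combined_reference

print("naive criterion says publish:", publish_naive)
print("self-inclusive criterion says publish:", publish_self_inclusive)
```

With these numbers the irregular result clears the world-only reference but falls short of the reference that includes your own work, which is the disagreement described above.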

Second, we can switch to modeling the research output of the rest of the world as a random walk. In this case, if the average direction of progress points towards failure, then moving along this direction is net negative, since it reduces the chance of succeeding by luck.
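
A minimal sketch of the random-walk point, under assumptions of my own: progress happens in unit steps, each step is an alignment step with probability `p_align` and a capability step otherwise, and "success" means collecting the required alignment steps before the required capability steps. The thresholds, the step probability and the size of the shift are all hypothetical. The function computes the success probability exactly by recursion rather than by sampling.

```python
from functools import lru_cache

def success_probability(need_align, need_cap, p_align):
    """Probability that `need_align` alignment steps accumulate before
    `need_cap` capability steps, when each step is an alignment step
    with probability `p_align` (toy lattice random walk)."""
    @lru_cache(maxsize=None)
    def win(a, c):
        if a == 0:      # alignment threshold reached first
            return 1.0
        if c == 0:      # capability threshold reached first
            return 0.0
        return p_align * win(a - 1, c) + (1.0 - p_align) * win(a, c - 1)
    return win(need_align, need_cap)

# Hypothetical setup: both thresholds are 10 steps away and p_align = 0.4,
# so the average direction (2 alignment : 3 capability) points towards failure.
p_before = success_probability(10, 10, 0.4)

# Moving along the average direction by (2, 3) leaves 8 alignment and
# 7 capability steps to go.
p_after = success_probability(8, 7, 0.4)

print(f"success probability before the shift: {p_before:.3f}")
print(f"success probability after the shift:  {p_after:.3f}")
```

In this toy setup the shift leaves the expected shortfall in alignment unchanged but removes some of the remaining steps, so there is less noise left to make up the shortfall, and the success probability goes down.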