Nobody currently knows how to align strongly superhuman AIs with human interests, and we need far more time to solve this problem. Incremental progress on AI capabilities shortens the time we have left to figure out alignment, and thus makes human extinction more likely. So by far the best action is to stop advancing AI capabilities.
It seems that little research is being done on the invariant properties of rapidly self-modifying ecosystems. At least, when I searched and also asked here a few months ago, not much came up: https://www.lesswrong.com/posts/sDapsTwvcDvoHe7ga/what-is-known-about-invariants-in-self-modifying-systems.
It’s not possible to get a handle on the dynamics of rapidly self-modifying ecosystems without a better understanding of how to think about properties conserved during self-modification. And ecosystems with rapidly increasing capabilities will be strongly self-modifying.
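To make "conserved during self-modification" a bit more concrete, here is a minimal toy sketch (my own illustration; the names `invariant`, `propose_rewrite`, and `evolve` are hypothetical, not an established framework). A system repeatedly rewrites its own state, and an externally declared invariant filters out rewrites that would break the conserved quantity:

```python
import random

def invariant(state):
    """The declared conserved quantity: the total weight of the system."""
    return sum(state)

def propose_rewrite(state):
    """A random self-modification: shift weight between two slots."""
    new = list(state)
    i = random.randrange(len(new))
    j = random.randrange(len(new))
    delta = random.randint(-2, 2)
    new[i] += delta
    new[j] -= delta          # a compensating change conserves the total...
    if random.random() < 0.3:
        new[j] += 1          # ...but some rewrites are buggy and break it
    return new

def evolve(state, steps=1000):
    """Apply rewrites, accepting only those that conserve the invariant."""
    target = invariant(state)
    for _ in range(steps):
        candidate = propose_rewrite(state)
        if invariant(candidate) == target:
            state = candidate
    return state

final = evolve([5, 5, 5, 5])
print(final, "total:", invariant(final))  # total is always 20
```

A real self-modifying system would, of course, also be able to rewrite the checker itself, which is part of what makes the general problem hard and understudied.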
However, any progress in this direction is likely to be dual-use. Knowing how to think about self-modification invariants is very important for AI existential safety and is also likely to be a strong capability booster.
This is a very typical conundrum for AI existential safety. We can push to make the study of invariant properties of self-modifying (eco)systems an active research area again, but the likely side effect of better understanding the properties of potentially fooming systems is making it easier to bring such systems into existence. And we don’t have a good understanding of the proper ways to handle this kind of situation (although the topic of dual-use is discussed here from time to time).
No; according to https://chat.lmsys.org/, one can choose to know the names of the models one is talking with, but then one’s votes will not be counted in the leaderboard statistics.