Is this ‘alignment’ a natural outcome that you can get easily, or even by default, making it essentially a normal engineering problem? Or is it a highly unnatural outcome, requiring a security mindset and bulletproof approaches that are as yet unfound even in principle, where any flaw is exploited, amplified, and fatal, and where there are many lethal problems, all of which one must avoid?
To answer these questions specifically, it’s really important to consider AI-human alignment not just “in the abstract”, but as embedded in the current civilisation, with its infrastructure and incentive structures. As I wrote here:
[...] we should address this strategic concern by rewiring the economic and action landscapes (which also interacts with the “game-theoretic, mechanism-design” alignment paradigm mentioned above). The current (internet) infrastructure and economic systems are not prepared for the emergence of powerful adversarial agents at all:
There are no systems of trust and authenticity verification at the root of internet communication (see https://trustoverip.org/); a minimal signing sketch follows this list
The storage of information is enormously centralised (primarily in the data centres of BigCos such as Google, Meta, etc.)
Money leaves no trace, so one may earn it in arbitrarily malicious or unlawful ways (i.e., gain instrumental power) and then use it to acquire resources from respectable places, e.g., paying for ML training compute at AWS or Azure and purchasing data from data providers. Formal regulations such as compute governance and data governance, and human-based KYC procedures, can only go so far and could probably be circumvented via social engineering by a superhuman imposter or persuader AI.
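To make the first gap concrete, here is a minimal sketch of what “authenticity at the root” means at the smallest scale: every message carries a signature that the receiver checks against the sender’s public key, so identity and content integrity are verified rather than assumed. This is my own illustration using the Python `cryptography` library, not the Trust over IP stack itself; the message string and key handling are simplified for the example.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Sender: generate a keypair and sign a message. In a real trust layer the
# public key would be bound to a verified identity (e.g., via a DID
# document), not simply handed over out of band as done here.
sender_key = Ed25519PrivateKey.generate()
message = b"example: training-compute purchase request"
signature = sender_key.sign(message)

# Receiver: verify the signature against the sender's public key before
# trusting the message at all; verify() raises InvalidSignature if either
# the message or the signature has been tampered with.
public_key = sender_key.public_key()
try:
    public_key.verify(signature, message)
    print("authentic: proceed")
except InvalidSignature:
    print("forged or tampered: reject")
```

The point of such a primitive sitting at the root of communication is that an adversarial agent cannot cheaply impersonate a trusted party: forging a message requires compromising a key, not just spoofing an address.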
In essence, we want to design civilisational cooperation systems such that being aligned is a competitive advantage. Cf. “The Gaia Attractor” by Rafael Kaufmann.
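As a toy illustration of that claim (my own sketch, not from Kaufmann’s piece): if verifying alignment is cheap for genuinely aligned agents and expensive for anonymous or adversarial ones, aligned agents pay lower friction on every interaction, and that advantage compounds. The agent names and friction parameters below are hypothetical.

```python
import random

# Toy model: two agents with identical underlying productivity trade
# repeatedly. The credentialed ("aligned") agent pays low verification
# friction per trade; the anonymous agent pays high friction because every
# counterparty must re-vet it. The gap compounds multiplicatively.
def run_market(n_rounds=1000, friction_aligned=0.005, friction_anon=0.02):
    wealth = {"aligned": 1.0, "anonymous": 1.0}
    for _ in range(n_rounds):
        gain = random.uniform(0.0, 0.03)  # same opportunity for both
        wealth["aligned"] *= 1 + gain - friction_aligned
        wealth["anonymous"] *= 1 + gain - friction_anon
    return wealth

print(run_market())  # the credentialed agent ends up far ahead
```

In a trust-first economy, in other words, being verifiably aligned is itself the cheaper strategy, which is exactly the incentive gradient the rewiring program aims to create.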
This is a very ambitious program of rewiring the entirety of the internet, other infrastructure, and the economy, but I believe this must be done anyway: just expecting a “miracle” HRAD (highly reliable agent design) invention to be sufficient, without fixing the infrastructure and system-design layers, doesn’t sound like a good strategy. By the way, such infrastructure and economy rewiring is the real “pivotal act”.
If we imagined that the world had the “right” kind of infrastructure and social structure (genuinely decentralised, trust-first), alignment would probably be much more of an “ordinary engineering” problem. With the current economic and infrastructural vulnerabilities described above, however, alignment becomes a much higher-stakes problem, requiring more nearly “bulletproof” solutions that work “on the first try”, I think.