I do not really understand how technical advances in alignment realistically become a success path. I anticipate that for improved alignment to be useful, it would need to be present in essentially all AI agents, or in the most powerful AI agent, such that the aligned agent could dominate unaligned ones.
The instrumental convergence of goals implies that a powerful AI would almost certainly act to prevent any rivals from emerging, whether aligned or not. In the intelligence explosion scenario, progress would be rapid enough that the first mover achieves a decisive strategic advantage over the entire world. If we find an alignment solution robust enough to survive the intelligence explosion, the resulting aligned AI will set up guardrails to prevent most catastrophes, including the emergence of unaligned AGIs.
I don’t expect uniformity of adoption, and I don’t necessarily expect alignment to correlate with agent capability. By my estimation, this success path rests on the organization with the most capable AI agent also being specifically interested in ensuring that agent’s alignment. I expect these goals to interfere with each other to some degree, making that confluence unlikely. Are your expectations different?
Alignment and capabilities don’t necessarily correlate, and that accounts for a lot of why my p(doom) is so high. But more aligned agents are, in principle, more useful, so rational organizations should be motivated to pursue aligned AGI, not just AGI. Unfortunately, alignment research seems barely tractable, capabilities can be brute-forced (and look valuable in the short term), and corporate incentive structures being what they are, what we’re seeing in practice is a reckless amount of risk-taking. Regulation could alter the incentives, balancing the externality with appropriate costs.