When we develop mechanisms to control AI systems, we are essentially creating tools that could be used by any sufficiently powerful entity—whether that’s a government, corporation, or other organization. The very features that make an AI system “safe” in terms of human control could make it a more effective instrument of power consolidation.
…And if we fail to develop such mechanisms, AI systems will still be “instruments of power consolidation”, but the power being consolidated will be the AI’s own power, right?
I mean, 90% of this article—the discussion of offense-defense balance, and limits on human power and coordination—applies equally to “humans using AI to get power” and to “AI getting power for its own purposes”, right?
E.g. out-of-control misaligned AI is still an “enabler of coherent entities”, because it can coordinate with copies of itself.
I guess you’re not explicitly arguing against “open publication of safety advances”, just raising it as a point to consider? Anyway, a more balanced discussion of the pros and cons would include:
Is “humans using AI to get power” less bad or more bad than “AI getting power for its own purposes”? (I lean towards “probably less bad, but it sure depends on the humans and the AI”)
If AI obedience is an unsolved technical problem to such-and-such degree, to what extent does that lead to people not developing ever-more-powerful AI anyway? (I lean towards “not much”, cf. Meta / LeCun today, or the entire history of AI)
Is the sentence “in reality we should expect combined human-AI entities to reach dangerous capabilities before pure artificial intelligence” really true, and if so how much earlier and does it matter? (I lean towards “not necessarily true in the first place, and if true, probably not by much, and it’s not all that important”)
It’s probably a question that needs to be considered on a case-by-case basis anyway. ¯\_(ツ)_/¯
Is the sentence “in reality we should expect combined human-AI entities to reach dangerous capabilities before pure artificial intelligence” really true, and if so how much earlier and does it matter? (I lean towards “not necessarily true in the first place, and if true, probably not by much, and it’s not all that important”)
I guess in my model this is not something that suddenly becomes true at a certain level of capabilities. Instead, I think that the capabilities of human-AI entities become more dangerous in something of a continuous fashion as AI (and the technology for controlling AI) improves.
How much earlier? Yeah, good question. I don’t really know.
And does it matter? I think so, because even if pure AI control follows on from human-AI entity control (which would actually be my prediction), I expect the dynamics of human-AI control to lead to and accelerate that eventual pure AI control.
I’m thinking, also, that there is a thing where pure AI entities need to be careful not to ‘tip their hand’. What I mean is that pure AI entities will need to avoid revealing the extent of their capabilities until the point where they are actually capable of taking control, whereas human-AI entities can kind of go ahead and play the power game and start building up control without so much concern about this. (To the average voter, this could just look like more of the same.)