CarlShulman comments on Partial value takeover without world takeover

CarlShulman 5 Apr 2024 18:27 UTC
42 points
If AI agents with unusual values would for a long time be mostly interested in promoting them through means other than lying in wait and taking over the world, that is important because... AIs pursuing this strategy are much more visible than those hiding in wait deceptively. We might less expect AI scheming.
AIs showing bits of unintended motives in experiments or deployment would be a valuable piece of evidence re scheming risk, but such behavior would be trained against, pushing scheming behavior out towards the tails of takeover/escape with the power to resist modification. The tendency of human institutions to retrain or replace AIs to fit human preferences pushes towards misaligned AIs having ~0 or very high power.