Awesome piece! Isn’t it fascinating that our existing incentives and motives are already misaligned with the priority of creating aligned systems? This raises the question of whether alignment is even the right goal if our bigger goal is to avoid ruin.
Stepping back a bit, I can’t convince myself that aligned AI will or won’t result in societal ruin. It almost feels like a “don’t care” in a Karnaugh map.
The fundamental question is whether we collectively are wise enough to wield power without causing self-harm. If the last 200+ years are any testament, and if the projections of climate change and biodiversity loss are accurate, the answer appears to be that we’re not even wise enough to wield whale oil, let alone fossil fuels.
There is also the very real possibility that alignment can occur in two ways: 1) the machine aligning with human values, and 2) humans aligning with values generated by machines. Would we be able to tell the difference?
If indeed AI can surpass some intelligence threshold, could it also surpass some wisdom threshold? If so, is alignment necessarily our best bet for avoiding ruin?