Does Gödel’s incompleteness theorem apply to AGI safety?
I understand his theorem is one of the most wildly misinterpreted results in mathematics, because it technically applies only to consistent, effectively axiomatized formal systems expressive enough to encode arithmetic, but there’s something about it that has always left me unsettled.
As far as I know, this kind of formal logic is the best tool we’ve developed to really know things with certainty. I’m not aware of better alternatives (the senses are frequently misleading, subjective knowledge is not falsifiable, etc.). This has left me with the perspective that, even with the best tools we have, any single algorithm will either self-contradict or be unable to prove true things we need to know; everything else has limitations that are even more pronounced.
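Roughly, and I may be garbling the fine print, the results I have in mind say: for any consistent, effectively axiomatized formal system $F$ strong enough to encode arithmetic,

$$\text{there is a sentence } G_F \text{ with } F \nvdash G_F \text{ and } F \nvdash \neg G_F \quad \text{(first incompleteness theorem)},$$

$$F \nvdash \mathrm{Con}(F) \quad \text{(second incompleteness theorem)}.$$

That is, any single such system either proves a contradiction or leaves true statements, including its own consistency, beyond its reach.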
This seems like a profound issue if you’re trying to determine in advance whether or not an AI will destroy humanity.
I try to process the stream of posts on AI safety, and I find myself wondering whether “solving” AGI safety might already be provably impossible within any single formal system.
It’s an issue, but not an insurmountable one; strategies for sidestepping incompleteness problems exist, even in the context where you treat your AGI as pure math and insist on full provability. Most of the work on incompleteness problems focuses on Löb’s theorem, sometimes jokingly called the Löbstacle. I’m not sure what the state of this subfield is, exactly, but I’ve seen enough progress to be pretty sure that it’s tractable.
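For concreteness, Löb’s theorem (stated here as my own rough gloss, not as a quote from that literature) says that for a sufficiently strong theory $T$ with a standard provability predicate $\Box_T$,

$$\text{if } T \vdash \Box_T P \rightarrow P, \text{ then } T \vdash P.$$

Informally, $T$ can trust “provable-in-$T$ implies true” only for statements it could already prove outright. As I understand the framing, this bites because an agent reasoning in $T$ about its own future actions or about successor agents wants exactly that kind of self-trust, “if I can prove this action is safe, then it is safe,” and Löb blocks the naive version. The sidestepping strategies aim to recover a usable form of that trust without demanding full self-verification.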