I understood the context provided by your 4 color problem example.
What I’m unsure about is how that relates to your question.
Maybe I don’t understand the question you have.
I thought it was, “What should happen if both (1) everything it says makes sense and (2) you can’t follow the full argument?”
My claim is, “Following enough of an argument to agree is precisely what it means for something to make sense.”
In the case of the four color problem, it sounds like for 20 years there were many folks who did not follow the full argument because it was too long for them to read. During that time, the conclusion did not make sense to them. Then, in 1995, a new, shorter argument came along, one that they could follow. It included propositions that describe how the computer proof system works.
As for your latter question, “What would it take for me to trust an AI’s reasoning over my own beliefs when I’m unable to actually verify the AI’s reasoning?”, my answer is “a leap of faith.” I would highly recommend that people not take leaps of faith. In general, I would not trust an AI’s reasoning if I were not able to actually verify it. This is why mechanistic interpretability is critical in alignment.
Nobody ever read the 1995 proof.
Instead, they wound up reading the program. This time it was written in C, which is easier to follow. And the fact that there were now two independent proofs, in different languages, running on different computers, greatly reduced the worry that one of them might have a simple bug.
I do not know that any human has ever tried to properly read any proof of the 4 color theorem.
Now to the issue. The overall flow and method of argument were obviously correct, and spot checking individual points also gave correct results. The basic strategy was sound: “We prove that if the theorem holds in every one of these special cases, then it is true. Then we check each special case.” Therefore it “made sense”. The problem was the question, “Might there be a mistake somewhere?” After all, proofs do not simply have to make sense; they need to be verified. And that was what people couldn’t accept.
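To make that shape concrete, here is a toy Coq sketch of my own (not anything from the actual four color proof; the “color” type and lemma are invented purely for illustration). It shows the same pattern: split into finitely many cases and let the checker verify every branch.

    (* Toy stand-in for the "check every special case" strategy. The real *)
    (* proof enumerates vastly more cases, but the shape is the same.     *)
    Inductive color : Type := Red | Green | Blue | Yellow.

    Lemma color_cases : forall c : color,
      c = Red \/ c = Green \/ c = Blue \/ c = Yellow.
    Proof.
      intro c.
      (* destruct yields one subgoal per constructor; each branch picks   *)
      (* the matching disjunct, and reflexivity checks it by computation. *)
      destruct c;
        [ left
        | right; left
        | right; right; left
        | right; right; right ];
        reflexivity.
    Qed.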
The same thing happens with the Objectivist. You can in fact come up with flaws in proposed understandings of the philosophy fairly easily; it happens all the time. But Objectivists believe that, after enough thought and evidence, their understanding will converge on the one objective version. The AI’s proposed proof therefore can make sense in all of the same ways. It would even likely have a similar form: “Here is a categorization of all of the special cases which might be true. We just have to show that each one can’t work.” You might look at the categories and agree that they sound right. You can look at individual cases and accept that they don’t work. But do you abandon the belief that somewhere, somehow, there is a way to make it work, as opposed to the AI saying that there is none?
As you said, it requires a leap of faith. And your answer is mechanistic interpretability. Which is exactly what happened in the end with the 4 color proof. A mechanistically interpretable proof was produced, and mechanistically interpreted by Coq. QED.
But for something as vague as a philosophy, I think it will take a long time to get to mechanistically interpretable demonstrations. And the thing that does so is likely to be an AI itself...
It will not take a long time if we use collective intelligence to do it together. The technology is already here. I’ve been trying to share it with others who understand the value of doing this before AI learns to do it on its own. If you want to learn more about that, feel free to look me up on the ‘X’ platform @therealkrantz.
It depends on subject matter.
For math, it is already here. Several options exist; Coq is the most popular.
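For a sense of what that looks like in practice, here is a minimal (deliberately trivial) example; the file name is just a placeholder. Running coqc on it either succeeds or reports the exact step it cannot verify.

    (* check_me.v -- hypothetical file name, purely for illustration *)
    Lemma two_plus_two : 2 + 2 = 4.
    Proof.
      (* Holds by computation, so one step suffices. Remove this step  *)
      (* and `coqc check_me.v` rejects the file as an incomplete proof *)
      (* rather than silently accepting it.                            *)
      reflexivity.
    Qed.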
For philosophy, the language requirements alone call for AI at roughly the level of current LLMs, which brings their flaws as well. Plus you need knowledge of human experience. By the time you put it all together, I don’t see how a mechanistic interpreter can be anything less than a (hopefully somewhat limited) AI.
Which again raises the question of how we come to trust in it enough for it not to be a leap of faith.