You may be missing context on my reference to the 4 color problem. The original 1976 proof, by Appel and Haken, took over 1000 hours of computer time to check. A human lifetime is too short to verify that proof. This eliminates your first option. The Objectivist cannot, even in principle, check the proof. Life is too short.
Your first option is therefore, by hypothesis, not an option. You can believe the AI or not. But you can’t actually check its reasoning.
The history of the 4 color problem proof shows this kind of debate. People argued for nearly 20 years about whether there might be a bug. Then an independent, and easier to check, computer proof came along in 1995. The debate mostly ended. More efficient computer generated proofs have since been created. The best that I’m aware of is 60,000 lines. In principle that would be verifiable by a human. But no human that I know of has actually bothered. Instead the proof was verified by the proof assistant Coq. And, today, most mathematicians trust Coq over any human.
We have literally come full circle on the 4 color problem. We started by asking whether we can trust a computer if a human can’t check it. And now we accept that a computer can be more trustworthy than a human!
However it took a long time to get the proof down to such a manageable size. And it took a long time to get a computer program that is so trustworthy that most believe it over themselves.
And so the key epistemological challenge. What would it take for you to trust an AI’s reasoning over your own beliefs when you’re unable to actually verify the AI’s reasoning?
I understood the context provided by your 4 color problem example.
What I’m unsure about is how that relates to your question.
Maybe I don’t understand the question you have.
I thought it was, “What should happen if both (1) everything it says makes sense and (2) you can’t follow the full argument?”.
My claim is “Following enough of an argument to agree is precisely what it means for something to make sense.”.
In the case of the four color problem, it sounds like for 20 years there were many folks that did not follow the full argument because it was too long for them to read. During that time, the conclusion did not make sense to them. Then, in 1995 a new shorter argument came along. One that they could follow. It included propositions that describe how the computer proofing system works.
For your latter question, “What would it take for me to trust an AI’s reasoning over my own beliefs when I’m unable to actually verify the AI’s reasoning?”. My answer is “A leap of faith.”. I would highly recommend that people not take leaps of faith. In general, I would not trust an AI’s reasoning if I were not able to actually verify the AI’s reasoning. This is why mechanistic interpretability is critical in alignment.
Instead they wound up reading the program. This time it was written in C—which is easier to follow. And the fact that there were now two independent proofs in different languages that ran on different computers greatly reduced the worries that one of them might have a simple bug.
I do not know that any human has ever tried to properly read any proof of the 4 color theorem.
Now to the issue. The overall flow and method of argument were obviously correct. Spot checking individual points gave results that were also correct. The basic strategy was also obviously correct. It was a basic, “We prove that if it holds in every one of these special cases, then it is true. Then we check each special case.” Therefore it “made sense”. The problem was the question, “Might there be a mistake somewhere?” After all proofs do not simply have to make sense, they need to be verified. And that was what people couldn’t accept.
The same thing with the Objectivist. You can in fact come up with flaws in proposed understandings of the philosophy fairly easily. It happens all the time. But Objectivists believe that, after enough thought and evidence, it will converge on the one objective version. The AI’s proposed proof therefore can make sense in all of the same ways. It would even likely have a similar form. “Here is a categorization of all of the special cases which might be true. We just have to show that each one can’t work.” You might look at them and agree that those sound right. You can look at individual cases and accept that they don’t work. But do you abandon the belief that somewhere, somehow, there is a way to make it work? As opposed to the AI saying that there is none?
As you said, it requires a leap of faith. And your answer is mechanistic interpretability. Which is exactly what happened in the end with the 4 color proof. A mechanistically interpretable proof was produced, and mechanistically interpreted by Coq. QED.
But for something as vague as a philosophy, I think it will take a long time to get to mechanistically interpretable demonstrations. And the thing which will do so is likely itself to be an AI...
It will not take a long time if we use collective intelligence to do it together. The technology is already here. I’ve been trying to share it with others that understand the value of doing this before AI learns to do it on its own. If you want to learn more about that, feel free to look me up on the ‘X’ platform @therealkrantz.
For math, it is already here. Several options exist, Coq is the most popular.
For philosophy, the language requirements alone need AI at the level of reasonably current LLMs. Which brings their flaws as well. Plus you need knowledge of human experience. By the time you put it together, I don’t see how a mechanistic interpreter can be anything less than a (hopefully somewhat limited) AI.
Which again raises the question of how we come to trust in it enough for it not to be a leap of faith.
You may be missing context on my reference to the 4 color problem. The original 1976 proof, by Appel and Haken, took over 1000 hours of computer time to check. A human lifetime is too short to verify that proof. This eliminates your first option. The Objectivist cannot, even in principle, check the proof. Life is too short.
Your first option is therefore, by hypothesis, not an option. You can believe the AI or not. But you can’t actually check its reasoning.
The history of the 4 color problem proof shows this kind of debate. People argued for nearly 20 years about whether there might be a bug. Then an independent, and easier to check, computer proof came along in 1995. The debate mostly ended. More efficient computer generated proofs have since been created. The best that I’m aware of is 60,000 lines. In principle that would be verifiable by a human. But no human that I know of has actually bothered. Instead the proof was verified by the proof assistant Coq. And, today, most mathematicians trust Coq over any human.
We have literally come full circle on the 4 color problem. We started by asking whether we can trust a computer if a human can’t check it. And now we accept that a computer can be more trustworthy than a human!
However it took a long time to get the proof down to such a manageable size. And it took a long time to get a computer program that is so trustworthy that most believe it over themselves.
And so the key epistemological challenge. What would it take for you to trust an AI’s reasoning over your own beliefs when you’re unable to actually verify the AI’s reasoning?
I understood the context provided by your 4 color problem example.
What I’m unsure about is how that relates to your question.
Maybe I don’t understand the question you have.
I thought it was, “What should happen if both (1) everything it says makes sense and (2) you can’t follow the full argument?”.
My claim is “Following enough of an argument to agree is precisely what it means for something to make sense.”.
In the case of the four color problem, it sounds like for 20 years there were many folks that did not follow the full argument because it was too long for them to read. During that time, the conclusion did not make sense to them. Then, in 1995 a new shorter argument came along. One that they could follow. It included propositions that describe how the computer proofing system works.
For your latter question, “What would it take for me to trust an AI’s reasoning over my own beliefs when I’m unable to actually verify the AI’s reasoning?”. My answer is “A leap of faith.”. I would highly recommend that people not take leaps of faith. In general, I would not trust an AI’s reasoning if I were not able to actually verify the AI’s reasoning. This is why mechanistic interpretability is critical in alignment.
Nobody ever read the 1995 proof.
Instead they wound up reading the program. This time it was written in C—which is easier to follow. And the fact that there were now two independent proofs in different languages that ran on different computers greatly reduced the worries that one of them might have a simple bug.
I do not know that any human has ever tried to properly read any proof of the 4 color theorem.
Now to the issue. The overall flow and method of argument were obviously correct. Spot checking individual points gave results that were also correct. The basic strategy was also obviously correct. It was a basic, “We prove that if it holds in every one of these special cases, then it is true. Then we check each special case.” Therefore it “made sense”. The problem was the question, “Might there be a mistake somewhere?” After all proofs do not simply have to make sense, they need to be verified. And that was what people couldn’t accept.
The same thing with the Objectivist. You can in fact come up with flaws in proposed understandings of the philosophy fairly easily. It happens all the time. But Objectivists believe that, after enough thought and evidence, it will converge on the one objective version. The AI’s proposed proof therefore can make sense in all of the same ways. It would even likely have a similar form. “Here is a categorization of all of the special cases which might be true. We just have to show that each one can’t work.” You might look at them and agree that those sound right. You can look at individual cases and accept that they don’t work. But do you abandon the belief that somewhere, somehow, there is a way to make it work? As opposed to the AI saying that there is none?
As you said, it requires a leap of faith. And your answer is mechanistic interpretability. Which is exactly what happened in the end with the 4 color proof. A mechanistically interpretable proof was produced, and mechanistically interpreted by Coq. QED.
But for something as vague as a philosophy, I think it will take a long time to get to mechanistically interpretable demonstrations. And the thing which will do so is likely itself to be an AI...
It will not take a long time if we use collective intelligence to do it together. The technology is already here. I’ve been trying to share it with others that understand the value of doing this before AI learns to do it on its own. If you want to learn more about that, feel free to look me up on the ‘X’ platform @therealkrantz.
It depends on subject matter.
For math, it is already here. Several options exist, Coq is the most popular.
For philosophy, the language requirements alone need AI at the level of reasonably current LLMs. Which brings their flaws as well. Plus you need knowledge of human experience. By the time you put it together, I don’t see how a mechanistic interpreter can be anything less than a (hopefully somewhat limited) AI.
Which again raises the question of how we come to trust in it enough for it not to be a leap of faith.