This paragraph is a simplification rather than the whole story, but: Our research tends to be focused on mathematical logic and proof systems these days because those are expressive frameworks with which to build toy models that can give researchers some general insight into the shape of the novel problems of AGI control. Methods like testing and probabilistic failure analysis require more knowledge of the target system than we now have for AGI.
When somebody says they are doing A for reason X, then reason X is criticized and they claim they were actually doing A for reason Y all along, I tend to be wary.
In this case A is “research on mathematical logic and formal proof systems”,
X is “self-improving AI is unboxable and untestable, we need to get it provably right on the first try”
and Y is “Our research tends to be focused on mathematical logic and proof systems these days because those are expressive frameworks with which to build toy models that can give researchers some general insight into the shape of the novel problems of AGI control”.
If Y is better than X, as it seems to me in this case, this is indeed an improvement, but when you modify your reasons and somehow conclude that your previously chosen course of action is still optimal, then I doubt your judgment.
Well… (trigger warning)
“And if Novamente should ever cross the finish line, we all die. That is what I believe or I would be working for Ben this instant.”
“I intend to plunge into the decision theory of self-modifying decision systems and never look back. (And finish the decision theory and implement it and run the AI, at which point, if all goes well, we Win.)”
“Take metaethics, a solved problem: what are the odds that someone who still thought metaethics was a Deep Mystery could write an AI algorithm that could come up with a correct metaethics? I tried that, you know, and in retrospect it didn’t work.”
“Find whatever you’re best at; if that thing that you’re best at is inventing new math[s] of artificial intelligence, then come work for the Singularity Institute. [ … ] Aside from that, though, I think that saving the human species eventually comes down to, metaphorically speaking, nine people and a brain in a box in a basement, and everything else feeds into that.”
But where did somebody from MIRI say “we need to get it provably right on the first try”? Also, what would that even mean? You can’t write a formal specification that includes the entire universe and then formally verify an AI against that formal specification. I couldn’t find any Yudkowsky quotes about “getting it provably right on the first try” at the link you provided.
Why talk about unupdateable utility functions and “solving morality” if you are not going for that approach?
Again, a simplification, but: we want a sufficient guarantee of stably friendly behavior before we risk pushing things past a point of no return. A sufficient guarantee plausibly requires having robust solutions for indirect normativity, stable self-modification, reflectively consistent decision theory, etc. But that doesn’t mean we expect to ever have a definite “proof” that a system will be stably friendly.
Formal methods work on today’s safety-critical software systems never results in a definite proof that a system will be safe, either, but, ceteris paribus, formal proofs of particular internal properties of the system give you more assurance that the system will behave as intended than you would otherwise have.
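To make the “particular internal properties” point concrete, here is a minimal illustrative sketch in Lean 4 (assuming a recent toolchain where the built-in `omega` and `split` tactics are available). The `clampCommand` function and the theorem name are hypothetical examples invented for this comment, not code from MIRI or from any real safety-critical system; the point is only to show what machine-checking one narrow internal invariant looks like, as opposed to proving that a whole deployed system is safe.

```lean
-- Illustrative sketch only: a toy component with one machine-checked
-- internal property.

/-- Clamp an actuator command `x` into the range `[lo, hi]`. -/
def clampCommand (lo hi x : Int) : Int :=
  if x < lo then lo
  else if x > hi then hi
  else x

/-- One narrow, formally verified property: assuming the bounds are
    ordered, the output never exceeds the upper bound. Nothing here
    claims the whole system is safe; it only pins down one invariant. -/
theorem clampCommand_le_hi (lo hi x : Int) (h : lo ≤ hi) :
    clampCommand lo hi x ≤ hi := by
  unfold clampCommand
  split
  · omega            -- branch x < lo: result is lo, and lo ≤ hi by assumption
  · split <;> omega  -- remaining branches: hi ≤ hi, and x ≤ hi from ¬(x > hi)
```

The analogy to the comment above: the theorem is a verified “internal property”, while everything the statement leaves out (whether the bounds were chosen sensibly, whether the hardware matches the model, whether this component is even the one that matters) is exactly the part no proof covers.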
Otherwise compared to nothing, or otherwise compared to informal methods?
Are you taking into account that the formal/provable/unupdateable approach has a drawback in the AI domain that it doesn’t have in the non-AI domain, namely that you lose the potential to tell an AI “stop doing that, it isn’t nice”?
How so?
Do you think that would work on Clippie?