One thing I’m interested in, but don’t know where to start looking for, is people who are working instead in the reverse direction: mathematical approaches which show that aligned AI is impossible or unlikely. By this I mean formal work that suggests something like “almost all AGIs are unsafe”, in the same way that the chance of picking a rational number at random from (0,1) is zero because almost all real numbers are irrational.
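For concreteness, the measure-theory fact behind that analogy is the standard textbook one (this is just the math of the analogy, not a claim about AGI itself): the rationals in (0,1) are countable, so they can be covered by intervals of arbitrarily small total length,

$$
\mathbb{Q}\cap(0,1)=\{q_1,q_2,q_3,\dots\},\qquad \mu\bigl(\mathbb{Q}\cap(0,1)\bigr)\;\le\;\sum_{n=1}^{\infty}\frac{\varepsilon}{2^{n}}\;=\;\varepsilon \quad\text{for every }\varepsilon>0,
$$

by covering each $q_n$ with an interval of length $\varepsilon/2^n$; hence $\mu(\mathbb{Q}\cap(0,1))=0$, and a uniform random draw from $(0,1)$ is rational with probability $0$.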
I don’t say this to be a downer! I mean it in the sense of a mathematician who spent 7 years attempting to prove X exists, and then sits down one day and spends 4 hours proving why X cannot exist. Progress can take surprising forms!
I have been working on an argument from that angle.
I’ve been developing it independently from my own background in autonomous safety-critical hardware/software systems, but I discovered recently that it’s very similar to Drexler’s CAIS from 2019, except with more focus on low-level evidence or rationale for why certain claims are justified.
It isn’t so much a pure mathematical approach as it is a systems engineering or systems safety perspective on all of the problems[1][2] that would remain even if someone showed up tomorrow and dropped a formally verified algorithm describing an “aligned AGI” onto my desk, and what ramifications that has for the development of AGI at all. The only complicated math in it so far concerns computational complexity classes, plus relatively simple if/then logic for analyzing risk vectors.
I guess if I had to pick the “key” insight that I claim I can contribute, and share it now, it would be this:
If you define super-human performance in terms of some abstract thing called “intelligence” (or “general intelligence”), you run into a philosophical question: “what is intelligence?”
There is an answer accepted by this community that intelligence is “efficient cross-domain optimization”.
From this answer, it follows that “general intelligence” is a prerequisite for super-human performance, so we can only rank solutions to tasks by evaluating them in the context of “general intelligence”. In this way, the community can dismiss super-human performance on a specific task if that solution is not immediately generalizable to other tasks.
This also dismisses solutions that can be generalized to other tasks by taking a known algorithm for training a specific solution and deploying that algorithm on each new task. The latter might be something like DeepMind’s research into AlphaGo, then AlphaZero, then AlphaFold, and then Ithaca. That would seem to demonstrate a repeatable engineering process that can be deployed to a specific task and develop a solution with super-human performance on that task, without that specific solution generalizing to other tasks.
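To make that distinction concrete, here is a toy sketch (purely illustrative; the tasks and training procedure below are hypothetical stand-ins, not anything DeepMind actually uses): one task-agnostic training procedure, applied separately to two unrelated tasks, produces two narrow models. The process generalizes; the trained artifacts do not.

```python
# Toy illustration: a reusable training procedure applied to two unrelated
# tasks yields two task-specific models, neither of which transfers.
import random

def train(examples, steps=2000, lr=0.01):
    """Fit y ~ w*x + b by stochastic gradient descent. The procedure itself
    is task-agnostic; only the data changes between deployments."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        x, y = random.choice(examples)
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return w, b

def mse(model, examples):
    w, b = model
    return sum(((w * x + b) - y) ** 2 for x, y in examples) / len(examples)

# Two unrelated "tasks" (hypothetical stand-ins for Go, protein folding, etc.)
task_a = [(x, 3.0 * x + 1.0) for x in range(-10, 10)]   # task A: y = 3x + 1
task_b = [(x, -0.5 * x + 7.0) for x in range(-10, 10)]  # task B: y = -0.5x + 7

model_a = train(task_a)  # same engineering process...
model_b = train(task_b)  # ...redeployed against a different task

print("model_a on task_a:", mse(model_a, task_a))  # low error
print("model_a on task_b:", mse(model_a, task_b))  # high error: no transfer
print("model_b on task_b:", mse(model_b, task_b))  # low error
```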
… [text omitted]
We’ve achieved super-human or “human” performance in Go, Chess, protein folding, image recognition, language recognition, art generation, code generation, translation, and many other fields using AI/ML systems that do not, in any way, demonstrate “general intelligence”.
… [text omitted]
Another way to think about this is to ask if what we call “general intelligence” is ultimately an inefficient algorithm for solving problems, despite the earlier claim that the definition of intelligence was “efficient cross-domain optimization”.
I.e.: what if you can always solve a problem faster and more efficiently by deploying AI/ML algorithms without “general intelligence” than a hypothetical “general intelligence” algorithm could, even if that algorithm were deployed on specialized hardware?
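As a deliberately extreme toy analogy (my own illustration, not anything from CAIS, and real AI/ML systems are obviously not permutation search): a specialized algorithm can solve a task orders of magnitude faster than a fully general search over candidate answers, and the gap grows factorially with problem size.

```python
# Toy comparison: a task-specific algorithm vs. a fully general search on the
# same task. Timsort exploits the structure of sorting (O(n log n)); the
# "general" solver only needs a goal test, but pays for it with O(n!) search.
import itertools
import time

def general_solver(items):
    """Try candidate answers until one passes the goal test."""
    for candidate in itertools.permutations(items):
        if all(candidate[i] <= candidate[i + 1] for i in range(len(candidate) - 1)):
            return list(candidate)

data = [7, 3, 1, 6, 2, 5, 4, 0]  # just 8 items; the gap explodes as n grows

t0 = time.perf_counter()
specialized = sorted(data)        # task-specific
t1 = time.perf_counter()
general = general_solver(data)    # task-agnostic
t2 = time.perf_counter()

assert specialized == general
print(f"specialized: {t1 - t0:.6f}s   general search: {t2 - t1:.6f}s")
```

Even at n = 8 the specialized path wins by several orders of magnitude, and the general search becomes intractable long before n reaches anything interesting.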
The closest I’ve seen to someone posing this question was Peter Watts’ sci-fi novel Blindsight [11], but that was more focused on the idea of “consciousness” vs “general intelligence”.
In a world where the algorithm that we’d recognize as “general intelligence” is fundamentally inefficient, we’d see seemingly remarkable and unexplained gains from AI/ML systems across a variety of unrelated problem domains, where not one of those AI/ML systems has a capability we’d recognize as “general intelligence”.
If you’ve read CAIS, you might recognize the above argument, where it was worded as:
In particular, taking human learning as a model for machine learning has encouraged the conflation of intelligence-as-learning-capacity with intelligence-as-competence, while these aspects of intelligence are routinely and cleanly separated in AI system development: Learning algorithms are typically applied to train systems that do not themselves embody those algorithms. [CAIS 11.7]
When this idea was proposed in 2019, it seems to me that it was criticized because people didn’t see how task-focused AI/ML systems could keep improving and eventually surpass human performance without somehow developing “general intelligence” along the way, plus a general skepticism that there would be rational reasons not to “just” staple every single hypothetical task together inside one system and call it AGI. I really think it’s worth looking at this again in light of the last 3 years and asking whether that criticism was justified.
In systems safety, we’re concerned with the safety of a larger system than the usual “product-focused” mindset considers. It is not enough for there to be a proof that a hypothetical product, as designed, is safe. We also need to look at the likelihood of:
design failures (the formal proof was wrong because the verification of it had a bug; there is no formal proof; the “formally verified” proof was actually checked by humans and not by an automated theorem prover)
manufacturing failures (hardware behavior out-of-spec, missed errata, power failures, bad ICs, or other failure of components)
implementation failures (software bugs, compiler bugs, differences between an idealized system in a proof vs the implementation of that system in some runtime or with some language)
verification failures (bugs in tests that resulted in a false claim that the software met the formal spec)
environment or runtime failures (e.g. radiation-induced upsets like bit flips; does the system use voting? is the RAM using ECC? what about the processor itself?)
usage failures (is the product still safe if it’s misused? what type of training or compliance might be required? is maintenance needed? is there some type of warning or lockout on the device itself if it is not actively maintained?)
process failures (“normalization of deviance”, where deviations from the safety process gradually become accepted as normal)
For each of these failure modes, we then look at the worst-case magnitude of that failure. Does the failure result in non-functional behavior, or does it result in erroneous behavior? Can erroneous behavior be detected? By what? Etc. This type of review is called an FMEA (Failure Modes and Effects Analysis). This review process can rule out designs that “seem good on paper” if there is a sufficient likelihood of failures, outside of just the design itself, that cannot be mitigated down to our desired risk tolerances, especially if there exist other solutions in the same design space that do not have similar flaws.
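As a minimal sketch of what that review can look like in code (the specific failure modes, scores, and threshold below are invented for illustration; the 1-10 scales and the severity × occurrence × detection “Risk Priority Number” are conventional in FMEA practice):

```python
# Hypothetical FMEA-style scoring pass. The specific failure modes, numbers,
# and tolerance threshold are illustrative only.
from dataclasses import dataclass

@dataclass
class FailureMode:
    name: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (rare)       .. 10 (frequent)
    detection: int   # 1 (easily detected) .. 10 (effectively undetectable)

    @property
    def rpn(self) -> int:
        """Risk Priority Number: severity * occurrence * detection."""
        return self.severity * self.occurrence * self.detection

modes = [
    FailureMode("design: bug in the proof checker",        10, 2, 8),
    FailureMode("implementation: compiler miscompilation",  8, 3, 6),
    FailureMode("runtime: bit flip, no ECC",                 7, 4, 7),
    FailureMode("usage: deployed without maintenance",       9, 5, 5),
    FailureMode("process: normalization of deviance",        9, 6, 9),
]

RISK_TOLERANCE = 100  # illustrative threshold, not a real standard's value

for m in sorted(modes, key=lambda m: m.rpn, reverse=True):
    verdict = "mitigate or reject design" if m.rpn > RISK_TOLERANCE else "acceptable"
    print(f"{m.name:45s} RPN={m.rpn:4d}  {verdict}")
```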