Below is an edited version of an email I prepared for someone about what CS researchers can do to improve our AGI outcomes in expectation. It was substantive enough I figured I might as well paste it somewhere online, too.
I’m currently building a list of what will eventually be short proposals for several hundred PhD theses / long papers that I think would help clarify our situation with respect to getting good outcomes from AGI, if I could persuade good researchers to research and write them. A couple dozen of these are in computer science broadly; the others are in economics, history, etc. I’ll write out a few of the proposals as 3-5 page project summaries, and the rest I’ll just leave as two-sentence descriptions until somebody promising contacts me, tells me they want to do one, and asks for more detail. I think of these as “superintelligence strategy” research projects, similar to the kind of work FHI typically does on AGI. Most of these projects wouldn’t only be interesting to people interested in superintelligence: e.g. a technological forecasting study building on these results would be interesting to lots of people, not just those who want to use the results to gain a bit of insight into superintelligence.
Then there’s also the question of “How do we design a high assurance AGI that would pass a rigorous certification process à la the one used for autopilot software and other safety-critical software systems?”
There, too, MIRI has lots of ideas for plausibly useful work that could be done today, but of course it’s hard to predict this far in advance which particular lines of research will pay off. But then, this is almost always the case for long-time-horizon theoretical research, and e.g. applying HoTT to program verification sure seems more likely to help our chances of positive AGI outcomes than, say, research on genetic algorithms for machine vision.
I’ll be fairly inclusive in listing these open problems. Many of the problems below aren’t necessarily typical CS work, but they could plausibly be published in some normal CS venues, e.g. surveys of CS people are sometimes published in CS journals or conferences, even if they aren’t really “CS research” in the usual sense.
First up are ‘superintelligence strategy’ aka ‘clarify our situation w.r.t. getting good AGI outcomes eventually’ projects:
More and larger expert surveys on AGI timelines, takeoff speed, and likely social impacts, besides the one reported in the first chapter of Superintelligence (which isn’t yet published).
Delphi study of those same questions, drawing on AI/ML people, AGI people, and AI safety+security people.
How big is the field of AI currently? How many quality-adjusted researcher-years, how much funding, and how much computing power per year? How much during each previous decade of AI research? More here.
What is the current state of AI safety engineering? What can and can’t we do? Summary and comparison of approaches in formal verification in AI, hybrid systems control, etc. Right now there are a bunch of different communities doing AI safety and they barely talk to each other, so it’s hard for any one person to figure out what’s going on in general. It would also be nice to know which techniques are being used where, especially in proprietary and military systems for which there aren’t any papers.
Surveys of AI subfield experts on “What percentage of the way to human-level performance in your subfield have we come in the last n years?” More here.
Improved analysis of the concept of general intelligence, beyond “efficient cross-domain optimization.” Perhaps something more specific: canonical environments, etc. Also see work on formal measures of general intelligence by Legg, by Hernández-Orallo, etc.
Continue Katja’s project on past algorithmic improvement. Filter not for ease of data collection but for real-world importance of the algorithm. Interesting to computer scientists in general, but also potentially relevant to arguments about AI takeoff dynamics.
What kinds of software projects do governments tend to monitor? Do they ever “take over” (nationalize) software projects? What kinds of software projects do they invade and destroy?
Are there examples of narrow AI “takeoff”? Eurisko is maybe the closest thing I can think of, but the details aren’t clear, because Lenat’s descriptions were ambiguous and we don’t have the source code.
Cryptographic boxes for untrusted AI programs.
Some AI approaches are more transparent to human understanding/inspection than others. How well does each approach’s transparency to human inspection scale? More here.
Can computational complexity theory place any bounds on AI takeoff? Daniel Dewey is looking into this; it currently doesn’t look promising but maybe somebody else would find something a bit informative.
To get an AGI to respect the values of multiple humans & groups, we may need significant progress in computational social choice, e.g. fair division theory and voting theory. More here.
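To make the computational social choice item a bit more concrete, here is a minimal sketch of one classical preference-aggregation rule, the Borda count (the alternatives and preference orders below are made up purely for illustration); the open problems are about doing this kind of value aggregation robustly, at scale, and in the presence of strategic agents:

```python
from collections import defaultdict

def borda_count(rankings):
    """Aggregate ranked preferences with the Borda rule.

    rankings: a list of preference orders, each a list of alternatives
    from most to least preferred. An alternative in position i out of m
    receives m - 1 - i points from that voter.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        m = len(ranking)
        for position, alternative in enumerate(ranking):
            scores[alternative] += m - 1 - position
    # Alternatives sorted from highest to lowest total score.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical example: three stakeholders ranking three policy options.
rankings = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["B", "A", "C"],
]
print(borda_count(rankings))  # [('B', 5), ('A', 3), ('C', 1)]
```

The rule itself is trivial; the research questions for this setting are things like strategy-proofness, fair division of continuous resources, and scaling to many agents and many interdependent decisions.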
Next, high assurance AGI projects that might be publishable in some CS conferences/journals. One way to categorize this stuff is into “bottom-up research” and “top-down research.”
Bottom-up research aimed at high assurance AGI simply builds on current AI safety/security approaches, pushing them along to be more powerful, more broadly applicable, more computationally tractable, easier to use, etc. This work isn’t necessarily focused on AGI specifically but is plausibly pushing in a more safe-AGI-helpful direction than most AI research is. Examples:
Extend current techniques in formal verification (overview for AI applications; also see e.g. higher-order program verification and incremental reverification), program synthesis (overview of hybrid system applications: p1, p2, p3, p4), simplex architectures (a toy sketch of the simplex switching idea appears after this list), etc.
More work following up on Weld & Etzioni’s “call to arms” for “Asimovian agents”: for a 2014 overview see here.
More work on how to do principled formal validation (not just verification); see e.g. Rushby on epistemic doubt and especially Cimatti’s group on formal validation.
Apply HoTT to program verification.
More work on clean-slate hardware/software systems that are built from the ground up for high assurance at every stage, e.g. SAFE and HACMS.
More verified software libraries and compilers, à la the Verified Software Toolchain.
More tools to make high assurance methods easier to apply: e.g. better interfaces and training for SPIN.
More work on making more types of AI systems more transparent, so that we understand why they work and what bounds they will operate within, and so that we can offer stronger safety+security guarantees for particular approaches than we can now. Much of this work would probably be in computational learning theory and in dimensionality reduction techniques. Also see here.
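To illustrate the simplex architecture idea referenced in the list above, here is a toy sketch (all names, dynamics, and thresholds are mine, purely for illustration): a complex, unverified controller drives the system by default, and a simple fallback controller, which is the thing you actually verify, takes over whenever the state approaches the boundary of a conservatively estimated recoverable region.

```python
def advanced_controller(position, velocity):
    """Stand-in for a complex, unverified controller (e.g. a learned policy)."""
    return 2.0 * (5.0 - position) - velocity  # aggressively chases a setpoint at 5.0

def safety_controller(position, velocity):
    """Simple fallback controller we can actually reason about: just brake."""
    return -0.5 * velocity

def in_recoverable_region(position, velocity, limit=10.0, margin=2.0):
    """Conservative check: far enough from the limits that braking still works."""
    return abs(position) < limit - margin and abs(velocity) < 3.0

def decision_module(position, velocity):
    """Simplex switching logic: run the advanced controller only while the
    state stays inside the recoverable region; otherwise fall back."""
    if in_recoverable_region(position, velocity):
        return advanced_controller(position, velocity)
    return safety_controller(position, velocity)

# Tiny simulation loop: double-integrator dynamics, time step 0.1 s.
position, velocity, dt = 0.0, 0.0, 0.1
for _ in range(200):
    acceleration = decision_module(position, velocity)
    velocity += acceleration * dt
    position += velocity * dt
print(round(position, 2), round(velocity, 2))
```

The point of the architecture is that verification effort concentrates on the small switching core and the fallback controller rather than on the complex controller, which is exactly the kind of leverage we want for systems too complicated to verify wholesale.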
Top-down research aimed at high assurance AGI tries to envision what we’ll need a high assurance AGI to do, and starts playing with toy models to see if they can help us build up insights into the general problem, even if we don’t know what an actual AGI implementation will look like. Past examples of top-down research of this sort in computer science more generally include:
Lampson’s original paper on the confinement problem (covert channels), which used abstract models to describe a problem that wasn’t detected in the wild until roughly two decades after he wrote the paper. This early treatment gave computer security researchers a head start on the problem, and the covert channel communication field is now pretty big and active. Details here.
Shor’s quantum algorithm for integer factorization (1994) showed, several decades before we’re likely to get a large-scale quantum computer, that (e.g.) the NSA could be capturing and storing strongly encrypted communications and could later break them with a QC. So if you want to guarantee your current communications will remain private in the future, you’ll want to work on post-quantum cryptography and use it.
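To make the structure of that example concrete, here is the classical reduction Shor’s algorithm exploits, with the order-finding step brute-forced classically for a toy modulus (the quantum speedup lives entirely in that step):

```python
from math import gcd

def order(a, N):
    """Smallest r > 0 with a^r = 1 (mod N). This is the step Shor's
    algorithm performs efficiently on a quantum computer; here we
    simply brute-force it, which is fine for tiny N."""
    r, x = 1, a % N
    while x != 1:
        x = (x * a) % N
        r += 1
    return r

def factor_via_order(a, N):
    """Classical post-processing: if gcd(a, N) = 1, r is even, and
    a^(r/2) != -1 (mod N), then gcd(a^(r/2) +/- 1, N) yields
    nontrivial factors of N."""
    if gcd(a, N) != 1:
        return gcd(a, N), N // gcd(a, N)  # lucky: a already shares a factor with N
    r = order(a, N)
    if r % 2 != 0:
        return None  # unlucky choice of a; try another
    y = pow(a, r // 2, N)
    if y == N - 1:
        return None  # another unlucky case
    return gcd(y - 1, N), gcd(y + 1, N)

print(factor_via_order(7, 15))  # (3, 5): the order of 7 mod 15 is 4
```

Nothing here is quantum; the example just shows why a fast period-finder breaks factoring-based cryptography, which is the head start Shor gave cryptographers long before the hardware exists.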
Hutter’s AIXI is the first fully-specified model of “universal” intelligence. It’s incomputable, but there are computable variants, and indeed tractable variants that can play arcade games successfully. The nice thing about AIXI is that you can use it to concretely illustrate certain AGI safety problems we don’t yet know how to solve even with infinite computing power, which means we must be very confused indeed. Not all AGI safety problems will be solved by first finding an incomputable solution, but that is one common way to make progress. I say more about this in a forthcoming paper with Bill Hibbard to be published in CACM.
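For reference, AIXI’s action selection can be written roughly as follows (following Hutter’s definition: U is a universal Turing machine, ℓ(q) is the length of program q, the interaction runs to horizon m, and the agent has already seen actions a_1…a_{k-1} and observation/reward pairs o_1 r_1 … o_{k-1} r_{k-1}):

```latex
a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
       \bigl[ r_k + \cdots + r_m \bigr]
       \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

The inner sum is a simplicity-weighted mixture over every program consistent with the history so far, which is the part that makes the model incomputable; the computable and tractable variants mentioned above approximate this mixture.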
But now, here are some top-down research problems MIRI thinks might pay off later for AGI safety outcomes, some of which are within or on the borders of computer science:
Naturalized induction: “Build an algorithm for producing accurate generalizations and predictions from data sets, that treats itself, its data inputs, and its hypothesis outputs as reducible to its physical posits. More broadly, design a workable reasoning method that allows the reasoner to treat itself as fully embedded in the world it’s reasoning about.” (Agents built with the agent-environment framework are effectively Cartesian dualists, which has safety implications.)
Better AI cooperation: How can we get powerful agents to cooperate with each other where feasible? One line of research on this is called “program equilibrium”: in a setup where agents can read each other’s source code, they can recognize each other as cooperators more often than in a standard Prisoner’s Dilemma. However, the early approaches were brittle: agents couldn’t recognize each other for cooperation if, e.g., a variable name differed between them. We got around that problem via provability logic.
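Here is a toy illustration of that brittleness point (the agent names and setup are mine, not drawn from the program equilibrium literature): an agent that cooperates only with an exact copy of its own source code cooperates with itself, but defects against a functionally identical agent whose source merely uses different names and comments. The provability-logic approach, roughly “cooperate if you can prove the opponent cooperates with you,” is what lets functionally similar agents recognize each other despite such superficial differences.

```python
import inspect

def clique_bot(opponent_source):
    """Cooperate only if the opponent's source is an exact copy of mine."""
    my_source = inspect.getsource(clique_bot)
    return "C" if opponent_source == my_source else "D"

def clique_bot_renamed(other_source):
    """Functionally identical to clique_bot; only names and comments differ."""
    my_source = inspect.getsource(clique_bot_renamed)
    return "C" if other_source == my_source else "D"

def play(agent_a, agent_b):
    """One-shot Prisoner's Dilemma in which each agent reads the other's source."""
    move_a = agent_a(inspect.getsource(agent_b))
    move_b = agent_b(inspect.getsource(agent_a))
    return move_a, move_b

print(play(clique_bot, clique_bot))          # ('C', 'C'): exact source match
print(play(clique_bot, clique_bot_renamed))  # ('D', 'D'): mutual defection
```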
Tiling agents: Like Bolander and others, we study self-reflection in computational agents, though for us it’s because we’re thinking ahead to the point when we’ve got AGIs that want to improve their own abilities, and we want to make sure they retain their original purposes as they rewrite their own code. We’ve built some toy models for this; they run into nicely crisp Gödelian difficulties, we throw a bunch of math at those difficulties, and in some cases they kind of go away. We hope this will lead to insight into the general challenge of self-reflective agents that don’t change their goals on self-modification round #412. See also the procrastination paradox and Fallenstein’s monster.
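The sharpest of those Gödelian difficulties is usually expressed via Löb’s theorem: for a theory T extending basic arithmetic and any sentence A,

```latex
T \vdash \bigl(\Box_T A \rightarrow A\bigr) \;\Longrightarrow\; T \vdash A
```

so a system that proves “whatever I prove is true” (every instance of the left-hand schema) thereby proves every A, i.e. is inconsistent. The naive way for an agent to trust a successor running the same proof system runs straight into this; the tiling agents work is about finding principled ways around it.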
Ontological crises in AI value systems.
These are just a few examples; there are lots more. We aren’t happy yet with our descriptions of any of these problems, and we’re working with various people to explain ourselves better and make it easier for people to understand what we’re talking about and why we’re working on these problems and not others. Nevertheless, some people seem to grok what we’re doing: e.g. I pointed Nik Weaver to the tiling agents work and, despite having no past familiarity with MIRI, he just ran with it.