How to develop safe superintelligence
Author: Martí Llopart
Introduction
In the world we live in, there are many problems that urgently need solving: climate change[1], the risk of a new pandemic[2], wealth inequality[3], the aging population[4] and more. All of these problems are hugely complex, and solving them requires vast amounts of intelligence. To address them, there’s a tool that comes in very handy: Artificial Intelligence. We are all familiar with AI, but to put it shortly, it’s the intelligence demonstrated by machines[5]. In recent years, AI has proven to be a game changer for industries such as healthcare and finance, but most experts agree that we are just at the beginning of what has been termed “the AI revolution”[6].
The culmination of this AI revolution would be the creation of a Superintelligence, a hypothetical AI agent whose intelligence far surpasses that of the brightest and most gifted human minds. If this Superintelligence were aligned with our goals, solutions could be found to the most complex scientific, social and economic problems, leading to huge benefits for humanity. But before creating Superintelligence, General Intelligence must be reached. Artificial General Intelligence (AGI) is the hypothetical intelligence of a machine that has the capacity to understand or learn any intellectual task that a human being can[7]; it is just one step below Superintelligence. So, how do we get to AGI? In this essay, divided into three parts, I’ll try to answer this question.
Part 1: Creating the best general ML algorithm
Our brains are general problem solvers. Their objective is to use the information from the senses to accurately predict future events and to act according to these predictions in order to survive [8]. If we want to develop General Intelligence, we must first create the best general learning algorithm (GLA), an algorithm capable of performing well at learning any type of task. The best GLA would be the one that performs perfectly at learning any task.
However, there’s a big problem in developing such an algorithm: research in machine learning lacks a theoretical backbone. In the last decade, most of the advances in AI have been powered by neural networks, but the theory behind them remains poorly understood. Although the math behind each individual neuron is well understood, the mathematical theory of the emergent behaviour of the whole network is still largely unknown. The Universal Approximation Theorem gives us some guidance, but much of the background theory remains unexplored [9].
Therefore, in the same way that we wouldn’t be able to efficiently design planes or bridges without the strong mathematical principles of fluid dynamics and solid mechanics, we can’t develop smart designs for machine learning algorithms, because we don’t yet have such strong guiding principles.
One solution to this problem could be to advance the theoretical knowledge we have of the machine learning field until we reach a much more solid basis. However, there seems to be a better approach. A recent paper from Google AI [10] has shown that it is possible to evolve machine learning algorithms from scratch using only basic mathematical operations as building blocks. In the paper, given a series of tasks, machine learning algorithms were evolved with an evolutionary algorithm to achieve the best performance. The proposed approach starts from empty programs and, using only basic mathematical operations as building blocks, applies evolutionary methods to automatically find the code for complete ML algorithms. Given small image classification problems, the method rediscovered fundamental ML techniques, such as 2-layer neural networks with backpropagation and linear regression, which researchers had invented over the years. This result demonstrates the plausibility of automatically discovering more novel ML algorithms to address harder problems, such as discovering the best GLA.
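To make this concrete, here is a minimal sketch of the kind of search AutoML-Zero performs: regularized evolution over straight-line programs built from a handful of basic operations. The op set, the toy regression task and the mutation scheme are illustrative placeholders of mine, not the paper’s actual setup:

import random

# Tiny instruction set: each op maps two scalars to one scalar.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "max": max,
}

def random_instruction(n_regs=4):
    """One instruction: (op, src1, src2, dst) over a small register file."""
    return (random.choice(list(OPS)), random.randrange(n_regs),
            random.randrange(n_regs), random.randrange(n_regs))

def run(program, x, n_regs=4):
    """Execute a straight-line program; register 0 is the input, register 1 the output."""
    regs = [0.0] * n_regs
    regs[0] = x
    for op, s1, s2, dst in program:
        regs[dst] = OPS[op](regs[s1], regs[s2])
    return regs[1]

def fitness(program):
    """Toy task standing in for a benchmark: regress y = 2x + 1."""
    return -sum((run(program, x) - (2 * x + 1)) ** 2 for x in (-1.0, 0.0, 1.0, 2.0))

def evolve(pop_size=100, cycles=2000, prog_len=5):
    """Regularized evolution: tournament selection, point mutation, drop the oldest."""
    population = [[random_instruction() for _ in range(prog_len)] for _ in range(pop_size)]
    for _ in range(cycles):
        parent = max(random.sample(population, 10), key=fitness)
        child = list(parent)
        child[random.randrange(prog_len)] = random_instruction()  # point mutation
        population.pop(0)        # discard the oldest individual
        population.append(child)
    return max(population, key=fitness)

print(fitness(evolve()))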
Another problem in designing the best possible GLA is that, because there’s no strong mathematical basis on which to ground the design, the only way to check whether a GLA is truly the best is to test its performance across all possible learning tasks. Obviously, this is impossible, because there are infinitely many learning tasks and only limited time and computational resources. Hence, a solution could be to create a benchmark: a set of problems that is representative enough of most learning tasks.
The company DeepMind, a British subsidiary of Alphabet, is working on developing safe AGI. As part of their approach, they are also trying to develop better GLAs. One of their latest feats is MuZero, a program that is able to master games without previously knowing their rules. In its 2019 release, MuZero was able to outperform humans at go, chess, shogi and a standard suite of Atari games. This set of games could be used as a starting benchmark for testing the performance of GLAs developed in the future [11][14].
The following steps are what I propose to create the best general ML algorithm:
1- To create a benchmark:
The first step consists in creating a set of learning tasks that is diverse and representative enough of all possible learning tasks. Currently, the benchmark used by the pioneering company DeepMind to test learning algorithms consists of the Atari57 suite of games as well as chess, go and shogi. As previously mentioned, this benchmark has been beaten at superhuman level by the MuZero program.
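In code, a benchmark in this sense is just a fixed suite of tasks plus an aggregate score. A minimal harness might look as follows; the task names, the train_and_score callable and the human baselines are hypothetical stand-ins, since the real Atari57/board-game suite requires DeepMind-scale infrastructure:

from typing import Callable, Dict

# Placeholder suite: in practice, Atari57 plus chess, go and shogi.
BENCHMARK_SUITE = ["pong", "breakout", "chess", "go", "shogi"]

def evaluate_gla(train_and_score: Callable[[str], float],
                 human_baselines: Dict[str, float]) -> float:
    """Fraction of benchmark tasks on which the candidate GLA beats the human baseline.

    train_and_score(task) is assumed to train the candidate from scratch on
    `task` and return its final score; it stands in for a real training pipeline.
    """
    wins = sum(train_and_score(task) > human_baselines[task] for task in BENCHMARK_SUITE)
    return wins / len(BENCHMARK_SUITE)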
2- To record the performance of the intelligent agent at beating the benchmark (attaining superhuman performance):
The second step of the process is to record the performance of the best artificially intelligent agent that has beaten the benchmark with superhuman performance. Currently, this program is the MuZero algorithm developed by DeepMind.
Three parameters will be recorded (a minimal record of them is sketched after this list):
2.1 - Computational resources: The hardware and software configurations of the supercomputer used to run the experiment.
2.2 - Compute time: Using the specified hardware and software configurations, how much time it took for the learning algorithm to beat the benchmark.
2.3 - Length of the algorithm: The size of the file encoding the learning algorithm in the programming language or languages used.
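A minimal sketch of such a record; the field names and the example values are placeholders of mine, not measured numbers:

from dataclasses import dataclass

@dataclass(frozen=True)
class GoldStandardRecord:
    """The three quantities recorded for the current best benchmark-beating agent."""
    hardware_software_config: str  # 2.1 - e.g. accelerator type and framework versions
    compute_time_hours: float      # 2.2 - wall-clock time to beat the benchmark
    algorithm_size_bytes: int      # 2.3 - size of the source files encoding the algorithm

# Hypothetical entry for the current gold standard (placeholder numbers):
gold_standard = GoldStandardRecord("TPU v3 pod, JAX", 1.0e4, 50_000)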
3- To design a machine learning algorithm generator:
Machine learning algorithms are specific combinations of basic logical and mathematical units. As previously mentioned, in 2020 the Google AI team developed a novel methodology to automatically generate machine learning algorithms from scratch, using only basic mathematical operations as building blocks and reaching sophisticated structures such as neural networks.
The idea behind this step is to design a framework of basic logical and mathematical operations as a starting point for the automatic design of the general machine learning algorithm. The Google AI team already created such a framework in its AutoML-Zero paper, but it contains only a limited selection of logical and mathematical operators. What is needed here is a similar but broader framework: one with as many basic logical and mathematical units as possible, so that all options are on the table. This framework would be a true tabula rasa for the development of general learning algorithms.
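Concretely, such a framework is just a much larger instruction set than the toy one sketched above. One possible flavour of it follows; the selection of operations is my own, and a real tabula-rasa framework would also include vector and matrix operations, memory access and control flow:

import math
import operator

# A deliberately broad op set: arithmetic, comparison, logic and elementary functions.
BINARY_OPS = {
    "add": operator.add, "sub": operator.sub, "mul": operator.mul,
    "div": lambda a, b: a / b if b != 0 else 0.0,   # protected division
    "min": min, "max": max,
    "lt": lambda a, b: float(a < b), "eq": lambda a, b: float(a == b),
    "and": lambda a, b: float(bool(a) and bool(b)),
    "or": lambda a, b: float(bool(a) or bool(b)),
}
UNARY_OPS = {
    "neg": operator.neg, "abs": abs,
    "exp": lambda a: math.exp(min(a, 50.0)),         # clipped to avoid overflow
    "log": lambda a: math.log(a) if a > 0 else 0.0,  # protected logarithm
    "sin": math.sin, "cos": math.cos, "tanh": math.tanh,
    "not": lambda a: float(not a),
}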
4- To evolve a better algorithm:
With the framework of step 3, an evolutionary algorithm searches for a program with the following properties (made precise in the sketch after this list):
Using the same type of computational resources (software and hardware configurations), the algorithm reaches superhuman performance at the benchmark in equal or less time than the gold standard (at the time of writing, MuZero).
The size of the file or files encoding the algorithm has to be equal to or smaller than that of the original algorithm. Why? Because the simpler the algorithm, the less information it encodes, which ensures that it is not specific to any task: the algorithm contains only the information indispensable for learning anything.
If one of the two parameters remains equal to that of the last best algorithm, the other has to be strictly better. The shorter the time to achieve superhuman performance at the benchmark, and the shorter the algorithm, the better.
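This acceptance rule is ordinary Pareto dominance over the two recorded axes. A minimal sketch, assuming identical hardware and software configurations:

def dominates(cand_time_h: float, cand_size_b: int,
              gold_time_h: float, gold_size_b: int) -> bool:
    """Step 4 acceptance rule: no worse on both axes, strictly better on at least one."""
    no_worse = cand_time_h <= gold_time_h and cand_size_b <= gold_size_b
    strictly_better = cand_time_h < gold_time_h or cand_size_b < gold_size_b
    return no_worse and strictly_better

# Example: equal compute time but a smaller encoding is accepted.
assert dominates(1.0e4, 40_000, 1.0e4, 50_000)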
5- To test the newly generated algorithm on unseen games:
The new algorithm is tested on unseen games and compared to the original algorithm (MuZero), to check whether it really is a better general learning algorithm or whether it just performs better at learning the benchmark. If it is, the process can continue.
6- To check whether the new algorithm discovers an equally good or better version of itself substantially sooner than the evolutionary algorithm did:
In the same way that knowledge of the laws of mechanics and dynamics facilitates the design of planes and bridges, understanding the laws that govern the design of learning algorithms would speed up their creation.
The MuZero algorithm is a deep-learning-based algorithm that uses reinforcement learning to understand and exploit the rules of different learning environments. For instance, given the game of chess, the algorithm is able to learn the rules just by playing, and then uses them to win against adversaries.
If MuZero is able to understand and use the rules of previously unknown environments through experience, an improved version of MuZero, such as the one being developed in these steps, might be able to quickly pick up the unknown rules that govern the design of learning algorithms and use them to develop the best possible learning algorithm in an intelligent way, just as it learns to win at chess.
If this is the case and, over many repetitions, the improved algorithm discovers itself or a better version in less time than the evolutionary algorithm took, it would mean that the improved algorithm has picked up rules that allow for a more efficient design of general learning algorithms. Such a discovery would be of enormous importance for the machine learning field and would allow an exponential speedup in the discovery of the best possible general machine learning algorithm.
7- From here on, the cycle can continue indefinitely, creating better and better general learning algorithms in a self-improving loop.
If step 6 has been successful, the improved algorithm will take the place of the original search algorithm, so that better GLAs can be discovered more quickly. Moreover, it will replace the previous algorithm as the gold standard to improve upon. Therefore, when step 1 starts again, the design process has the goal of improving over the new algorithm instead of over MuZero.
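Putting steps 1 through 7 together, the whole cycle reads roughly as below. The three callables are hypothetical stand-ins for the machinery described above, not existing components:

from typing import Any, Callable, Optional

def self_improvement_loop(
    searcher: Any,   # initially the evolutionary algorithm
    gold: Any,       # initially MuZero, the gold standard
    search_with: Callable[[Any, Any], Optional[Any]],
    generalizes_better: Callable[[Any, Any], bool],
    rediscovers_itself_faster: Callable[[Any, Any], bool],
    max_cycles: int = 10,
) -> Any:
    """Sketch of the step 1-7 cycle; all three callables are assumptions.

    search_with(searcher, gold): steps 1-4, returns a candidate GLA that
        dominates `gold` on compute time and size, or None if none is found.
    generalizes_better(candidate, gold): step 5, the unseen-games check.
    rediscovers_itself_faster(candidate, searcher): step 6, does the candidate
        find an equally good GLA sooner than `searcher` did?
    """
    for _ in range(max_cycles):
        candidate = search_with(searcher, gold)
        if candidate is None:
            break                                # the search stalled
        if not generalizes_better(candidate, gold):
            continue                             # step 5 failed: benchmark overfit
        if rediscovers_itself_faster(candidate, searcher):
            searcher = candidate                 # step 7: new search tool...
        gold = candidate                         # ...and new gold standard
    return gold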
Note nº 1:
It might be the case that when the automatic algorithms are designing the best GLA, they limit the design to the provided computational resources. If this is the case, different computational resources would yield different designs of the best possible GLA. Therefore, ideally the computer used to develop and test the algorithms should match the capabilities of the human brain as much as possible and not be changed. This way, the algorithm designed will be the best GLA that the brain’s power can handle.
It might also be the case that the automatically designed best GLA performs the best using any amount of computational resources.
Note nº 2:
We don’t know yet if there’s an endpoint to the process, if there’s a GLA that can’t be improved upon.
Part 2: Transferring the knowledge acquired to better algorithms
(This section is optional, given that a sufficiently powerful general machine learning algorithm could be quickly found using the methodology of part 1. The purpose of this section is just to provide an extra idea that could be helpful.)
It will take a long time for any learning agent to acquire all the knowledge required to achieve AGI. In Part 1, it has been established that creating the best general learning algorithm is a continuous process, because we don’t know if there’s an endpoint. Therefore, when a learning algorithm is being used to train a learning agent to achieve AGI, another algorithm will be in development, which might improve over the abilities of the first algorithm. Hence, it would be useful to have a method of transferring the totality of knowledge gained by the outdated algorithms to the newest versions. These better algorithms will then take the place of the outdated ones in the learning process to AGI.
This is what I propose in order to solve this challenge:
1- An ML algorithm, A, is learning.
2- Another ML algorithm, B, is created, which outperforms A as a general learning algorithm.
3- The learning process of A then stops, resulting in a predictive model X.
4- Any ML model can be expressed as a combination of mathematical and logical operators and operations with the following basic structure, which we will call S:
If … happens
Then … happens
In the … percentage of times
Every learning agent operates by reading a predictive model in the structure of S, which allows it to choose its next actions. Hence, any predictive model, however complicated it might be, can be translated into the structure of S, from neural networks to soft decision trees.
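Taking the structure S literally, an S-form model is a list of probabilistic condition-outcome rules. A minimal encoding, with field names and example rules of my own invention:

from dataclasses import dataclass

@dataclass(frozen=True)
class SRule:
    """One 'If ... then ... in ...% of times' statement of the structure S."""
    condition: str      # "If ... happens"
    outcome: str        # "Then ... happens"
    probability: float  # "In the ... percentage of times", as a value in [0, 1]

# A model X translated to S would be a list of such rules, e.g.:
S1 = [
    SRule("the opponent's queen is undefended", "capturing it wins material", 0.97),
    SRule("a stock gaps down 10% at the open", "it closes below the open", 0.62),
]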
Therefore, to solve this challenge, we could do the following.
1- The predictive model X, created by algorithm A, is translated into the structure of S. We will call the result S1.
2- S1 is then translated into the native form of B, as if algorithm B had done the learning process by itself. Algorithm B can then continue the learning process.
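The transfer itself is then a two-function pipeline. Both functions are assumptions rather than existing tools: whether a faithful extract_rules exists for an arbitrary neural network is precisely the open problem this optional part glosses over:

from typing import Any, Callable, List

def transfer_knowledge(
    model_x: Any,
    algorithm_b: Any,
    extract_rules: Callable[[Any], List[Any]],
    inject_rules: Callable[[Any, List[Any]], Any],
) -> Any:
    """Steps 1-2 above: model X -> S1 -> a B-native model that resumes learning.

    extract_rules: translate A's predictive model into a list of S-rules (-> S1).
    inject_rules: re-express S1 in B's native form, as if B had learned it itself.
    """
    s1 = extract_rules(model_x)           # step 1: X translated to S
    return inject_rules(algorithm_b, s1)  # step 2: S1 translated to B's form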
Part 3: The actual learning
How do we achieve AGI using parts 1 and 2? Here’s one way to proceed:
In 2005, Stanford professor Nils J. Nilsson claimed that, in order to achieve AGI, such an intelligence has to be able to do most tasks humans perform for pay [12]. For an artificially intelligent agent to reach AGI, it would have to be able to perform any economically relevant activity that a human being can, at least at the same level of proficiency: from quantum mechanics research to playing soccer at a professional level. Therefore, this should be the learning objective of the AI agent.
Now that we have a learning objective, which tools do we give the artificially intelligent agent to achieve it? The best learning tool known to humankind: the internet. On the one hand, everything there is to learn can be accessed through the internet, whether paid or for free. If some piece of information is not available on the internet directly, it can be accessed indirectly, for instance by asking someone to explain it in a video call. On the other hand, any economically relevant task can be performed through the internet. Currently, only some specific jobs can be done entirely online in a direct sense, but the rest can be reached through the internet indirectly, by controlling automation agents such as robots: again, from quantum mechanics research to playing soccer at a professional level. And in the future, every corner of the world will have fast internet access at a very low cost.
Therefore, all the elements are now on the table: the learning objective, the medium and the technique. I will now explain the experiment that puts everything together.
The experiment:
The AI is given control over a computer with internet access: this means that the AI is able to use the keyboard and mouse. The AI agent will also be given a profile so that it can make video calls. The information the AI receives is the pixels of the screen and the audio output.
At first, the AI will only be able to interact with the following four websites:
· A stock market trading platform
· Reuters
· Wikipedia
· Twitter
The reason I’ve selected these four websites is to reduce the search space and provide the learning algorithm with immediate feedback through the ups and downs of the stock market.
The objective of the AI will be to make as much money as possible per second.
If the AI agent wants to achieve its goal, it will have to learn how to perform every economically relevant task better than any human.
At first, the AI will only be able to trade with an imaginary sum of money. Once its predictive power is ensured, real money will be provided. The agent will have to learn to understand the news and invest accordingly, using Wikipedia, Twitter and Reuters. On Reuters and Wikipedia, the agent will only be able to read, but on Twitter and the stock market it will be able to interact. Its interactions on the stock market will be completely free, but its tweets will have to be manually supervised.
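The objective and the paper-trading gate can be written down directly. A sketch; the 5% profit threshold and the agent_step interface are invented stand-ins for “its predictive power is ensured”:

import time

def money_per_second(before: float, after: float, seconds: float) -> float:
    """The agent's stated objective: rate of money gained."""
    return (after - before) / max(seconds, 1e-9)

def paper_trading_gate(agent_step, start_balance: float = 100_000.0, steps: int = 1_000) -> bool:
    """Decide whether the agent graduates from imaginary to real money.

    agent_step is a hypothetical callable (balance -> new balance) whose trades
    are only simulated; the profit threshold below is an arbitrary placeholder.
    """
    balance = start_balance
    t0 = time.monotonic()
    for _ in range(steps):
        balance = agent_step(balance)
    rate = money_per_second(start_balance, balance, time.monotonic() - t0)
    return rate > 0.0 and balance >= 1.05 * start_balance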
Once the AI proves that it can consistently make money from these resources, three more websites will be added: two Google search bars and YouTube. On the one hand, YouTube provides the agent with a new learning platform. On the other hand, one of the Google search bars will allow the AI to explore the entirety of the internet, while the other will serve the purpose of searching for and developing a new economically relevant task. After that, the AI will have to pay for every internet page it wants to open and run in parallel with all the other pages. At this point, the AI must have shown that it is able to perform well at at least two economic tasks. Then, the AI should be given access to the entirety of the internet and of the computers it uses, even though some specific actions will still have to be manually supervised.
In an exponential fashion, the more money the AI makes, the more it will be able to invest in itself through more computers and other resources, which will allow it to learn even more economically relevant tasks. In order to fulfill the objective of making as much money as possible in the shortest amount of time, the AI will have to learn to outperform humans at all the economically relevant tasks we perform for pay. At that point, AGI will have been reached.
The safety problem: how do we ensure that the AI doesn’t harm us?
The solution I propose has three different parts:
Part 1: Manual Supervision
The AI won’t have absolute freedom over what it can do on the internet or on the computers it uses: there will be strict manual supervision of any task that could pose even a minimal risk. For instance, if the AI wants to use Twitter, the tweets and messages it sends will be manually supervised. Moreover, the artificial agent won’t be able to change the operating system of the computer or its own code, or to perform any offline action that could fundamentally change the system. Ideally, the AI should only be able to think and to perform tasks that pose no harm at all, such as searching on Google. The AI can’t have direct control over anything besides the computer it uses: no robots, vehicles, health systems or buildings. In other words, the AI will be able to do almost nothing on its own; it will only be able to suggest ideas to competent humans.
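As described, this part amounts to an action filter sitting between the agent and the world. A sketch of the gating logic, with an invented classification of action types; whether such a filter can hold against a superintelligent agent is exactly what the comments below question:

from enum import Enum, auto

class Verdict(Enum):
    ALLOW = auto()   # harmless, pre-approved action types
    REVIEW = auto()  # must be manually supervised before execution
    DENY = auto()    # never allowed, regardless of supervision

# Invented classification of action types, following the text above.
POLICY = {
    "web_search": Verdict.ALLOW,
    "read_page": Verdict.ALLOW,
    "stock_trade": Verdict.ALLOW,     # "completely free" per the experiment
    "send_tweet": Verdict.REVIEW,     # tweets are manually supervised
    "video_call": Verdict.REVIEW,
    "modify_os": Verdict.DENY,        # can't change the operating system
    "modify_own_code": Verdict.DENY,  # can't change its own code
    "control_robot": Verdict.DENY,    # no direct control outside the computer
}

def gate(action_type: str) -> Verdict:
    """Default to human review: anything not explicitly classified goes to a supervisor."""
    return POLICY.get(action_type, Verdict.REVIEW)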
Part 2: Expert check
Every idea the AI wants to export to “the real world” will have to be validated by human experts in the field. For instance, if the AI proposes a new cancer treatment, it will have to explain the treatment in a way that lets human experts in the field completely understand its inner workings. Only then can the treatment be made available. The same applies to all fields of knowledge.
Part 3: The Economic Power of the individual
About the economic power of the individual, two measures must be implemented.
On the one hand, since the main objective of the AI will be to earn as much money as possible per second, we will have to make sure that humans are indefinitely entitled to substantial economic power, which everyone should have as a right. Such a measure ensures that the AI works for the satisfaction and well-being of each individual. This can be achieved by taxing the money generated by the AI and distributing it across the general population and the public services.
On the other hand, we will have to make sure that not all jobs are lost to AI. While there are some jobs in which performance is the paramount criterion, such as occupations in the medical field and driving, other jobs don’t rely on performance, because their evaluation is subjective. These jobs are found in industries like food and entertainment. For instance, pieces of art created by humans, and restaurants employing humans, will have to be legally certified as such.
Why quantum computing can make everything faster
Quantum computing can make everything faster by enabling speed-ups in key aspects of creating a superintelligence [13]. These aspects include finding equivalent ML algorithms during the process of creating the best GLA and storing a learned model in the shortest possible form, among many other possible applications.
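To give one concrete, commonly cited example of such a speed-up: if finding an equivalent ML algorithm could be phrased as unstructured search over N candidate programs, Grover’s algorithm would need on the order of √N oracle queries instead of N. Whether the GLA search actually fits that template is an assumption of mine, not a result:

import math

def grover_queries(n_candidates: int) -> int:
    """Approximate optimal number of Grover iterations, ~ (pi/4) * sqrt(N)."""
    return math.ceil(math.pi / 4 * math.sqrt(n_candidates))

# E.g. searching the ~2**40 programs of some bounded length:
n = 2 ** 40
print(f"classical worst case: {n:.2e} evaluations")              # ~1.10e+12
print(f"Grover worst case:    {grover_queries(n):.2e} queries")  # ~8.24e+05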
References
[1] O’Neill, B.C., Carter, T.R., Ebi, K., Harrison, P.A., Kemp-Benedict, E., Kok, K., Kriegler, E., Preston, B.L., Riahi, K., Sillmann, J. and van Ruijven, B.J., 2020. Achievements and needs for the climate change scenario framework. Nature climate change, 10(12), pp.1074-1084.
[2] Piret, J. and Boivin, G., 2020. Pandemics throughout history. Frontiers in microbiology, 11.
[3] Zucman, G., 2019. Global wealth inequality. Annual Review of Economics, 11, pp.109-138.
[4] Li, J., Han, X., Zhang, X. and Wang, S., 2019. Spatiotemporal evolution of global population ageing from 1960 to 2017. BMC public health, 19(1), pp.1-15.
[5] McCarthy, J., 2007. What is artificial intelligence?.
[6] Makridakis, S., 2017. The forthcoming Artificial Intelligence (AI) revolution: Its impact on society and firms. Futures, 90, pp.46-60.
[7] Goertzel, B., 2014. Artificial general intelligence: concept, state of the art, and future prospects. Journal of Artificial General Intelligence, 5(1), p.1.
[8] Raichle, M.E., 2010. Two views of brain function. Trends in cognitive sciences, 14(4), pp.180-190.
[9] Zhang, Z., Beck, M.W., Winkler, D.A., Huang, B., Sibanda, W. and Goyal, H., 2018. Opening the black box of neural networks: methods for interpreting neural network models in clinical applications. Annals of translational medicine, 6(11).
[10] Real, E., Liang, C., So, D. and Le, Q., 2020, November. Automl-zero: Evolving machine learning algorithms from scratch. In International Conference on Machine Learning (pp. 8007-8019). PMLR.
[11] Schrittwieser, J., Antonoglou, I., Hubert, T., Simonyan, K., Sifre, L., Schmitt, S., Guez, A., Lockhart, E., Hassabis, D., Graepel, T. and Lillicrap, T., 2020. Mastering atari, go, chess and shogi by planning with a learned model. Nature, 588(7839), pp.604-609.
[12] Nilsson, N.J., 2005. Human-level artificial intelligence? Be serious!. AI Magazine, 26(4), p.68.
[13] DeBenedictis, E.P., 2018. A future with quantum machine learning. Computer, 51(2), pp.68-71.
[14] Schrittwieser, J., Antonoglou, I., Hubert, T., Simonyan, K., Sifre, L., Schmitt, S., Guez, A., Lockhart, E., Hassabis, D., Graepel, T. and Lillicrap, T., 2020. MuZero: Mastering Go, chess, shogi and Atari without rules. DeepMind blog. https://deepmind.com/blog/article/muzero-mastering-go-chess-shogi-and-atari-without-rules [Accessed 16/11/2021]
Comments

I’m no programmer, so I have no comment on the “how to develop” part. The “safe” part seems extremely unsafe to me, though.
1) Your strategy relies on a human supervisor’s ability to recognize a threat that is disguised by a superintelligence, which is doomed to failure almost by definition.
2) The supervisor himself is not protected from possible threats. He is also one of the main targets the AI would want to affect.
3) >Moreover, the artificial agent won’t be able to change the operating system of the computer or its own code, or to perform any offline action that could fundamentally change the system.
I don’t see what kind of manual supervision could possibly accomplish that, even if none of the other problems existed.
4) Human experts don’t have a “complete understanding” of any subject worth mentioning. Certainly nothing involving biology. So your AI will just produce a text that convinces them that the proposed solution is safe. Being superintelligent, it’ll be able to do that even if the solution is not in fact safe. Or it might produce some other dangerous texts, like texts that convince them to lie to you that the solution is safe.
A couple of immediate issues with the algorithm side of things:
First, there are 2^n possible programs of length n. Your ‘find a better algorithm and iterate’ can get caught in sequences like this:
Program, generation 1:
for (uint128_t i = 0; i < 0xFFFF'FFFF'FFFF'FFFF'FFFF'FFFF'FFFF'FFFF; i++) { nop(); }
do_the_thing();
Program, generation 2:
for (uint128_t i = 0; i < 0xFFFF'FFFF'FFFF'FFFF'FFFF'FFFF'FFFF'FFFE; i++) { nop(); }
do_the_thing();
Program, generation 3:
for (uint128_t i = 0; i < 0xFFFF'FFFF'FFFF'FFFF'FFFF'FFFF'FFFF'FFFD; i++) { nop(); }
do_the_thing();
Program, generation 4:
for (uint128_t i = 0; i < 0xFFFF'FFFF'FFFF'FFFF'FFFF'FFFF'FFFF'FFFC; i++) { nop(); }
do_the_thing();
Etc. Forward progress is not enough here.
(I know that you can’t actually iterate to 2^128 and get a result in any sane amount of time. Replace the loop with anything else that’s a nop in practice but ever-so-slightly reduces your chosen metric.)
Second, what do you do when the optimizer returns no strictly-better agent and the evolutionary algorithm is stuck in a local minimum? Just because it will get better ‘eventually’ doesn’t mean it’s going to find something better before the heat death of the universe.
Third: how do you find unseen games? Either you require O(total tested algorithms) games, or you repeat unseen games. If you repeat unseen games, there’s a chance (and I would argue a significant chance) that you stumble across something that happens to solve the unseen games you ran this time but is not actually general. Then you get stuck.
Fourth: even assuming the loop works, you have no guarantee that the loop will get faster over time. Solving some games faster != improving itself faster.
=====
In general: any time you have an argument for why something will work that works equally well for Levin search, you don’t have a good argument. (Why Levin search? Because it has theoretically optimal runtime complexity for many classes of problems… and yet is utterly impractical.)
Hey! Thanks for your comment.
This algorithm won’t get caught in a loop like the one you mention, because it uses the same process as the one described in the AutoML-Zero paper. In that article, they ‘found a better algorithm and iterated’ without any problem whatsoever, using the processes described in figures 1 and 2; please check the paper for the details.
About your second point: that’s exactly the aim of the experiment, to find out whether a strictly better agent can be found by an automatic process. If we don’t get there using substantial computation within an acceptable amount of time, then the experiment will have failed. But, as with all experiments, there are good reasons to try.
Third: how do you find unseen games? Simply, unseen games are just games the algorithm hasn’t been trained to perform well at. In this experiment, that means games that are not in the DeepMind MuZero benchmark. Obviously, these unseen games will be changed in every cycle (every step 5).
Fourth: yes, of course there’s no guarantee, because that’s also the point of the experiment: to find out whether this will happen. And again, there’s good reason to think so. Here’s the explanation again: we use a machine learning technique to find a general learning program that performs better than MuZero. But MuZero is itself a deep reinforcement learning program designed to quickly learn many different games. And what is a game? An objective-based activity governed by certain rules. Hence, if the new program performs better than MuZero as a GLA, it is logical to assume that it will also perform better at the process of finding better GLAs, because finding GLAs is a ‘game’ too. This is also explained in steps 5 and 6.
Regarding your ‘in general’ statement: at no point am I presenting an argument that these new algorithms will perform better than, or as well as, Levin search. What I propose with this experiment is to automatically find improved versions of the general learning algorithms we currently have. The ideal endpoint of the experiment would be to automatically find algorithms that are CLOSE to the best possible (such as Levin search) but are practically feasible.
Kind Regards