AI Alignment and Phenomenal Consciousness


NB: Originally posted on Map and Territory on Medium, so some of the internal series links go there.

In the initial feedback I’ve received on my attempt to formally state the AI alignment problem, the primary objection I’ve heard is that I assume any AI worth aligning will experience qualia. In particular, some think we have to worry about aligning AI that is cybernetic but not phenomenally conscious. I think they are mistaken in thinking cybernetic-only AGI is possible, but in all fairness this is probably due to a failure on my part to adequately explain phenomenal consciousness, so I’ll try here to make clearer the case for phenomenally conscious AGI and how this relates to AI alignment.

To begin, though, let’s step back and ask what we mean by “artificial intelligence”. The artificial part is easy: we mean something constructed through a deliberate design effort rather than arising naturally, i.e. it is the handiwork of conscious things rather than the outcome of an unconscious process. Thus artificial intelligence stands in contrast to natural intelligence, like the intelligence of animals that arose through evolution. The intelligence part is harder because by intelligence we mean multiple things: possession of goal-directed behavior (telos), an ability to model the world (ontology), and an ability to combine telos and ontology to come up with new solutions to problems (find undiscovered algorithms). Generally we might say intelligence is something like an ability to systematically increase local informational complexity — optimize the nearby parts of the world — by increasing global entropy. Taken together, then, artificial intelligence and artificially intelligent agents can be said to be designed optimization processes.

This means, of course, that things as simple as steam engines are a kind of artificial intelligence, even if they aren’t especially intelligent, since a steam engine increases global entropy in the form of waste heat in order to produce mechanical power. And even if we put a governor on our steam engine it is still only cybernetic — a thing that experiences itself — and not phenomenally conscious — a thing that experiences itself experiencing itself. But, as we’ll see, it doesn’t take much to make our steam engine jump into the realm of phenomenal consciousness.

The steam engine with a governor, as I’ve previously explained, is a cybernetic thing that produces at least one bit of information about whether the throttle is open or closed. Although it would be needlessly complex, suppose we added a governor onto the governor to regulate how quickly the governor adjusts the throttle. This governor governor has a simple operation: if the throttle was open within the last second, it doesn’t allow the throttle to close and vice versa. In doing so it creates a kind of memory for the steam engine about the state of the throttle, generates ontology by interpreting and representing the state of the throttle within the governor governor, and experiences itself experiencing itself through the governor governor experiencing the governor experiencing the steam engine. These features all imply this modified steam engine is phenomenally conscious.
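
To make the nesting concrete, here is a minimal sketch of the governor-on-a-governor idea in code. The class names, the discrete time steps, and the ten-step hold window standing in for “one second” are my own illustrative choices, not a model of a real engine:

```python
class Governor:
    """First-order feedback: one bit of information about the throttle."""

    def __init__(self, target_speed):
        self.target_speed = target_speed
        self.throttle_open = True

    def step(self, speed):
        # Open the throttle when the engine runs slow, close it when fast.
        self.throttle_open = speed < self.target_speed
        return self.throttle_open


class GovernorGovernor:
    """Second-order feedback: regulates how quickly the governor may change
    the throttle, which forces it to keep a memory of the throttle's state."""

    def __init__(self, governor, hold_steps=10):  # hold_steps ~ "one second"
        self.governor = governor
        self.hold_steps = hold_steps
        self.last_state = governor.throttle_open   # memory of the throttle
        self.steps_held = 0

    def step(self, speed):
        proposed = self.governor.step(speed)
        if proposed == self.last_state:
            self.steps_held += 1
        elif self.steps_held >= self.hold_steps:
            # The throttle has held its state long enough; allow the change.
            self.last_state = proposed
            self.steps_held = 0
        else:
            # Veto: the throttle changed state too recently, so restore it.
            self.governor.throttle_open = self.last_state
            self.steps_held += 1
        return self.governor.throttle_open


engine = GovernorGovernor(Governor(target_speed=100.0))
for speed in (90, 105, 103, 95, 110):
    engine.step(speed)
```

The inner object reacts only to the engine; the outer object reacts to the inner object’s reactions, which is all that “experiencing itself experiencing itself” amounts to in this toy.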

To consider an example closer to the edge of our capabilities in artificial intelligence, creating phenomenal consciousness with machine learning is also trivial. A simple machine learning algorithm is a cybernetic process that iterates over data to produce a non-cybernetic model (the model is not cybernetic because it is essentially a lookup table or function that does not experience itself). More complex machine learning algorithms can produce cybernetic models with memory and, depending on the implementation of the algorithm, this can make the algorithm phenomenally conscious. If a machine learning algorithm generates cybernetic models that generate cybernetic models, or self-improves, then it’s solidly in the realm of phenomenal consciousness. After all, the minimum requirement for phenomenal consciousness is little more than a loop nested inside another loop!
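
A stylized sketch of that distinction (my own toy code, not any particular library’s API): the first learner emits a static lookup table, while the second emits a model that keeps state and updates itself as it runs, a loop nested inside the training loop.

```python
def train_lookup(data):
    """Outer loop only: iterate over data and emit a lookup table.
    The resulting model has no state and no feedback on itself."""
    table = {x: y for x, y in data}
    return lambda x: table.get(x)


def train_stateful(data):
    """Outer loop that emits a model which itself runs an inner loop:
    it remembers what it has seen and keeps adjusting its own estimate."""
    prior = sum(y for _, y in data) / max(len(data), 1)

    class OnlineModel:
        def __init__(self):
            self.estimate = prior          # state: a memory of past inputs

        def update(self, observed):
            # The model observes outcomes and revises itself: feedback
            # layered on top of the feedback that produced it.
            self.estimate += 0.1 * (observed - self.estimate)
            return self.estimate

    return OnlineModel()
```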

But even though phenomenal consciousness is easy to place in a system and is already in use at today’s leading edge of AI development, we don’t necessarily want to create phenomenally conscious AI. Eliezer Yudkowsky, for example, has argued strongly against creating conscious AI for ethical and technical reasons, and although he was referring to a naive sort of consciousness rather than phenomenal consciousness, the general point stands: it would be safer to make AI that is less capable rather than more, so we should only create phenomenally conscious AI if it’s necessary to our ends. Unfortunately, I think if we want to create AGI — artificial general intelligence — it will necessarily be phenomenally conscious.

Let’s step back again and consider what we mean by the “general” in AGI. “General” stands in opposition to “narrow” in AI, where narrow AI are designed optimization processes that work only in one or a few domains, like chess or language translation. Artificial general intelligence, on the other hand, is expected to work in arbitrarily many domains, including domains the AI has not been trained on, because it could presumably train itself the way humans can when it encounters novel scenarios or otherwise adapt to new situations. It might fail at first, but it will learn and grow in capability as it addresses a broader space of experiences. It is only such general AI that I posit must be phenomenally conscious, especially since we know from existing systems that cybernetic-only narrow AI is possible.

Suppose we could create a cybernetic-only AGI. Such a thing, being not phenomenally conscious, would necessarily have no ontology or ability to model the world, so it would be a kind of philosophical zombie that behaves like a phenomenally conscious thing but is not. P-zombies are possible, but they are not cheap: a p-zombie requires computational resources that grow exponentially in the number of cases it is expected to handle, compared to a behaviorally equivalent phenomenally conscious thing (Update 2018–03–20: I try to prove this formally). That is, a behaviorally equivalent p-zombie needs a separate cybernetic system to handle every situation it might find itself in, because it can’t model the world and must have its “ontology” hard-coded. So, if we could create a cybernetic-only AGI, how big would it be, both in terms of cybernetic subsystems and volume? My Fermi estimate:

  • There are on the order of 10,000 unique words needed to fully express ideas in any given human language.

  • A sentence in a human language has on the order of 10 words.

  • Assuming every sentence of 10 words in any language describes a unique scenario, a human-level AGI must handle at least 10,000¹⁰=10⁴⁰ scenarios.

  • Since 3 levels of recursion are enough for anyone, let’s conservatively suppose this means scenarios interact to create at least (10⁴⁰)³=10¹²⁰ situations an AGI must deal with, requiring 10¹²⁰ cybernetic subsystems.

  • AlphaZero can train to handle a new situation, like a new game, in about an hour, but to be generous about future improvements, let’s assume our AGI can train on a scenario in 1 minute.

  • That means our AGI needs 10¹²⁰ minutes, or ~1.9×10¹¹⁴ years, of training to build models to handle all the scenarios it needs to be general.

  • Supposing we are willing to take on the order of 10 years to build our AGI and we can parallelize the training over that period, building an AGI would require 1.9×10¹¹⁴/10 = 1.9×10¹¹³ computers, each on the scale of AlphaZero.

  • It’s unclear what AlphaZero’s computational needs are, but AlphaGo Zero apparently runs on only a single server with 4 TPUs. Let’s conservatively assume this means we need 1U of rack space, or ~15,000 cm³ (~0.015 m³), to train a model to handle each scenario.

  • So to get an AGI you need 1.9×10¹¹³ × 0.015 m³ = 2.85×10¹¹¹ m³ of compute, not leaving space for cooling, power, etc.

  • The Earth has a volume of ~1.1×10²¹ m³, so our AGI would require ~2.6×10⁹⁰ Earths’ worth of computers (the short script below retraces the arithmetic).
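
Here is the same estimate as a short script, so the assumptions and the arithmetic are easy to tweak; the inputs are simply the guesses from the list above, nothing more.

```python
words            = 10_000            # distinct words in a human language
sentence_length  = 10                # words per sentence
scenarios        = float(words) ** sentence_length   # 1e40 unique sentences
situations       = scenarios ** 3                    # 1e120 with 3 levels of interaction

minutes_per_case = 1.0               # optimistic training time per situation
minutes_per_year = 60 * 24 * 365
training_years   = situations * minutes_per_case / minutes_per_year  # ~1.9e114

budget_years     = 10
machines         = training_years / budget_years     # ~1.9e113 AlphaZero-scale boxes

volume_per_box   = 0.015             # m^3, roughly 1U of rack space
total_volume     = machines * volume_per_box         # ~2.9e111 m^3

earth_volume     = 1.1e21            # m^3
earths_needed    = total_volume / earth_volume       # ~2.6e90 Earths

print(f"training years: {training_years:.1e}")
print(f"machines:       {machines:.1e}")
print(f"volume:         {total_volume:.1e} m^3 ({earths_needed:.1e} Earths)")
```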

I’m sure we could make these calculations more accurate, but that’s not the point of a Fermi estimate; the point is to show the scale of building an AGI as capable as a human that is also a p-zombie, even if my calculations are wrong by several orders of magnitude. If we tweaked the numbers to be maximally favorable to building p-zombies, taking into account improvements in technology and the possibility that the problem is easier than I think, we would still end up needing more than an entire Earth’s worth of computers to do it. Building a human-level p-zombie AGI would be asking for a planet-sized brain, and we know how that would turn out.

Jokes aside, I don’t go through this exercise to poke fun at the idea that we could build an AGI that is not phenomenally conscious. My objective is to stress that any practical AGI project will necessarily be looking at building something phenomenally conscious because it’s the only way to fit the amount of complexity needed into a reasonable amount of resources. I don’t think people working on AI capability are confused about this: they know that giving systems what I call phenomenal consciousness allows them to do more work with fewer resources, and this seems to be the direction they will naturally go, even with narrow AI, as cybernetic-only solutions become prohibitively expensive to improve. But if AI safety researchers were hoping for cybernetic-only AGI, then, alas, it seems AGI will be phenomenally conscious.

That said, seed AI might allow us to avoid phenomenally conscious AGI for a while. The idea of seed AI is to first create a simple AI system that will bootstrap itself into a more powerful one by improving itself or designing its successor. Maybe we could design seed AI that is cybernetic-only and let the seed AI take over the responsibility of designing phenomenally conscious AGI. In such a scenario, might we need to consider the alignment of cybernetic-only AI?

In short, no. We would want to design the seed AI to apply the results of alignment research so that any more powerful, phenomenally conscious AGI it created would be aligned, but the seed AI itself would not need alignment. To make a broader point, there is a sense in which something that is not phenomenally conscious cannot be aligned: it does not value anything because it doesn’t know what anything is. To put it another way, we can’t build tools — phenomenally unconscious things we use for a particular purpose — that cannot be misused, because tools lack the complexity necessary to even notice they are being misused, let alone do something to avoid it.

To give an example, a crowbar lacks the complexity to know if it’s being used to open a crate or break into a building, much less to know whether opening the crate or breaking into the building is “good” or “bad”. Yes, we could make a fancy, cybernetic crowbar with a tiny computer that noticed what it was being used for and stopped working when it detected a “bad” scenario, but being cybernetic-only it would not be general: it could only handle situations it was trained to recognize. Use the crowbar in a novel situation and it may promptly become “unaligned” because it was never “aligned” in the first place: it just did what it was designed to do, even if that design included complex, narrow AI that worked in a lot of cases. If you want to build aligned things, AGI or otherwise, you have to make them phenomenally conscious, because that’s the only way the thing can possibly share the operator’s values in general.
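
As a toy illustration of that failure mode (the scenarios and labels below are entirely made up; only the shape of the failure matters):

```python
# Hard-coded "ontology": the only uses the crowbar was built to recognize.
APPROVED_USES = {"open_crate", "pull_nail", "pry_floorboard"}


def crowbar_allows(detected_use):
    """A cybernetic-only tool can only match inputs against what it was
    built or trained to recognize; it has no model of why a use is good."""
    return detected_use in APPROVED_USES


crowbar_allows("open_crate")         # True: recognized and permitted
crowbar_allows("break_into_house")   # False: refused, but only because unrecognized
crowbar_allows("open_stuck_window")  # False: a harmless novel use is refused too
# Flip the default to permissive and the opposite failure appears: harmful
# novel uses sail through. Either way the tool never shared anyone's values;
# it just did what it was designed to do.
```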

I hope this makes clear why I think AGI will be phenomenally conscious and why AI alignment is a problem about phenomenally conscious agents. I invite further feedback on developing these ideas, so please comment or reach out with your thoughts, especially if you disagree.