Vanessa Kosoy comments on Realism about rationality

Vanessa Kosoy 20 Sep 2018 11:13 UTC
LW: 21 AF: 7
AF

It seems like we might actually agree on this point: an abstract theory of evolution is not very useful for either building organisms or analysing how they work, and so too may an abstract theory of intelligence not be very useful for building intelligent agents or analysing how they work. But what we want is to build better birds! The abstract theory of evolution can tell us things like “species will evolve faster when there are predators in their environment” and “species which use sexual reproduction will be able to adapt faster to novel environments”. The analogous abstract theory of intelligence can tell us things like “agents will be less able to achieve their goals when they are opposed by other agents” and “agents with more compute will perform better in novel environments”. These sorts of conclusions are not very useful for safety.

As a matter of fact, I emphatically do not agree. “Birds” are a confusing example, because it speaks of modifying an existing (messy, complicated, poorly designed) system rather than making something from scratch. If we wanted to make something vaguely bird-like from scratch, we might have needed something like a “theory of self-sustaining, self-replicating machines”.

Let’s consider a clearer example: cars. In order to build a car, it is very useful to have a theory of mechanics, chemistry, thermodynamic etc. Just doings things by trial and error would be much less effective, especially if you don’t want the car to occasionally explode (given that the frequency of explosions might be too low to affordably detect during testing). This is not because a car is “simple”: a spaceship or, let’s say, a gravity wave detector is much more complex than a car, and yet you hardly need less theory to make one.

And another example: cryptography. In fact, cryptography is not so far from AI safety: in the former case, you defend against an external adversary whereas in the latter you defend against perverse incentives and subagents inside the AI. If we had this conversation in the 1960s (say), you might have said that cryptography is obviously a complex, messy domain, and theorizing about it is next to useless, or at least not helpful for designing actual encryption systems (there was Shannon’s work, but since it ignored computational complexity you can maybe compare it to algorithmic information theory and statistical learning theory for AI today; if we had this conversation in the 1930s, then there would next to no theory at all, even though encryption was practiced since ancient times). And yet, today theory plays an essential role in this field. The domain actually is very hard: most of the theory relies on complexity theoretic conjectures that we are still far from being able to prove (although I expect that most theoretical computer scientists would agree that eventually we will solve them). However, even without being able to formally prove everything, the ability to reduce the safety of many different protocols to a limited number of interconnected conjectures (some of which have an abundance of both theoretical and empirical evidence) allows us to immensely increase our confidence in those protocols.

Similarly, I expect an abstract theory of intelligence to be immensely useful for AI safety. Even just having precise language to define what “AI safety” means would be very helpful, especially to avoid counter-intuitive failure modes like the malign prior. At the very least, we could have provably safe but impractical machine learning protocols that would be an inspiration to more complex algorithms about which we cannot prove things directly (like in deep learning today). More optimistically (but still realistically IMO) we could have practical algorithms satisfying theoretical guarantees modulo a small number of well-studied conjectures, like in cryptography today. This way, theoretical and empirical research could feed into each other, the whole significantly surpassing the sum of its parts.