...what I was trying to get at with “define its fitness in terms of more basic traits” is being able to build a model of how it can or should actually work, not just specify measurement criteria.
Once again, it seems perfectly possible to build an abstract theory of evolution (for example, evolutionary game theory would be one component of that theory). Of course, the specific organisms we have on Earth, with their specific quirks, are not something we can describe by simple equations: unsurprisingly, since they are a rather arbitrary point in the space of all possible organisms!
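To make that concrete, here is a minimal sketch of one piece of such an abstract theory: the replicator dynamics of evolutionary game theory applied to a two-strategy hawk–dove game. The payoff numbers below are arbitrary illustrative choices; the point is only that the population-level dynamics are a few lines of simple mathematics, even though no particular organism is described.

```python
# Minimal replicator-dynamics sketch (illustrative; the payoff values are arbitrary).
# x is the fraction of the population playing "hawk"; the rest play "dove".
import numpy as np

payoff = np.array([[-1.0, 4.0],   # hawk vs hawk, hawk vs dove
                   [ 0.0, 2.0]])  # dove vs hawk, dove vs dove

def replicator_step(x, dt=0.01):
    pop = np.array([x, 1.0 - x])
    fitness = payoff @ pop          # expected payoff of each strategy
    mean_fitness = pop @ fitness    # population-average payoff
    # Replicator equation: dx/dt = x * (fitness of hawk - mean fitness)
    return x + dt * x * (fitness[0] - mean_fitness)

x = 0.1
for _ in range(2000):
    x = replicator_step(x)
print(f"hawk fraction after 2000 steps: {x:.2f}")  # approaches the mixed equilibrium of 2/3
```

Nothing in this model mentions feathers or beaks, which is exactly the sense in which the theory is abstract.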
I do consider computational learning theory to be evidence for rationality realism. However, I think it’s an open question whether CLT will turn out to be particularly useful as we build smarter and smarter agents—to my knowledge it hasn’t played an important role in the success of deep learning, for instance.
It plays a minor role in deep learning, in the sense that some “deep” algorithms are adaptations of algorithms that have theoretical guarantees. For example, deep Q-learning is an adaptation of ordinary Q-learning. Obviously I cannot prove that it is possible to create an abstract theory of intelligence without actually creating the theory. However, the same could be said about any endeavor in history.
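To illustrate the Q-learning relationship mentioned above (a minimal sketch, not either algorithm in full): tabular Q-learning, which has convergence guarantees under standard conditions, maintains a table of values and applies a simple update; deep Q-learning keeps essentially the same update target but replaces the table with a neural network, at which point the guarantees no longer directly apply.

```python
# Tabular Q-learning update: the version with convergence guarantees
# (given sufficient exploration and appropriately decaying learning rates).
from collections import defaultdict

Q = defaultdict(float)      # Q[(state, action)] -> estimated value
alpha, gamma = 0.1, 0.99    # learning rate and discount factor

def q_update(state, action, reward, next_state, actions):
    best_next = max(Q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    Q[(state, action)] += alpha * (target - Q[(state, action)])

# Deep Q-learning uses the same target, reward + gamma * max_a Q(s', a),
# but Q is a neural network trained by gradient descent on (target - Q(s, a))^2,
# which is why the tabular convergence guarantees no longer carry over directly.
```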
It may be analogous to mathematical models of evolution, which are certainly true but don’t help you build better birds.
Mathematical models of evolution might help you to build better evolutions. In order to build better birds, you would need mathematical models of birds, which are going to be much more messy.
This feels more like a restatement of our disagreement than an argument. I do feel some of the force of this intuition, but I can also picture a world in which it’s not the case.
I don’t think it’s a mere restatement? I am trying to show that “rationality realism” is what you should expect based on Occam’s razor, which is a fundamental principle of reason. Possibly I just don’t understand your position. In particular, I don’t know what epistemology is like in the world you imagine. Maybe it’s a subject for your next essay.
Note that most of the reasoning humans do is not math-like, but rather a sort of intuitive inference where we draw links between different vague concepts and recognise useful patterns.
This seems to conflate objects with representations of objects. The assumption that there is some mathematical theory at the core of human reasoning does not mean that a description of this mathematical theory should automatically exist in the conscious, symbol-manipulating part of the mind. You can have a reinforcement learning algorithm that is perfectly well-understood mathematically, and yet nowhere inside the state of the algorithm is a description of the algorithm itself or the mathematics behind it.
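As a concrete illustration of this point (a toy sketch, with the learning problem and constants invented purely for the example): the complete runtime state of a simple learning algorithm can be printed out, and it is nothing but numbers; the mathematics that explains why the updates converge lives in our analysis of the algorithm, not anywhere inside it.

```python
import json
import random

# A tiny stochastic-gradient learner for y = w * x. Its entire internal state
# is the single float w; no description of the update rule, the loss function,
# or the convergence analysis is stored anywhere in that state.
w, lr = 0.0, 0.05
for _ in range(1000):
    x = random.uniform(-1.0, 1.0)
    y = 3.0 * x                       # data generated with the "true" coefficient 3
    w -= lr * 2 * (w * x - y) * x     # gradient step on the squared error
print(json.dumps({"entire_state_of_the_algorithm": w}))  # just a number, close to 3.0
```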
There may be questions which we all agree are very morally important, but where most of us have ill-defined preferences such that our responses depend on the framing of the problem (e.g. the repugnant conclusion).
The response might depend on the framing if you’re asked a question and given 10 seconds to answer it. If you’re allowed to deliberate on the question, and in particular to consider alternative framings, the answer becomes better defined. However, even if it is ill-defined, it doesn’t really change anything. We can still ask the question “given the ability to optimize any utility function over the world now, what utility function should we choose?” Perhaps it means that we need to consider our answers to ethical questions under a randomly generated framing. Or maybe it means something else. But in any case, it is a question that can and should be answered.
It seems perfectly possible to build an abstract theory of evolution (for example, evolutionary game theory would be one component of that theory). Of course, the specific organisms we have on Earth, with their specific quirks, are not something we can describe by simple equations: unsurprisingly, since they are a rather arbitrary point in the space of all possible organisms!
...
It may be analogous to mathematical models of evolution, which are certainly true but don’t help you build better birds.
It seems like we might actually agree on this point: an abstract theory of evolution is not very useful for either building organisms or analysing how they work, and so too may an abstract theory of intelligence not be very useful for building intelligent agents or analysing how they work. But what we want is to build better birds! The abstract theory of evolution can tell us things like “species will evolve faster when there are predators in their environment” and “species which use sexual reproduction will be able to adapt faster to novel environments”. The analogous abstract theory of intelligence can tell us things like “agents will be less able to achieve their goals when they are opposed by other agents” and “agents with more compute will perform better in novel environments”. These sorts of conclusions are not very useful for safety.
I don’t think it’s a mere restatement? I am trying to show that “rationality realism” is what you should expect based on Occam’s razor, which is a fundamental principle of reason.
Sorry, my response was a little lazy, but at the same time I’m finding it very difficult to figure out how to phrase a counterargument beyond simply saying that although intelligence does allow us to understand physics, it doesn’t seem to me that this implies it’s simple or fundamental. Maybe one relevant analogy: maths allows us to analyse tic-tac-toe, but maths is much more complex than tic-tac-toe. I understand that this is probably an unsatisfactory intuition from your perspective, but unfortunately don’t have time to think too much more about this now; will cover it in a follow-up.
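For what it’s worth, the tic-tac-toe half of that analogy can be made fully concrete (a small illustrative sketch; the code below is just one way to do the exhaustive analysis): a few lines of search settle the game completely, while the mathematics and machinery used to write and justify that search are far richer than the game itself.

```python
# Exhaustive minimax analysis of tic-tac-toe: the whole game tree is small
# enough to enumerate, and optimal play by both sides is a draw (value 0).
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def value(board, player):
    w = winner(board)
    if w == "X":
        return 1
    if w == "O":
        return -1
    if "." not in board:
        return 0
    children = [value(board[:i] + player + board[i + 1:], "O" if player == "X" else "X")
                for i, cell in enumerate(board) if cell == "."]
    return max(children) if player == "X" else min(children)

print(value("." * 9, "X"))  # 0: perfect play by both sides ends in a draw
```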
You can have a reinforcement learning algorithm that is perfectly well-understood mathematically, and yet nowhere inside the state of the algorithm is a description of the algorithm itself or the mathematics behind it.
Agreed. But the fact that the main component of human reasoning is something which we have no idea how to formalise is some evidence against the possibility of formalisation—evidence which might be underweighted if people think of maths proofs as a representative example of reasoning.
We can still ask the question “given the ability to optimize any utility function over the world now, what utility function should we choose?” Perhaps it means that we need to consider our answers to ethical questions under a randomly generated framing. Or maybe it means something else. But in any case, it is a question that can and should be answered.
I’m going to cop out of answering this as well, on the grounds that I have yet another post in the works which deals with it more directly. One relevant claim, though: that extreme optimisation is fundamentally alien to the human psyche, and I’m not sure there’s any possible utility function which we’d actually be satisfied with maximising.
It seems like we might actually agree on this point: an abstract theory of evolution is not very useful for either building organisms or analysing how they work, and so too may an abstract theory of intelligence not be very useful for building intelligent agents or analysing how they work. But what we want is to build better birds! The abstract theory of evolution can tell us things like “species will evolve faster when there are predators in their environment” and “species which use sexual reproduction will be able to adapt faster to novel environments”. The analogous abstract theory of intelligence can tell us things like “agents will be less able to achieve their goals when they are opposed by other agents” and “agents with more compute will perform better in novel environments”. These sorts of conclusions are not very useful for safety.
As a matter of fact, I emphatically do not agree. “Birds” are a confusing example, because they involve modifying an existing (messy, complicated, poorly designed) system rather than making something from scratch. If we wanted to make something vaguely bird-like from scratch, we would need something like a “theory of self-sustaining, self-replicating machines”.
Let’s consider a clearer example: cars. In order to build a car, it is very useful to have a theory of mechanics, chemistry, thermodynamics, etc. Just doing things by trial and error would be much less effective, especially if you don’t want the car to occasionally explode (given that the frequency of explosions might be too low to affordably detect during testing). This is not because a car is “simple”: a spaceship or, let’s say, a gravitational-wave detector is much more complex than a car, and yet you hardly need less theory to make one.
And another example: cryptography. In fact, cryptography is not so far from AI safety: in the former case you defend against an external adversary, whereas in the latter you defend against perverse incentives and subagents inside the AI. If we were having this conversation in the 1960s (say), you might have said that cryptography is obviously a complex, messy domain, and that theorizing about it is next to useless, or at least not helpful for designing actual encryption systems (there was Shannon’s work, but since it ignored computational complexity you can maybe compare it to algorithmic information theory and statistical learning theory for AI today; if we were having this conversation in the 1930s, there would have been next to no theory at all, even though encryption had been practiced since ancient times). And yet, today theory plays an essential role in this field. The domain actually is very hard: most of the theory relies on complexity-theoretic conjectures that we are still far from being able to prove (although I expect that most theoretical computer scientists would agree that eventually we will solve them). However, even without being able to formally prove everything, the ability to reduce the safety of many different protocols to a limited number of interconnected conjectures (some of which have an abundance of both theoretical and empirical evidence) allows us to immensely increase our confidence in those protocols.
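To give a flavour of what “reducing the safety of a protocol to a conjecture” looks like (an illustrative sketch only, not production cryptography; using SHA-256 in counter mode as a stand-in keystream generator is itself an assumption made for the example): the stream cipher below is secure exactly insofar as its keystream generator is pseudorandom, so any attack on the protocol would yield a distinguisher against that single, well-studied primitive.

```python
# Illustrative sketch only; do not use for real cryptography.
# A stream cipher whose security reduces to a single conjecture: that the
# keystream generator (here, SHA-256 in counter mode as a stand-in) is
# pseudorandom. Any attack on the cipher would distinguish the keystream
# from random, so confidence in the protocol rests on that one assumption.
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    return bytes(p ^ k for p, k in zip(plaintext, keystream(key, len(plaintext))))

decrypt = encrypt  # XORing with the same keystream inverts the encryption

ciphertext = encrypt(b"an illustrative secret key", b"attack at dawn")
assert decrypt(b"an illustrative secret key", ciphertext) == b"attack at dawn"
```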
Similarly, I expect an abstract theory of intelligence to be immensely useful for AI safety. Even just having precise language to define what “AI safety” means would be very helpful, especially to avoid counter-intuitive failure modes like the malign prior. At the very least, we could have provably safe but impractical machine learning protocols that would be an inspiration to more complex algorithms about which we cannot prove things directly (like in deep learning today). More optimistically (but still realistically IMO) we could have practical algorithms satisfying theoretical guarantees modulo a small number of well-studied conjectures, like in cryptography today. This way, theoretical and empirical research could feed into each other, the whole significantly surpassing the sum of its parts.