This post really helped me make concrete some of the admittedly gut-reaction-type concerns/questions/misunderstandings I had about alignment research, thank you. I have a few thoughts after reading:
(1) I wonder how different some of these epistemic strategies are from everyday normal scientific research in practice. I do experimental neuroscience and I would argue that we also are not even really sure what the “right” questions are (in a local sense, as in, what experiment should I do next), and so we are in a state where we kinda fumble around using whatever inspiration we can. The inspiration can take many forms: philosophical, theoretical, empirical, a very simple model, thought experiments of various kinds, ideas or experimental results with an aesthetic quality. It is true that at the end of the day brains already exist, so we have that to probe, but I’d argue that we don’t have a great handle on what exactly is the important thing to look at in brains, nor in what experimental contexts we should be looking at them, so it’s not immediately obvious what types of models, experiments, or observations we should be doing. What ends up happening is, I think, a lot of the types of arguments you mention. For instance, trying to make a story using the types of tasks we can run in the lab but applying it to more complicated real-world scenarios (or vice versa), and these arguments often take a less-than-totally-formal form. There is an analogous conversation occurring within neuroscience that takes the form of “does any of this work even say anything about how the brain works?!”
(2) You used theoretical computer science as your main example but it sounds to me like the epistemic strategies one might want in alignment research are more generally found in pure mathematics. I am not a mathematician but I know a few, and I’m always really intrigued by the difference in how they go about problem solving compared to us scientists.
This post really helped me make concrete some of the admittedly gut-reaction-type concerns/questions/misunderstandings I had about alignment research, thank you.
Glad it helped! That was definitely one goal, and the hardest to check with early feedback, because I mostly know people who either already work in the field or have never been confronted with it, while you’re somewhere in the middle. :)
I wonder how different some of these epistemic strategies are from everyday normal scientific research in practice.
Completely! One thing I tried to make clear in this draft (maybe not successfully, given your comment ^^) is that many fields, including the natural sciences, leverage far more epistemic strategies than the Popperian “make a model, predict something, and test it in the real world”. My points are more that:
Alignment is particularly weird in terms of epistemic strategies because neither the problem nor the technology exists yet.
Given the potential urgency of alignment, it’s even more important to clarify these epistemic subtleties.
But I’m convinced that almost all disciplines could be the subject of a deep study of the methods people use to wrest knowledge from the world. That’s part of my hope, since I want to steal epistemic strategies from many different fields and see how they apply to alignment.
It is true that at the end of the day brains already exist, so we have that to probe, but I’d argue that we don’t have a great handle on what exactly is the important thing to look at in brains, nor in what experimental contexts we should be looking at them, so it’s not immediately obvious what types of models, experiments, or observations we should be doing. What ends up happening is, I think, a lot of the types of arguments you mention. For instance, trying to make a story using the types of tasks we can run in the lab but applying it to more complicated real-world scenarios (or vice versa), and these arguments often take a less-than-totally-formal form. There is an analogous conversation occurring within neuroscience that takes the form of “does any of this work even say anything about how the brain works?!”
Fascinating! Yeah, I agree with you that the analogy definitely exists, particularly with fundamental science. And that’s part of the difficulty in alignment. Maybe the best comparison would be trying to cure a neurological pathology without having access to a human brain, only to the brains of species from much further back in our evolutionary lineage. It’s harder, but linking the experimental results to the concrete object is still part of the problem.
(Would be very interested in having a call about the different epistemic strategies you use in experimental neuroscience by the way)
(2) You used theoretical computer science as your main example but it sounds to me like the epistemic strategies one might want in alignment research are more generally found in pure mathematics. I am not a mathematician but I know a few, and I’m always really intrigued by the difference in how they go about problem solving compared to us scientists.
So I disagree, but you’re touching on a fascinating topic, one that confused me for the longest time.
My claim is that pure maths is fundamentally the study of abstraction (Platonists would disagree, but that’s more of a minority position nowadays). “Patterns” is also a word commonly used when mathematicians wax poetic. What this means is that pure maths studies ways of looking at the world, of representing it. But, and that’s crucial for our discussion, pure maths doesn’t care about how to apply these representations to the world itself. When pure mathematicians study concrete systems, it’s often to get exciting abstractions out of them, but they don’t check those abstractions against the world again; they study the abstractions in and of themselves.
The closest pure maths gets to caring about how to apply abstractions is when using one type of abstraction to understand another. The history of modern mathematics is full of dualities, representation theorems, correspondences and tricks for seeing one abstraction through the prism of another. By the way, that’s one of the most useful aspects of maths in my opinion: if you manage to formalize your intuitions/ideas as a well-studied mathematical abstraction, then you get a lot of powerful tools and ways of looking at them for free. But maths itself doesn’t tell you how to do the formalization.
On the other hand, theoretical computer scientists are in the weird position of working almost entirely in an abstract world reminiscent of pure maths, while their focus is on an aspect of the real world: computation, its possibilities and its limits. TCS doesn’t care about a cool abstraction if it doesn’t unlock some nice insight about computation.
The post already has some nice examples, but here is another one I like: defining the class of tractable problems. You want to delineate the problems that can be solved in practice, in the sense that the time taken to solve them grows slowly enough with the size of the input that solving for massive inputs is a matter of days, maybe weeks or months of computation, but not thousands of years. Yet there is a fundamental tension between capturing exactly the problems for which we have fast algorithms and defining a mathematically elegant, well-behaved class.
So complexity theorists reached for a compromise with P, the class of problems solvable in polynomial time. Polynomials are nice because a linear combination of them is still a polynomial, which means among other things that if solving problem A amounts to solving problem B 3 times and then problem C 2 times, and both B and C are in P, then A is also in P. Yet polynomials can grow pretty damn fast. If n is the size of the input, a problem that can be solved in time growing like n^(10^100) is technically in P, but is completely intractable. The argument for why this is not an issue is that in practice, all the problems we can prove are in P have an algorithm with complexity growing at most like n^3, which is roughly the limit of tractability.
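To put rough numbers on that claim, here’s a minimal back-of-the-envelope sketch (my own illustration, not from the post), assuming a machine that performs 10^9 elementary operations per second; the input size n = 10^5 and the exponents are arbitrary choices for the demonstration:

```python
# Rough wall-clock cost of an algorithm doing n**k elementary operations,
# assuming (hypothetically) a machine doing 10^9 operations per second.
OPS_PER_SECOND = 10**9
SECONDS_PER_DAY = 3600 * 24
SECONDS_PER_YEAR = SECONDS_PER_DAY * 365

def runtime(n: int, k: int) -> str:
    """Human-readable estimate of the time to perform n**k operations."""
    seconds = n**k / OPS_PER_SECOND
    if seconds < SECONDS_PER_DAY:
        return f"{seconds:,.0f} seconds"
    if seconds < SECONDS_PER_YEAR:
        return f"{seconds / SECONDS_PER_DAY:,.1f} days"
    return f"{seconds / SECONDS_PER_YEAR:.2e} years"

n = 10**5  # a large input
for k in (2, 3, 4, 6):
    print(f"n^{k}: {runtime(n, k)}")

# Output:
# n^2: 10 seconds
# n^3: 11.6 days      <- painful but feasible: roughly the "limit of tractable"
# n^4: 3.17e+03 years <- already hopeless
# n^6: 3.17e+13 years <- far longer than the age of the universe
# (And n^(10^100) is technically polynomial, hence in P, yet unimaginably
# worse than all of these -- which is why P is a compromise, not a perfect
# match for "tractable".)
```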
So much of the reasoning I presented in the previous paragraphs is reminiscent of the troubles and subtleties of alignment, yet so different from pure maths, that I really believe TCS is more relevant to the epistemic strategies of alignment than pure maths is. Which doesn’t mean maths won’t prove crucial when instantiating these strategies, for example by proving theorems we need.