Good idea to write down what you think! As someone who is moving toward AI Safety, and who has spent all this year reading, studying, working and talking with people, I disagree with some of what you write.
First, I feel that your classification comes from your decision that deconfusion is most important. I mean, your second category literally only contains things you describe as “hacky” and “not theory-based” without much differentiation, and it’s clear that you believe theory and elegance (for lack of a better word) to be important. I also don’t think that the third cluster makes much sense, as you point out that most of it lies in the second one too. Even deconfusion aims at solving alignment, just in a different way.
A dimension I find more useful is how much one needs to understand what's happening inside the system. This scale goes from MIRI's embedded agency approach (on the "I need to understand everything about how the system works, at a mathematical level, to make it aligned" end) to prosaic AGI on the other side (on the "I can consider the system as a black box that behaves according to some incentives, and build an architecture using it that ensures alignment" end). I like this dimension because I feel a lot of my gut feeling about AI Safety research comes from my own perspective on the value of "understanding what happens inside", and how mathematical this understanding must be.
Here’s an interesting discussion of this distinction.
How do I classify the rest? Something like this:
On the embedded agency end of the spectrum, things like Vanessa's research agenda and Stuart Armstrong's research agenda. Probably anything that fits in agent foundations too.
In the middle, I think of DeepMind's research about incentives, Evan Hubinger's research about inner alignment and myopia, and probably all the cool things about interpretability, like the Clarity team's work at OpenAI (see this post for an AI Safety perspective).
On the prosaic AGI end of the spectrum, I would put IDA and AI Safety via debate, Drexler's CAIS, and probably most of CHAI's published research (although I am less sure about that, and would be happy to be corrected).
Now, I want to say explicitly that this dimension is not a value scale. I'm not saying that either end is more valuable in and of itself. I'm simply pointing at what I think is an underlying parameter in why people work on what they work on. Personally, I'm more excited about the middle and the embedded agency end, but I still see value in, and am curious about, the other end of the spectrum.
It’s easier to learn all the prerequisites for type-2 research and to actually do it.
I wholeheartedly disagree. I think you point out a very important aspect of AI Safety: the prerequisites are all over the place, sometimes completely different for different approaches. That being said, which prerequisites are easier to learn is more about personal fit and background than intrinsic difficulty. Is it inherently harder to learn model theory than statistical learning theory? Decision theory than neural networks? What about psychology or philosophy? What you write feels like a judgement stemming from "math is more demanding". But try to understand all the interpretability work, and you'll realize that even if it lacks a lot of deep mathematical theorems, it still requires a tremendous amount of work to grok.
(Also, as another argument against your groupings, even the mathematical snobs would not put Vanessa's work in the "less prerequisite" section. I mean, she uses measure theory, Bayesian statistics, online learning and category theory, among others!)
So my position is that thinking in terms of your beliefs about "how much one needs to understand the insides to make something work" will help you choose an approach to try your hand at. It's also pretty cheap to just talk to people and try a bunch of different approaches to see which ones feel right.
A word on choosing the best approach: I feel like AI Safety, as it currently stands, doesn't help much with that. Because of the different prerequisites, deeply understanding any approach requires a time investment big enough to trigger a sunk-cost fallacy when evaluating that approach's value. Also, I think it's very common to judge an approach from the perspective of one's current approach, which might color the judgement in incorrect ways. The best strategy I know of, and the one I try to apply, is to try something and regularly update on its value compared to the rest.
(Also, as another argument against your groupings, even the mathematical snobs would not put Vanessa's work in the "less prerequisite" section. I mean, she uses measure theory, Bayesian statistics, online learning and category theory, among others!)
To be fair, he did put Vanessa in all three categories… :P