I like this pushback, and I’m a fan of productive mistakes. I’ll have a think about how to rephrase to make that clearer. Maybe there’s just a communication problem, where it’s hard to tell the difference between people claiming “I have an insight (or proto-insight) which will plausibly be big enough to solve the alignment problem”, versus “I have very little traction on the alignment problem but this direction is the best thing I’ve got”. If the only effect of my post is to make a bunch of people say “oh yeah, I meant the second thing all along”, then I’d be pretty happy with that.
Why do I care about this? It has uncomfortable tinges of status regulation, but I think it’s important because there are so many people reading about this research online, and trying to find a way into the field, and often putting the people already in the field on some kind of intellectual pedestal. Stating clearly the key insights of a given approach, and their epistemic status, will save them a whole bunch of time. E.g. it took me ages to work through my thoughts on myopia in response to Evan’s posts on it, whereas if I’d known it hinged on some version of the insight I mentioned in this post, I would have immediately known why I disagreed with it.
As an example of (I claim) doing this right, see the disclaimer on my “shaping safer goals” sequence: “Note that all of the techniques I propose here are speculative brainstorming; I’m not confident in any of them as research directions, although I’d be excited to see further exploration along these lines.” Although maybe I should make this even more prominent.
Lastly, I don’t think I’m actually comparing Darwin and Einstein’s mature theories to Turing’s incomplete theory. As I understand it, their big insights required months or years of further work before developing into mature theories (in Darwin’s case, literally decades).
I like this pushback, and I’m a fan of productive mistakes. I’ll have a think about how to rephrase to make that clearer. Maybe there’s just a communication problem, where it’s hard to tell the difference between people claiming “I have an insight (or proto-insight) which will plausibly be big enough to solve the alignment problem”, versus “I have very little traction on the alignment problem but this direction is the best thing I’ve got”. If the only effect of my post is to make a bunch of people say “oh yeah, I meant the second thing all along”, then I’d be pretty happy with that.
When phrased like that, I agree with you. I am personally relatively suspicious of claims by a bunch of people to have found a path to alignment, but actually excited by some of their productive mistakes (as discussed a bit in my post).
I also fully agree that I want people to use the second framing, and my “history of alignment” research direction aims at concretely teasing out the productive mistakes and revealed bits of evidence without falling for either “this is obviously a solution” or “this is obviously not a solution and thus useless”.
Why do I care about this? It has uncomfortable tinges of status regulation, but I think it’s important because there are so many people reading about this research online, and trying to find a way into the field, and often putting the people already in the field on some kind of intellectual pedestal. Stating clearly the key insights of a given approach, and their epistemic status, will save them a whole bunch of time. E.g. it took me ages to work through my thoughts on myopia in response to Evan’s posts on it, whereas if I’d known it hinged on some version of the insight I mentioned in this post, I would have immediately known why I disagreed with it.
+1000. And more generally, teasing out the assumptions, the insights, and the new parts of works and approaches is, I think, super necessary, and it's on my research agenda. That’s also part of the reason why I feel asking newcomers to be distillers is not necessarily a great idea: good distillation of the type we’re discussing requires IMO quite a deep understanding of the landscape, the problem and the underlying ideas. Otherwise you at best get a decent summary, and we need more.
As an example of (I claim) doing this right, see the disclaimer on my “shaping safer goals” sequence: “Note that all of the techniques I propose here are speculative brainstorming; I’m not confident in any of them as research directions, although I’d be excited to see further exploration along these lines.” Although maybe I should make this even more prominent.
Haven’t reread your sequence in quite some time, but I think the value of such exploratory sequences is to make clearer the intuitions underlying the direction, even if they haven’t yet led to productive mistakes. So I like your disclaimer, but I think an even better way of doing this is to clarify, for different posts and ideas, which intuitions you’re building on and where the current formalisms/descriptions/analogies fail to capture them.
Lastly, I don’t think I’m actually comparing Darwin and Einstein’s mature theories to Turing’s incomplete theory. As I understand it, their big insights required months or years of further work before developing into mature theories (in Darwin’s case, literally decades).
This might also be a bit of miscommunication, but I felt like your discussion of Turing could also apply to the others, especially in Darwin’s case, where the initial insight required a lot of additional pieces and clarification to become a clean and ordered theory that you can actually defend. Generally I was pointing at the risk of hindsight bias: the fact that the insight is clean and powerful once the full theory is known doesn’t mean it was as compelling at the time it was first thought of. (Which is also a general empirical claim about the history of scientific progress, to explore ;) )