Part 2 - rant on LW culture about how to do research
Yesterday I wrote about my object-level updates from starting an alignment program. Here I want to make a meta-level point about LW culture around how to do research.
Note: This is about “how things have affected me”, not “what other people have aimed to communicate”. I’m not aiming to pass other people’s ITTs or present the strongest versions of their arguments. I am rant-y at times. I think that’s OK and it is still worth it to put this out.
There’s this cluster of thoughts in LW that includes stuff like:
“I figured this stuff out using the null string as input”—Yudkowsky’s List of Lethalities
“The safety community currently is mostly bouncing off the hard problems and are spending most of their time working on safe, easy, predictable things that guarantee they’ll be able to publish a paper at the end.”—Zvi modeling Yudkowsky
There are worlds where iterative design fails
“Focus on the Hard Part First”
“Alignment is different from usual science in that iterative empirical work doesn’t suffice”—a thought that I find in my head.
I’m having trouble putting it into words, but there’s just something about these memes that’s… just anti-helpful for doing research? It’s really easy to interpret the comments above in ways that I think are bad. (Proof: I have interpreted them in such ways.)
It’s this cluster that kind of suggests, or is at least easily interpreted as saying, “you should sit down and think about how to align a superintelligence”, as opposed to doing “normal research”.
And for me personally this has resulted in doing nothing, or doing something only tangentially related to preventing AI doom. I’m actually not capable of just sitting down and deriving a method for aligning a superintelligence from the null string.
(...to which one could respond with “reality doesn’t grade on a curve”, or that one is “frankly not hopeful about getting real alignment work” out of me, or other such memes.)
Leaving aside whether these things are kind or good for mental health, I just think these memes are a bad way of thinking about how research works and how to make progress.
I’m pretty fond of the phrase “standing on the shoulders of giants”. Really, people extremely rarely figure stuff out from the ground up or from the null string. The giants are pretty damn large. You should climb on top of them. In the real world, if there’s a guide for a skill you want to learn, you read it. I could write a longer rant about the null string thing, but let me leave it here.
About “the safety community currently is mostly bouncing off the hard problems and [...] publish a paper”: I’m not sure who “community” refers to. Sure, Ethical and Responsible AI doesn’t address AI killing everyone, and sure, publish or perish and all that. But this is a different claim from “people should sit down and think about how to align a superintelligence”. That’s the hard problem, and you’re supposed to focus on that first, right?
Taking these together, what you get is something that’s the opposite of what research usually looks like. The null string stuff pushes away from scholarship. The “...that guarantee they’ll be able to publish a paper...” stuff pushes away from having well-scoped projects. The talk about iterative designs failing can be interpreted as pushing away from empirical sources of information. Focusing on the hard part first pushes away from learning from relaxations of the problem.
And I don’t think the “well alignment is different from science, iterative design and empirical feedback loops don’t suffice, so of course the process is different” argument is gonna cut it.
What made me make this update, and notice the ways in which LW culture is anti-helpful, was seeing how people do alignment research in real life. They actually rely a lot on prior work, improve on it, use empirical sources of information, and do stuff that puts us in a marginally better position. Contrary to the memes above, I think this approach is actually quite good.
agreed on all points. and, I think there are kernels of truth in the things you’re disagreeing-with-the-implications-of, and those kernels of truth need to be ported over to the perspective you’re saying they’re easily misinterpreted as opposing. something like: how can we test the hard part first?
compare also physics—getting lost doing theory when you can’t get data does not have a good track record in physics despite how critically important theory has been in modeling data. but you also have to collect data that weighs on relevant theories so hypotheses can be eliminated and promising theories can be refined. machine learning typically is “make number go up” rather than “model-based” science, in this regard, and I think we do need to be doing model-based science to get enough of the right experiments.
on the object level, I’m excited about ways to test models of agency using things like particle lenia and neural cellular automata. I might even share some hacky work on that at some point if I figure out what it is I even want to test.
Yeah, I definitely grant that there are insights in the things I’m criticizing here. E.g. I was careful to phrase this sentence in this particular way:
“The talk about iterative designs failing can be interpreted as pushing away from empirical sources of information.”
Because yep, I sure agree with many points in “Worlds Where Iterative Design Fails”. I’m not trying to imply the post’s point was “empirical sources of information are bad” or anything.
(My tone in this post is “here are bad interpretations I’ve made, watch out for those” instead of “let me refute these misinterpreted versions of other people’s arguments and claim I’m right”.)
On being able to predictably publish papers as a malign goal: one point is that the standards of publishability in existing research communities don’t match what’s useful to publish for this particular problem (this used to be the case even more strongly a few years ago). Aiming to publish, for example, on LessWrong fixes the issue in that case, though you mostly won’t get research grants for that. (The other point is that some things shouldn’t be published at all.)
In either case, I don’t see discouragement from building on existing work: it’s not building arguments out of nothing when you also read all the things as you come up with your arguments. Experimental grounding is crucial but not always possible, in which case giving up on the problem and doing something else doesn’t help with solving this particular problem, other than as part of the rising tide of basic research that can’t be aimed.