I’m not convinced that the “hard parts” of alignment are difficult in the standardly difficult, g-requiring way that e.g., a physics post-doc might possess. I do think it takes an unusual skillset, though, which is where most of the trouble lives. I.e., I think the pre-paradigmatic skillset requires unusually strong epistemics (because you often need to track for yourself what makes sense), ~creativity (the ability to synthesize new concepts, to generate genuinely novel hypotheses/ideas), good ability to traverse levels of abstraction (connecting details to larger-scale structure, which is especially important for the alignment problem), not being efficient-market-pilled (you have to believe that more is possible in order to aim for it), noticing confusion, and probably a lot more that I’m failing to name here.
Most importantly, though, I think it requires quite a lot of willingness to remain confused. Many scientists who accomplished great things (Darwin, Einstein) didn’t have publishable results on their main inquiry for years. Einstein, for instance, talks about wandering off for weeks in a state of “psychic tension” in his youth; it took ~ten years to go from his first inkling of relativity to special relativity, and he nearly gave up at many points (including the week before he figured it out). Figuring out knowledge at the edge of human understanding can just be… really fucking brutal. I feel like this is largely forgotten, or ignored, or just not understood. Partially that’s because in retrospect everything looks obvious, so it doesn’t seem like it could have been that hard, but partially it’s because almost no one tries to do this sort of work, so there aren’t societal structures erected around it, and hence little collective understanding of what it’s like.
Anyway, I suspect there are really strong selection pressures for who ends up doing this sort of thing, since a lot needs to go right: smart enough, creative enough, strong epistemics, independent, willing to spend years without legible output, exceptionally driven, and so on. Indeed, the last point seems important to me—many great scientists are obsessed. Spend night and day on it, it’s in their shower thoughts, can’t put it down kind of obsessed. And I suspect this sort of has to be true because something has to motivate them to go against every conceivable pressure (social, financial, psychological) and pursue the strange meaning anyway.
I don’t think the EA pipeline is doing much to select for pre-paradigmatic scientists, but I don’t think lack of trying to get physicists to work on alignment is really the bottleneck either. Mostly I think selection effects are very strong, e.g., the Sequences were, imo, one of the more effective recruiting strategies for alignment. I don’t really know what to recommend here, but I think I would anti-recommend putting all the physics post-docs from good universities in a room in the hope that they make progress. Requesting that the world write another book as good as the Sequences is a… big ask, although to the extent it’s possible I expect it’ll go much further in drawing out people who will self-select into this rather unusual “job.”
This is the sort of thing I find appealing to believe, but I feel at least somewhat skeptical of. I notice a strong emotional pull to want this to be true (as well as an interesting counterbalancing emotional pull for it to not be true).
I don’t think I’ve seen output from people aspiring in this direction who aren’t also visibly quite smart that would make me think “okay yeah it seems like it’s on track in some sense.”
I’d be interested in hearing more explicit cruxes from you about it.
I do think it’s plausible that the “smart enough, creative enough, strong epistemics, independent, willing to spend years without legible output, exceptionally driven, and so on” qualities are sufficient (if you’re at least moderately-but-not-exceptionally-smart). Those are rare enough qualities that it doesn’t necessarily feel like I’m getting a free lunch, if they turn out to be sufficient for groundbreaking pre-paradigmatic research. I agree the x-risk pipeline hasn’t tried very hard to filter for and/or generate people with these qualities.
(well, okay, “smart enough” is doing a lot of work there, I assume from context you mean “pretty smart but not like genius smart”)
But, I’ve only really seen you note positive examples, and this seems like the sort of thing that’d have a lot of survivorship bias. There can be tons of people obsessed, but not necessarily with the right things, and if you’re not naturally in the right cluster of obsessed + smart-in-the-right-way, I don’t know whether trying to cultivate the obsession on purpose will really work.
I do nonetheless overall probably prefer people who have all your listed qualities, and who also either can:
a) self-fund to pursue the research without having to make it legible to others
b) somehow figure out a way to make it legible along the way
I probably prefer those people to tackle “the hard parts of alignment” over many other things they could be doing, but not overwhelmingly obviously (and I think it should come with a background awareness that they are making a gamble, and if they aren’t the sort of person who must make that gamble due to their personality makeup, they should be prepared for the (mainline) outcome that it just doesn’t work out)
I think this is right. A couple of follow-on points:
There’s a funding problem if this is an important route to progress. If good work is illegible for years, it’s hard to decide who to fund, and hard to argue for people to fund it. I don’t have a proposed solution, but I wanted to note this large problem.
Einstein did his pre-paradigmatic work largely alone. Better collaboration might’ve sped it up.
LessWrong allows people to share their thoughts prior to having publishable journal articles and get at least a few people to engage.
This makes the difficult pre-paradigmatic thinking a group effort instead of a solo effort. This could speed up progress dramatically.
This post and the resulting comments and discussions are an example of the community collectively doing much of the work you describe: traversing levels of abstraction, practicing good epistemics, and remaining confused.
Having conversations with other LWers (on calls, by DM, or in extended comment threads) is tremendously useful for me. I could produce those same thoughts and critiques, but it would take me longer to arrive at all of those different viewpoints of the issue. I mention this to encourage others to do it. Communication takes time and some additional effort (asking people to talk), but it’s often well worth it. Talking to people who are interested in and knowledgeable on the same topics can be an enormous speedup in doing difficult pre-paradigmatic thinking.
LessWrong isn’t perfect, but it’s a vast improvement on the collaboration tools and communities that have been available to scientists in other fields. We should take advantage of it.
I think this is false. As I remember hearing the story, he was corresponding with several people via letters.
I know very little, but there’s a fun fact here: “During their lifetimes, Darwin sent at least 7,591 letters and received 6,530; Einstein sent more than 14,500 and received more than 16,200.” (Not sure what fraction was technical vs personal.)
Also, this is a brief summary of Einstein’s mathematician friend Marcel Grossmann’s role in general relativity.
In the piece you linked, it sounds like Einstein had the correct geometry for general relativity one day after he asked for help finding one. Of course, that’s one notable success amongst perhaps a lot of collaboration. The number of letters he sent and received implies that he actually did a lot of written collaboration.
I wonder about the value of real-time conversation vs. written exchanges. And the value of being fully engaged and truly curious about your interlocutor’s ideas.
My own experience watching progress happen (and not-happen) in theoretical neuroscience is that fully engaged conversations with other true experts with different viewpoints were rare and often critical for real progress.
My perception is that those conversations are tricky to produce. Experts are often splitting their attention between impressing people and cool-headed, open-minded discussion. And they weren’t really seeking out these conversations, just having them when it was convenient, and only being really fully engaged when the interpersonal vibe happened to be right. Even so, the bit of real conversation I saw seemed quite important.
It would be helpful to understand collaboration on difficult theory better, but it would be a whole research topic.
By “largely alone” I meant without the rich collaboration of having an office on the same campus, or phone calls, or LessWrong.
I think the qualitative difference is not as large as you think it is. But I also don’t think this is very crux-y for anything, so I will not try to figure out how to translate my reasoning to words, sorry.
I’m not convinced that the “hard parts” of alignment are difficult in the standardly difficult, g-requiring way that e.g., a physics post-doc might possess.
To be clear, I wasn’t talking about physics postdocs mainly because of raw g. Raw g is a necessary element, and physics postdocs are pretty heavily loaded on it, but I was talking about physics postdocs mostly because of the large volume of applied math tools they have.
The usual way that someone sees footholds on the hard parts of alignment is to have a broad enough technical background that they can see some analogy to something they know about, and try borrowing tools that work on that other thing. Thus the importance of a large volume of technical knowledge.
Curious about what it would look like to pick up the relevant skills, especially the subtle/vague/tacit skills, in an independent-study setting rather than in academia. As well as the value of doing this, i.e., maybe it’s just a stupid idea and it’s better to just go do a PhD. Is the purpose of a PhD to learn the relevant skills, or to filter for them? (If you have already written stuff which suffices as a response, I’d be happy to be pointed to the relevant bits rather than having them restated.)
“Broad technical knowledge” should be in some sense the “easiest” to acquire (not in terms of time investment, but in terms of predictable outcomes), by reading lots of textbooks (using material similar to your study guide).
Writing/communication, while more vague, should also be learnable by just writing a lot of things, publishing them on the internet for feedback, reflecting on your process, etc.
Something like “solving novel problems” seems like a much “harder” one. I don’t know if this is a skill with a simple “core” or a grab-bag of tactics. Textbook problems take on a “meant-to-be-solved” flavor, and I find one can be very good at solving these without being good at tackling novel problems. Another thing I notice is that when some people (myself included) try solving novel problems, we can end up on a path which gets there eventually, but with “correct” feedback the integration would go OOM faster.
I’m sure there are other vague-skills which one ends up picking up from a physics PhD. Can you name others, and how one picks them up intentionally? Am I asking the wrong question?
I currently think broad technical knowledge is the main requisite, and I think self-study can suffice for the large majority of that in principle. The main failure mode I see would-be autodidacts run into is motivation, but if you can stay motivated then there’s plenty of study materials.
For practicing solving novel problems, just picking some interesting problems (preferably not AI) and working on them for a while is a fine approach.
Why not AI? Is it that AI alignment is too hard? Or do you think it’s likely one would fall into the “try a bunch of random stuff” paradigm popular in AI, which wouldn’t help much in getting better at solving hard problems?
What do you think about the strategy of, instead of learning from a textbook (e.g., on information theory or compilers), trying to write the textbook yourself, and only looking at existing material if you are really stuck? That’s my primary learning strategy.
It’s very slow and I probably do it too much, but it lets me train on solving problems that are hard but not super hard. If you read all the textbooks, all the remaining practice problems are very hard.
(That broad technical knowledge, as opposed to tacit skills, is the main reason you value a physics PhD is a really surprising response to me, and seems like an important part of the model that didn’t come across in the post.)
Agreed. Simply focusing on physics post-docs feels too narrow to me.
Then again, just as John has a particular idea of what good alignment research looks like, I have my own idea: I would lean towards recruiting folk with both a technical and a philosophical background. It’s possible that my own idea is just as narrow.
The post did explicitly say “Obviously that doesn’t mean we exclusively want physics postdocs”.
Thanks for clarifying. Still feels narrow as a primary focus.