Part 4⁄4 - Concluding comments on how to contribute to alignment
In part 1 I talked about object-level belief changes, in part 2 about how to do research and in part 3 about what alignment research looks like.
Let me conclude by saying things that would have been useful for past-me about “how to contribute to alignment”. As in past posts, my mode here is “personal musings I felt like writing that might accidentally be useful to others”.
So, for me-1-month-ago, the bottleneck was “uh, I don’t really know what to work on”. Let’s talk about that.
First of all, experienced alignment researchers tend to have plenty of ideas. (Come on, me-1-month-ago, don’t be surprised.) Did you know that there’s this forum where alignment people write out their thoughts?
“But there’s so much material there”, me-1-month-ago responds.
What kind of excuse is that?
Okay, so how research programs work is that you have some mentor and you try to learn stuff from them. You can do a version of this alone as well: just take some researcher you think has good takes and go read their texts.
No, I mean actually read them. I don’t mean “skim through the posts”, I mean going above and beyond here: printing the text on paper, going through it line by line, flagging new considerations you haven’t thought of before. Try to actually understand what the author thinks, to understand the worldview that has generated those posts, not just going “that claim is true, that one is false, that’s true, OK done”.
And I don’t mean reading just two or three posts by the author. I mean like a dozen or more. Spending hours on reading posts, really taking the time there. This is what turns “characters on a screen” into “actually learning something”.
A major part of my first week in my program involved reading posts by Evan Hubinger. I learned a lot. Which is silly: I didn’t need to fly to the Bay to access https://www.alignmentforum.org/users/evhub. But, well, I have a printer and some “let’s actually do something ok?” attitude here.
Okay, so I still haven’t given a list of Concrete Projects To Work On. The main reason is that going through the process above kind of results in one. You will likely see something promising, something fruitful, something worthwhile. Posts often have “future work” sections. If you really want explicit lists of projects, then you can unsurprisingly find those as well (example). (And while I can’t speak for others, my guess is that if you really have understood someone’s worldview and you go ask them “is there some project you want me to do?”, they just might answer you.)
Me-from-1-month-ago would have had some flinch reaction of “but are these projects Real? do they actually address the core problems?”, which is why I wrote my previous three posts. Not that they provide a magic wand that waves this question away; rather, they point out that past-me’s standard for what counts as Real Work was unreasonably high.
And yeah, you very well might have thoughts like “why is this post focusing on this instead of...” or “meh, that idea has the issue where...”. You know what to do with those.
Good luck!