I am a physics PhD student. I study field theory. I have a list of projects I’ve thrown myself at with inadequate technical background (to start with) and figured out. I’ve convinced a bunch of people at a research institute that they should keep giving me money to solve physics problems. I’ve been following LessWrong with interest for years. I think that AI is going to kill us all, and I would prefer to live longer if I can pull it off. So what do I do to see if I have anything to contribute to alignment research? Maybe I’m flattering myself here, but I sound like I might be a person of interest for people who care about the pipeline. I don’t feel like a great candidate because I don’t have any concrete ideas for AI research topics to chase down, but it sure seems like I might start having ideas if I worked on the problem with somebody for a bit. I’m apparently very OK with being an underpaid gofer to someone with grand theoretical ambitions while I learn the material necessary to come up with my own ideas. My only lead to go on is “go look for something interesting in MATS and apply to it,” but that sounds like a great way to end up doing streetlight research because I don’t understand the field. Ideally, I guess, I would have whatever spark makes people dive into technical research in a pretty low-status field, for no money, for long enough to produce research good enough to convince people to pay their rent while they keep doing more; but apparently the field can’t find enough of those people to justify being unwilling to look for other options.
I know what to do to keep doing physics research. My TA assignment effectively means I have a part-time job teaching teenagers how to use Newton’s laws, so I can spend twenty or thirty hours a week coding up quark models. I did well on a bunch of exams to convince an institution that I am capable of the technical work required to do research (and, to be fair, I provide them with 15 hours per week of below-market-rate intellectual labor which they can leverage into tuition that more than pays my salary), so now I have a lot of flexibility to just drift around learning about whatever physics I find interesting while they pay my rent. If someone else is willing to throw $30,000 per year at me to think deeply about AI and get nowhere, instead of thinking deeply about field theory and getting nowhere, I am not aware of them. Obviously the incentives would be perverse if you just went around throwing money at people who might be good at AI research, so I’m not surprised that I’ve only found one potential money spigot for AI research, but I had so many to choose from for physics.
It sounds like you should apply for the PIBBSS Fellowship! (https://pibbss.ai/fellowship/)
Going to MATS is also an opportunity to learn a lot more about the space of AI safety research, e.g. considering the arguments for different research directions and learning about different opportunities to contribute. Even if the “streetlight research” project you do is kind of useless (entirely possible), doing MATS is plausibly a pretty good option.
MATS will push you toward streetlight research much more, unless you have some special ability to keep it from doing that.
Do you mean during the program? Sure, maybe the only MATS offers you can get are for projects you think aren’t useful—I think some MATS projects are pretty useless (e.g. our dear OP’s). But it’s still an opportunity to argue with other people about the problems in the field and see whether anyone has good justifications for their prioritization. And you can stop doing the streetlight stuff afterwards if you want to.
Remember that the top-level commenter here is currently a physicist, so it’s not like the usefulness of their work would be going down by doing a useless MATS project :P
Yes it would! It would eat up motivation and energy and hope that they could have put towards actual research. And it would put them in a social context where they are pressured to orient themselves toward streetlighty research—not just during the program, but also afterward. Unless they have some special ability to have it not do that.
Without MATS: not currently doing anything directly useful (though maybe indirectly useful, e.g. gaining problem-solving skill). Could, if given $30k/year, start doing real AGI alignment thinking from scratch (not from scratch), thereby scratching their “will you think in a way that unlocks understanding of strong minds” lottery ticket that each person gets.
With MATS: gotta apply to the extension, write my LTFF grant. Which org should I apply to? Should I do linear probes? Software engineering? Or evals? Red teaming? CoT? Constitution? Hyperparameter gippity? Honeypot? Scaling supervision? Superalign, better than regular align? Detecting deception?
Obviously I disagree with Tsvi regarding the value of MATS to the proto-alignment researcher; I think being exposed to high-quality mentorship and peer-sourced red-teaming of your research ideas is incredibly valuable for emerging researchers. However, he makes a good point: ideally, scholars shouldn’t feel pushed to write highly competitive LTFF grant applications so soon into their research careers; there should be longer-term unconditional funding opportunities. I would love to unlock this so that a subset of scholars can explore diverse research directions for 1-2 years without 6-month grant timelines looming over them. Currently cooking something in this space.
The first step would probably be to avoid letting the existing field influence you too much. Instead, consider from scratch what the problems of minds and AI are, how they relate to reality and to other problems, and try to grab them with intellectual tools you’re familiar with. Talk to other physicists and try to get into exploratory conversation that does not rely on existing knowledge. If you look at the existing field, look at it like you’re studying aliens anthropologically.
[was a manager at MATS until recently and want to flesh out the thing Buck said a bit more]
It’s common for researchers to switch subfields, and extremely common for MATS scholars to get work doing something different from what they did at MATS. (Kosoy has had scholars go on to ARC, Neel scholars have ended up in scalable oversight, Evan’s scholars have a massive spread in their trajectories; there are many more examples but it’s 3 AM.)
Also, I wouldn’t advise applying only to whatever seems interesting; I’d advise applying to literally everything (unless you know for sure you don’t want to work with Neel, since his application is very time-intensive). The acceptance rate is ~4 percent, so it’s better to maximize your odds (again, for most scholars, the bulk of the value is not in their specific research output over the 10-week period, but in having the experience at all).
Also please see Ryan’s replies to Tsvi on the talent needs report for more notes on the street lighting concern as it pertains to MATS. There’s a pretty big back and forth there (I don’t cleanly agree with one side or the other, but it might be useful to you).
You could consider doing MATS as “I don’t know what to do, so I’ll try my hand at something a decent number of apparent experts consider worthwhile and meanwhile bootstrap a deep understanding of this subfield and a shallow understanding of a dozen other subfields pursued by my peers.” This seems like a common MATS experience and I think this is a good thing.
I am surprised that you find theoretical physics research less tight funding-wise than AI alignment. [Is this because the paths to funding in physics are well-worn, rather than better resourced?]
This whole post was a little discouraging. I hope that the research community can find a way forward.
If you’re mobile (able to be in the UK) and willing to try a different lifestyle, consider going to the EA hotel, aka CEEALAR; they offer free food and accommodation to a bunch of people, including many working on AI safety. Alternatively, taking a quick look at https://www.aisafety.com/funders, the current best options are maybe LTFF, OpenPhil, CLR, or maybe AE Studios?