Your posts are coming at the perfect time. I just gave my notice at my current job, and I have about 3 years of runway ahead of me in which I can do whatever I want. I should definitely at least evaluate AI Safety research. My background is a bachelor’s in AI (that’s a thing in the Netherlands). The little bits of research I did try got good feedback.
Even though I’m in a great position to try this, it still feels like a huge gamble. I’m aware that a lot of AI Safety research is already of questionable quality. So my question is: how can I determine as quickly as possible whether I’m cut out for this?
I’m not just asking to reduce financial risk, but also because I feel like my learning trajectory would be quite different if I already knew that it was going to work out in the long run. I’d be able to study the fundamentals a lot more before trying research.
Man, this is a tough question. Evaluating the quality of research in the field is already a tough problem that everybody disagrees on, and as a result people disagree on what sort of people are well-suited to the work. Evaluating it for yourself without already being an expert in the field is even harder. With that in mind, I’ll give an answer which I think a reasonably-broad chunk of people would agree with, but with the caveat that it is very very incomplete.
I had a chat with Evan Hubinger a few weeks ago where we were speculating on how our evaluations of grant applications would compare. (I generally don’t evaluate grant applications, but Evan does.) We have very different views on what-matters-most in alignment, and agreed that our rankings would probably differ a lot. But we think we’d probably mostly agree on the binary cutoff—i.e. which applications are good enough to get funding at all. That’s because at the moment, money is abundant enough that it makes sense to invest in projects based on views which I think are probably wrong but at least have some plausible model under which they could be valuable. If Evan would assign a project high value, and Evan’s model is itself a model-which-I-think-is-probably-wrong-but-still-plausible, then that’s enough to merit a grant. (It’s a hits-based grantmaking model.) Likewise, I’d expect Evan to view things-I’d-consider-high-value in a similar way.
Assuming that speculation is correct, the main grants which would not be funded are those which (as far as the grant evaluator can tell) don’t have any plausible model under which they’d be valuable. Thus the importance of building your own understanding of the whole high-level problem and answering the Hamming Questions: if you can do that, then you have a model under which your research will be valuable, and all that’s left is to communicate that model and your plan.
Now back to your perspective. You’re already hanging around and commenting on LessWrong, so right out the gate I have a somewhat-higher-than-default prior that you can evaluate the “some model under which the research is valuable” criterion. You’re likely to already have the concepts of Bottom Line and Trying to Try and so forth (even if you haven’t read those exact posts); you probably already have some intuition for the difference between a plan designed to actually-do-the-thing, versus a plan designed to look-like-it’s-doing-the-thing or to look-like-it’s-trying-to-do-the-thing. That doesn’t mean you already have enough of a model of the alignment/agency problems or a promising thread to tackle them, but hopefully you can at least tell if and when you do have those things.
Based on your comment, I’m more motivated to just sit down and (actually) try to solve AI Safety for X weeks, write up my results, and do an application. What is your 95% confidence interval for what X needs to be to reduce the odds of a false negative (i.e. my grant gets rejected but shouldn’t have been) to a single-digit percentage?
I’m thinking of doing maybe 8 weeks. Maybe more if I can fall back on research engineering so that I haven’t wasted my time completely.
My main modification to that plan would be “writing up your process is more important than writing up your results”; I think that makes a false negative much less likely.
8 weeks seems like it’s on the short end to do anything at all, especially considering that there will be some ramp-up time. A lot of that will just be making your background frames/approach more legible. I guess viability depends on exactly what you want to test:
If your goal is to write up your background models and strategy well enough to see whether grantmakers want to fund your work based on them, 8 weeks is probably sufficient
If your goal is to see whether you have any large insights or make any significant progress, that usually happens for me on a timescale of ~3 months
It sounds like you want to do something closer to the latter, so 12-16 weeks is probably more appropriate?
I’m aware that a lot of AI Safety research is already of questionable quality. So my question is: how can I determine as quickly as possible whether I’m cut out for this?
My key comment here is that, to be an independent researcher, you will have to rely day by day on your own judgement about what is high-quality and what is valuable. So: do you think you have such judgement, and could you develop it further?
To find out, I suggest you skim a bunch of alignment research agendas, or research overviews like this one, and then read some abstracts/first pages of the papers mentioned in there, while trying to apply your personal, somewhat intuitive judgement to decide:
which agenda item/approach looks most promising to you as an actual method for improving alignment;
which agenda item/approach you feel you could contribute most to, based on your own skills.
If your personal intuitive judgement tells you nothing about the above questions, if it all looks the same to you, then you are probably not cut out to be an independent alignment researcher.
Hi John, thanks a lot.