Hmm, there might be some mismatch of words here. Like, most of the work so far on the problem has been theoretical. I am confused how you could not be excited about the theoretical work that established the whole problem, the arguments for why it’s hard, and that helped us figure out at least some of the basic parameters of the problem. Given that (I think) you currently think AI Alignment is among the global priorities, you presumably think the work that allowed you to come to believe that (and that allowed others to do the same) was very valuable and important.
My guess is you are somehow thinking of work like Superintelligence, or Eliezer’s original work, or Evan’s work on inner optimization as something different than “theoretical work”?
I was mainly talking about the current margin when I talked about how excited I am about the theoretical vs empirical work I see “going on” right now and how excited I tend to be about currently-active researchers who are doing theory vs empirical research. And I was talking about the future when I said that I expect empirical work to end up with the lion’s share of credit for AI risk reduction.
Eliezer, Bostrom, and co certainly made a big impact in raising the problem to people’s awareness and articulating some of its contours. It’s kind of a matter of semantics whether you want to call that “theoretical research” or “problem advocacy” / “cause prioritization” / “community building” / whatever, and no matter which bucket you put it in I agree it’ll probably end up with an outsized impact for x-risk-reduction, by bringing the problem to attention sooner than it would have otherwise been brought to attention and therefore probably allowing more work to happen on it before TAI is developed.
But just like how founding CEOs tend to end up with ~10% equity once their companies have grown large, I don’t think this historical problem-advocacy-slash-theoretical-research work alone will end up with a very large amount of total credit.
On the main thrust of my point, I’m significantly less excited about MIRI-sphere work that is much less like “articulating a problem and advocating for its importance” and much more like “attempting to solve a problem.” E.g. stuff like logical inductors, embedded agency, etc seem a lot less valuable to me than stuff like the orthogonality thesis and so on.
Unfortunately, empirical work is only slowly progressing towards alignment, and truth be told, we might be at a local optimum for alignment chances, barring options like outright banning AI work or doing something political. And unfortunately, going political would likely get us mind-killed and probably make things even worse.
Also, a process towards a solution always boots up slowly at the start, with productive mistakes like these. You will never get perfect answers, and thinking that you can is the Nirvana fallacy. Exponential growth will help us somewhat, but ultimately AI alignment is probably at a local optimum: the people and companies in the lead on building AGI are sympathetic to alignment, which is far better than we could reasonably have hoped for, and there are few arms-race dynamics around AGI, which is even better.
We often complain about the AGI issue for real reasons, but we should recognize how good our position likely is. It's still bad, but there are far worse points at which we could have ended up.
In a post-AI-PONR (point of no return) world, we're lucky if we can solve the problem of AI alignment well enough to get through a slow takeoff safely. We may all hate it, but empirical work will ultimately be necessary, and we undervalue feedback loops: theory can drift wildly away from reality when it isn't in contact with it.