This is a real shame—there are lots of alignment research directions that could really use productive smart people.
I think you might be trapped in a false dichotomy of “impossible” or “easy”. For example, Anthropic/Redwood Research’s safety directions will succeed or fail in large part based on how much good interpretability/adversarial auditing/RLHF-and-its-limitations/etc. work smart people do. Yudkowsky isn’t the only expert, and if he’s miscalibrated then your actions have extremely high value.
This comment is also falling for a version of the ‘impossible’ vs. ‘easy’ false dichotomy. In particular:
For example, Anthropic/Redwood Research’s safety directions will succeed or fail in large part based on how much good interpretability/adversarial auditing/RLHF-and-its-limitations/etc. work smart people do.
Eliezer has come out loudly and repeatedly in favor of Redwood Research’s work as worth supporting and helping with. Your implied ‘it’s only worth working at Redwood if Eliezer is wrong’ is just false, and suggests a misunderstanding of Eliezer’s view.
Yudkowsky isn’t the only expert, and if he’s miscalibrated then your actions have extremely high value.
The relevant kind of value for decision-making is ‘expected value of this option compared to the expected value of your alternatives’, not ‘guaranteed value’. The relative expected value of alignment research, if you’re relatively good at it, is almost always extremely high. Adding ‘but only if Eliezer is wrong’ is wrong.
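One rough way to spell that comparison out (a sketch with purely illustrative symbols, not figures anyone in this thread has endorsed): if your work shifts the probability of a good outcome by $\Delta p$, the value of a good outcome is $V$, and $c$ is the value of the best thing you'd do instead, then to first order

$$\mathbb{E}[\text{alignment work}] - \mathbb{E}[\text{best alternative}] \approx \Delta p \cdot V - c.$$

When $V$ is astronomically large, this difference is dominated by $\Delta p \cdot V$ for any nonzero $\Delta p$, whether the baseline probability of success is 2% or 50%. That is why the qualifier ‘but only if Eliezer is wrong’ doesn’t enter into it.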
Specifically, the false dichotomy here is ‘everything is either impossible or not-highly-difficult’. Eliezer thinks alignment is highly difficult, but not impossible (nor negligibly-likely-to-be-achieved). Conflating ‘highly difficult’ with ‘impossible’ is qualitatively the same kind of error as conflating ‘not easy’ with ‘impossible’.
You’re right, and my above comment was written in haste. I didn’t mean to imply Eliezer thought those directions were pointless; he clearly doesn’t. I do think he’s stated, when asked on here by incoming college students what they should do, something to the effect of “I don’t know, I’m sorry”. But I think I did mischaracterize him in my phrasing, and that’s my bad; I’m sorry.
My only note is that, when addressing newcomers to the AI safety world, the log-odds framing of the benefit of working on safety rests on several premises that many of those folks don’t share. In particular, for those not bought into longtermism or pure utilitarianism, “dying with dignity” by increasing humanity’s odds of survival from 0.1% to 0.2%, at substantial professional and emotional cost to yourself during the ~10 years you believe you still have, is not prima facie a sufficiently compelling reason to work on AI safety. In that case, arguing that from an outside view the number might not actually be so low seems like an important thing to highlight to people, even if they eventually update down that far once they form an inside view.
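(To make the log-odds point concrete, purely as an illustration using the 0.1% and 0.2% figures above:

$$\log_2\frac{0.002}{0.998} - \log_2\frac{0.001}{0.999} \approx -8.96 - (-9.97) \approx 1 \text{ bit},$$

i.e. doubling the survival probability is roughly a one-bit gain in log-odds. Under the log-odds framing that’s a large win; in absolute-probability terms it still reads as near-certain doom, and that gap is exactly the disconnect newcomers without those premises are likely to feel.)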