Has anyone tried to point out expected failure modes of that approach (beyond the general “we don’t know what happens when capabilities increase” that I was pointing at)? I’ll admit I don’t understand the details enough right now to say anything, but it seems worth looking into!
I’m not sure I can follow your meta-reasoning. I agree that the news is overly focused on current problems, but I don’t really see how that applies to AI alignment (except maybe as far as bias etc. are concerned). Personally, I try to go by who has the most logically inevitable-seeming chains of reasoning.
Has anyone tried to point out expected failure modes of that approach (beyond the general “we don’t know what happens when capabilities increase” that I was pointing at)?
Not right now, though more work is necessary to show that alignment keeps improving as the model improves along capabilities other than data. But that’s likely the paper’s only shortcoming.
Personally, I expect that Pretraining from Human Feedback will generalize to other capabilities and couple capabilities and alignment together.
I’m not sure I can follow your meta-reasoning. I agree that the news is overly focused on current problems, but I don’t really see how that applies to AI alignment (except maybe as far as bias etc. are concerned). Personally, I try to go by who has the most logically inevitable-seeming chains of reasoning.
While logic and evidence do matter, my point is that there’s a general bias toward the negative view of things, both because we’re drawn to it and because the news serves us up more negative views.
This has implications for arguably everything, including X-risk. The major implication is that we should differentially distrust negative updates relative to positive updates, and thus expect things to reliably be better than they seem.
Here’s a link on the issue of negativity bias:
https://www.vox.com/the-highlight/23596969/bad-news-negativity-bias-media