More than any one defeater, I’m confident that most people in the alignment field don’t understand the defeaters. Why? I mean, from talking to many of them, and from their choices of research.
People in these fields understand very well the problem you are pointing towards.
I don’t believe you.
if the alignment community would outlaw mechinterp/SLT/neuroscience
This is an insane strawman. Why are you strawmanning what I’m saying?
I don’t think progress on this question will be made by blanket dismissals
Progress could only be made by understanding the problems, which can only be done by stating the problems, which you’re calling “blanket dismissals”.
Okay, it seems the commentariat agrees I am too combative. I apologize if you feel strawmanned.
Feels like we got a bit stuck. When you say “defeater”, what I hear is a very confident blanket dismissal. Maybe that’s not what you have in mind.
A defeater, in my mind, is a failure mode which, if you don’t address it, means you will not succeed at aligning sufficiently powerful systems.[1] That does not mean work not focused on defeaters is useless, but at some point you have to deal with them. If the vast majority of people working towards alignment don’t get them clearly, and the people who do get them claim we’re nowhere near on track to find a way to beat them, then that is a scary situation.
This is true even if some of the work being done by people unaware of the defeaters is not useless, e.g. maybe it is successfully averting earlier forms of doom than the ones that require routing around the defeaters.
A defeater is best considered not as an argument against specific lines of attack, but as a problem which, if unsolved, leads inevitably to doom. People with a strong grok of a bunch of these often think that far more timelines are lost to “we didn’t solve these defeaters” than to the problems that the class of work being done by most of the field could even plausibly address. Unfortunately, this means it gets used as (and feels like) an argument against those approaches by people who don’t understand them and don’t claim to, but that’s not its generator or its important nature.