TsviBT comments on Alexander Gietelink Oldenziel’s Shortform

TsviBT 29 Dec 2024 15:31 UTC

> why you are so confident in these “defeaters”

More than any one defeater, I’m confident that most people in the alignment field don’t understand the defeaters. Why? I mean, from talking to many of them, and from their research choices.

> People in these fields understand very well the problem you are pointing towards.

I don’t believe you.

> if the alignment community would outlaw mechinterp/slt/ neuroscience

This is an insane strawman. Why are you strawmanning what I’m saying?

> I dont think progress on this question will be made by blanket dismissals

Progress could only be made by understanding the problems, which can only be done by stating the problems, which you’re calling “blanket dismissals”.
- Alexander Gietelink Oldenziel 30 Dec 2024 11:47 UTC
  Okay, it seems like the commentariat agrees I am too combative. I apologize if you feel strawmanned.
  
  Feels like we got a bit stuck. When you say “defeater” what I hear is a very confident blanket dismissal. Maybe that’s not what you have in mind.
  - plex 30 Dec 2024 22:03 UTC
    A defeater, in my mind, is a failure mode which, if you don’t address it, means you will not succeed at aligning sufficiently powerful systems.^[1] It does not mean that work not focused on defeaters is useless, but at some point you have to deal with them. If the vast majority of people working towards alignment don’t understand them clearly, and the people who do understand them claim we’re nowhere near on track to find a way to beat them, then that is a scary situation.
    This is true even if some of the work being done by people unaware of the defeaters is not useless; e.g., maybe it is successfully averting earlier forms of doom than the ones that require routing around the defeaters.
    [1] A defeater is not best considered as an argument against specific lines of attack, but as a problem which, if unsolved, leads inevitably to doom. People who strongly grok a bunch of these often think that far more timelines are lost to “we didn’t solve these defeaters” than to the problems that the class of work done by most of the field even plausibly addresses. Unfortunately, this means the concept gets used as (and feels like) an argument against those approaches by people who don’t understand those approaches and don’t claim to, but that’s not its generator or its important nature.