I now understand the key question as being “what baseline of inferential distance should we expect all orgs to have reached?”
Yes, let's make an analogy to a startup accelerator. Suppose you have to get 20 inferential steps right in a row to be a successful startup, where (say) 10 of those are the necessary how-to-start-a-startup skills (things like hiring, user interviews, and understanding product-market fit) and 10 are details about your particular product. YC wants everyone to have the same first 10 (I think that's mainly what they select on, though they'll try to teach you the rest), but it's important to have lots of variance in the second 10. Since most startups fail, it's good to have lots of good startups trying lots of different products.
In alignment research, the disagreement is over which fundamentals we know you definitely need for your alignment research to have a chance of being useful (i.e. to count as 'actually part of the field of alignment'), and which bits we'll call 'ongoing genuine disagreement in the field'. Here's a public note saying I'll come back later this week to give a guess as to what some of those variables are.
On this, I thought about it for a while and ended up writing a nearby post: "Models I use when making plans to reduce AI x-risk".
This isn't "models I use when thinking about the object-level alignment problem" or "models I'd use if I were doing alignment research". Those are a set of more detailed models of how intelligence works in general, and I do intend to write a post about those sometime.