I now understand the key question as being “what baseline of inferential distance should we expect all orgs to have reached?”. Should they all have internalised deep security mindset? Should they have read the Sequences? Or does it suffice that they’ve read Superintelligence? Or that they have a record of donating to charity? And so forth.
Ben seems to think this baseline is much higher than Albert does, which is why he is happy to support Paul’s agenda, because it agrees on most of the non-mainstream moving parts that also go into MIRI’s agenda, whereas orgs working on algorithmic bias, say, lack most of those. Now, in order to settle the debate, we can’t really push Ben to explain why all the moving parts of his baseline are correct—that’s essentially the voice of Pat Modesto. He might legitimately be able to offer no better explanation than that it’s the combined model built through years of thinking, reading, trying and discussing. But this also makes it difficult to settle the disagreement.
Mostly agreement, a few minor points:

...he is happy to support Paul’s agenda, because it agrees on most of the non-mainstream moving parts that also go into MIRI’s agenda, whereas orgs working on algorithmic bias, say, lack most of those.
I don’t actually know how many inferential steps Paul’s agenda and the agent foundations agenda agree on (whether it’s closer to 10%, 50%, or 90%; I would love to know more about this), but both do seem to me qualitatively different from things like algorithmic bias.
...we can’t really push Ben to explain why all the moving parts of his baseline are correct—that’s essentially the voice of Pat Modesto. He might legitimately be able to offer no better explanation than that it’s the combined model built through years of thinking, reading, trying and discussing.
I would change the wording of the second sentence to:

He might legitimately not be able to fully communicate his models, because they’re built from years of thinking, reading, trying and discussing. Nonetheless, it’s valuable to probe them for their general structure, run consistency checks, and see if they can make novel predictions, even if full communication is not reachable.
This seems similar to the situation of experts in many fields (e.g. chess, economics, management). Regarding HPMOR, Eliezer wasn’t able to (or at least didn’t) fully communicate his models of how to write rationalist fiction, and Pat Modesto would certainly say “I can’t see how you get to be so confident in your models, therefore you’re not allowed to be so confident in your models”, but this isn’t a good enough reason for Eliezer not to believe the (subtle) evidence he has observed. And this is borne out by the fact that he was able to predictably build something surprising and valuable.
Similarly, in this position it seems perfectly appropriate to ask me things like “What are some examples of the inferential steps you feel confident a research path must understand in order to be correct, and how do you get those?”, and also to ask me to make predictions based on this, even if the answer to the first question doesn’t persuade you that I’m correct.
On this:

I now understand the key question as being “what baseline of inferential distance should we expect all orgs to have reached?”
Yes—let’s make an analogy to a startup accelerator. Suppose that you have to get 20 inferential steps right in a row to be a successful startup, where (say) 10 of those are the necessary how-to-start-a-startup skills (things like hiring, user interviews, understanding product-market fit) and 10 of those are details about your particular product. YC wants everyone to have the same first 10 (I think that’s mainly what they select on, though they will try to teach you the rest), but it’s important to have lots of variance in the second set of 10. Since most startups fail, it’s good to have lots of good startups trying lots of different products.
In alignment research, the disagreement is over which fundamentals we know you definitely require to make sure your alignment research has a chance of being useful (aka ‘actually part of the field of alignment’), and which bits we’ll call ‘ongoing genuine disagreement in the field’. Here’s a public note saying I’ll come back later this week to give a guess as to what some of those variables are.
I thought about it for a while, and ended up writing a nearby post: “Models I use when making plans to reduce AI x-risk”.

This isn’t “models I use when thinking about the object-level alignment problem” or “models I’d use if I were doing alignment research”. Those are a set of more detailed models of how intelligence works in general, and I do intend to write a post about those sometime.