This list of benefits logically pushed multiple people to argue that we should make AI Alignment paradigmatic.
Who? It would be helpful to have some links so I can go read what they said.
I disagree. Or to be more accurate, I agree that we should have paradigms in the field, but I think that they should be part of a bigger epistemological structure. Indeed, a naive search for a paradigm either results in a natural-science-like paradigm that puts too little emphasis on applications and usefulness, or in a premature constraint on the problem we’re trying to solve.
In the simple model of paradigms and fields, there is some pre-existing division into fields, and then each field can either be paradigmatic or non-paradigmatic; if it is non-paradigmatic, it can contain multiple paradigms unified in some overall structure, or not. I’d like to move to a more complicated model: there’s a big space of research being done, some of the research naturally lumps together into paradigms, and we carve up the space of research into “fields” at least partly influenced by where the paradigms are (a sufficiently large paradigm will be called a field, for example). Fields can have sub-fields, and paradigms can have sub-paradigms. Also, paradigmaticness is not a binary property; some lumps of research are in a grey area, depending on how organized they are and how unified and internally self-aware they are, in some sense.
On this more complicated (but IMO more accurate) model, your post is itself an attempt to make AI alignment paradigmatic! After all, you are saying we should have multiple paradigms (i.e. you push to make parts of AI alignment more paradigmatic) and that they should fit together into this overall epistemic structure you propose. Insofar as your proposed epistemic structure is more substantial than the default epistemic structure that always exists between paradigms (e.g. the one that exists now), it’s an attempt to make the whole of AI alignment more paradigmatic too, even if not maximally paradigmatic.
Of course, that’s not necessarily a bad thing: your search for a paradigm is not naive, and the paradigm you propose is flexible and noncommittal (i.e. not-maximally-paradigmatic?) enough that it should be able to avoid the problems you highlight. (I like the paradigm you propose! It seems like a fairly solid, safe first step.)
I think you could instead have structured your post like this:
1. Against Premature Paradigmatization: [Argues that when a body of ongoing research is sufficiently young/confused, pushing to paradigmatize it results in bad assumptions being snuck in, bad constraints on the problem, too little attention on what actually matters, etc. Gives some examples.]
2. Paradigmatization of Alignment is Premature: [Argues that it would be premature to push for paradigmatization now. Maybe lists some major paradigms or proto-paradigms proposed by various people and explains why it would be bad to make any one of them The King. Maybe argues that in general it’s best to let these things happen naturally rather than to push for them.]
I think overall my reaction is: This is too meta; can you point to any specific, concrete things people are doing that they should do differently? For example, I think of Richard Ngo’s “AI Safety from First Principles,” Bostrom’s Superintelligence, maybe Christiano’s stuff, MIRI’s stuff, and CAIS as attempts to build paradigms that (if things go as well as their authors hope) could become The Big Paradigm We All Follow. Are you saying people should stop trying to write things like this? Probably not… so then what are you recommending? That people not get too invested in any one particular paradigm, or start thinking it is The One, until we’ve had more time to process everything? Well, I feel like people are pretty good about that already.
I very much like your idea of testing this out. It’ll be hard to test, since it’s up to your subjective judgment of how useful this way of thinking is, but it’s worth a shot! I’ll be looking forward to the results.
Thanks for the feedback!
Who? It would be helpful to have some links so I can go read what they said.
That was one of my big frustrations when writing this post: I only saw this topic pop up in personal conversations, not really in published posts. And so I didn’t want to give the names of people who had just discussed it with me on a Zoom call or in a chat. But I totally feel you; I’m always annoyed by posts that claim to answer a criticism without pointing to it.
On this more complicated (but IMO more accurate) model, your post is itself an attempt to make AI alignment paradigmatic! After all, you are saying we should have multiple paradigms (i.e. you push to make parts of AI alignment more paradigmatic) and that they should fit together into this overall epistemic structure you propose. Insofar as your proposed epistemic structure is more substantial than the default epistemic structure that always exists between paradigms (e.g. the one that exists now), it’s an attempt to make the whole of AI alignment more paradigmatic too, even if not maximally paradigmatic.
Of course, that’s not necessarily a bad thing: your search for a paradigm is not naive, and the paradigm you propose is flexible and noncommittal (i.e. not-maximally-paradigmatic?) enough that it should be able to avoid the problems you highlight. (I like the paradigm you propose! It seems like a fairly solid, safe first step.)
That’s a really impressive comment, because my last rewrite of the post was aimed exactly at hinting that this was the “right way” (in my opinion) to make the field paradigmatic, instead of arguing that AI Alignment should be made paradigmatic (which is what my previous draft attempted). So I basically agree with what you say.
I think you could instead have structured your post like this:
1. Against Premature Paradigmatization: [Argues that when a body of ongoing research is sufficiently young/confused, pushing to paradigmatize it results in bad assumptions being snuck in, bad constraints on the problem, too little attention on what actually matters, etc. Gives some examples.]
2. Paradigmatization of Alignment is Premature: [Argues that it would be premature to push for paradigmatization now. Maybe lists some major paradigms or proto-paradigms proposed by various people and explains why it would be bad to make any one of them The King. Maybe argues that in general it’s best to let these things happen naturally rather than to push for them.]
While I agreed with what you wrote above, this part strikes me as quite different from what I’m saying. Or rather, you’re only focusing on one aspect, because I actually argue for two things:
1. That we should have a paradigm for the “AIs” part and a paradigm for the “well-behaved” part, and that from these we get a paradigm for the solving part. This has nothing to do with the field being young and/or confused, and everything to do with the field being focused on solving a problem. (That’s the part I feel your version is missing.)
2. That in the current state of our knowledge, it is too early to fix those paradigms; we should instead do more work on comparing and extending multiple paradigms for each of the “slots” from the previous point, and similarly have a go at solving different variants of the problem. That’s the part you mostly get right.
It’s partly my fault, because I didn’t state it that way.
I think overall my reaction is: This is too meta; can you point to any specific, concrete things people are doing that they should do differently? For example, I think of Richard Ngo’s “AI Safety from First Principles,” Bostrom’s Superintelligence, maybe Christiano’s stuff, MIRI’s stuff, and CAIS as attempts to build paradigms that (if things go as well as their authors hope) could become The Big Paradigm We All Follow. Are you saying people should stop trying to write things like this? Probably not… so then what are you recommending? That people not get too invested in any one particular paradigm, or start thinking it is The One, until we’ve had more time to process everything? Well, I feel like people are pretty good about that already.
My point is that thinking of your examples as “big paradigms of AI Alignment” is the source of the confusion, and a massive problem within the field. If we use my framing instead, then you can split these big proposals into their paradigm for “AIs”, their paradigm for “well-behaved”, and thus their paradigm for the solving part. This actually shows you where they agree and where they disagree. If you’re trying to build a new perspective on AI Alignment, then I also think my framing is a good lens for crystallizing your dissatisfactions with the current proposals.
Ultimately, this framing is a tool of philosophy of science, and so it probably won’t be useful to anyone not doing philosophy of science. The catch is that we all do a bit of philosophy of science regularly: when trying to decide what to work on, when interpreting work, when building these research agendas and proposals. I hope that this tool will help on these occasions.
I very much like your idea of testing this out. It’ll be hard to test, since it’s up to your subjective judgment of how useful this way of thinking is, but it’s worth a shot! I’ll be looking forward to the results.
That’s why I asked people who are not as invested in this framing (and can be quite critical) to help me do these reviews; hopefully that will help make them less biased! (We also chose some posts specifically because they didn’t fit neatly into my framing.)