Not planning to answer more on this thread, but given how my last messages seem to have confused you, here is my last attempt at sharing my mental model (so you can flag in an answer where you think I’m wrong, for readers of this thread).
Also, I just checked the publication list, and I’ve read or skimmed most things MIRI published since 2014 (including most newsletters and blog posts on the MIRI website).
My model of MIRI is that initially, there was a bunch of people including EY who were working mostly on decision theory stuff, tiling, model theory, the sort of stuff I was pointing at. That predates Nate’s arrival, but in my model it becomes far more legible after that (so circa 2014/2015). In my model, I call that “old school MIRI”, and that was a big chunk of what I was pointing out in my original comment.
Then there are a bunch of things that seem to have happened:
Newer people (Abram and Scott come to mind, but mostly because they’re the ones who post on the AF and who I’ve talked to) join this old-school MIRI approach and reshape it into Embedded Agency. Now this new agenda is a bit different from the old-school MIRI work, but I feel like it’s still not that far from decision theory and logic (with maybe a stronger emphasis on the Bayesian part for stuff like logical induction). That might be a part where we’re disagreeing.
A direction related to embedded agency and the decision theory and logic stuff, but focused on implementations through strongly typed programming languages like Haskell and type theory. That’s technically practical, but in my mental model this goes in the same category as “decision theory and logic stuff”, especially because that sort of programming is very close to logic and natural deduction.
MIRI starts its ML-focused agenda, which you already mentioned. The impression I still have is that this didn’t lead to much published work that was actually experimental, instead focusing on recasting questions of alignment through ML theory. But I’ve updated towards thinking MIRI has invested efforts into looking at stuff from a more prosaic angle, based on looking more into what has been published there, because some of these ML papers had flown under my radar. There’s also the difficulty that when I read a paper by someone who has a position elsewhere now (say Ryan Carey or Stuart Armstrong), I don’t think of MIRI but of their current affiliation, even though the work was supported by MIRI (and apparently Stuart is still supported by MIRI). This is the part of the model where I expect that we might have very different models because of your knowledge of what was being done internally and never released.
Some new people hired by MIRI fall into what I call the “Bell Labs MIRI” model, where MIRI just hires/funds people that have different approaches from them, but who they think are really bright (Evan and Vanessa come to mind, although I don’t know if that’s the thought process that went into hiring them).
Based on that model, plus some feedback and impressions I’ve gathered from people about some MIRI researchers being very doubtful of experimental work, I arrived at my “all experimental work is useless” claim. I tried to include Redwood and Chris Olah’s work in there with the caveat (which is a weird model but makes sense if you have a strong prior for “experimental work is useless for MIRI”).
Our discussion made me think that there are probably far better generators for this general criticism of experimental work, and that they would actually make more sense than “experimental work is useless except this and that”.