Another aspect of where I’m coming from is that there should be a high standard of proof for claiming that something is an important technical problem in future AI development, because it seems so hard to predict what will and won’t be relevant for distant future technologies. My feeling is that paragraphs like this one, while relevant, don’t provide strong enough arguments to overcome the prior:
As further background, the idea that something-like-proof might be relevant to Friendly AI is not about achieving some chimera of absolute safety-feeling, but rather about the idea that the total probability of catastrophic failure should not have a significant conditionally independent component on each self-modification, and that self-modification will (at least in initial stages) take place within the highly deterministic environment of a computer chip. This means that statistical testing methods (e.g. an evolutionary algorithm’s evaluation of average fitness on a set of test problems) are not suitable for self-modifications which can potentially induce catastrophic failure (e.g. of parts of code that can affect the representation or interpretation of the goals). Mathematical proofs have the property that they are as strong as their axioms and have no significant conditionally independent per-step failure probability if their axioms are semantically true, which suggests that something like mathematical reasoning may be appropriate for certain particular types of self-modification during some developmental stages.
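(A minimal illustration of the “conditionally independent per-step failure” point above, with $\varepsilon$ and $n$ as symbols introduced only for this sketch rather than figures from the quoted text: if each of $n$ self-modifications independently risks catastrophe with probability $\varepsilon$, then

\[
\Pr[\text{no catastrophe after } n \text{ self-modifications}] \;=\; (1-\varepsilon)^n \;\longrightarrow\; 0 \quad \text{as } n \to \infty
\]

for any fixed $\varepsilon > 0$; e.g. $\varepsilon = 10^{-3}$ and $n = 10^{4}$ give $(1-10^{-3})^{10^{4}} \approx e^{-10} \approx 4.5 \times 10^{-5}$. A proof-based guarantee, by contrast, concentrates the failure probability in the axioms and the proof checker rather than adding an independent term at every step.)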
I would greatly appreciate further elaboration on why this is the right problem to be working on right now.
On the other hand, working on many things that each have a significant probability of being important, so that you are likely to eventually solve something that actually is important, seems like a better idea than doing nothing because you can’t prove that any particular sub-problem is important.
I agree with this principle but think my claims are consistent with it. Doing stuff other than “technical problems in the future of AI” is an alternative worth considering.
I disagree. I would go in the other direction: if it seems relatively plausible that something is relevant, then I’m happy to have someone working on it. This is why I am happy to have MIRI work on this and related problems (so much so that I even attended one of the Löb workshops), even if I do not personally think they are likely to be relevant.
ETA: My main reason for believing this is that, among scientific fields, willingness to entertain such research programs seems to correlate with the overall health of the field.
Are there other problems that you think it would be better to be working on now?
My question was more of a request for information than a challenge; Eliezer could say some things that would make doing mathematics on the Löb problem look more promising to me. It seems likely to me that I am missing some important aspect of the situation.
If I’m not missing anything major, then I think that, within the realm of AI risk, general strategic work addressing questions like “Will the world’s elites navigate the creation of AI just fine?” would be preferable. That’s just one example; I do not mean to be claiming that this is the best thing to do. As I said in another comment, it’s very hard to argue that any one course is optimal.
Thanks, that, at least for me, provides more context for your questions.
All that said, I do think that where MIRI has technical FAI questions to work on now, it is very reasonable to write up:
1. what the question is
2. why answering the question is important for making machine intelligence benefit humanity
3. why we shouldn’t expect the question to be answered by default by whoever makes AGI
In this particular case, I am asking for more info about the second two questions (#2 and #3).
For convenience: “SuperBenefit” = increasing the probability that advanced machine intelligence has a positive impact.
I agree that MIRI has a lot left to explain with respect to questions #2 and #3, but it’s easier to explain those issues when we’ve explained #1 already, and we’ve only just begun to do that with AI forecasting, IEM, and Tiling Agents.
Presumably the relevance of AI forecasting and IEM to SuperBenefit is clear already?
In contrast, it does seem that the relevance of the Tiling Agents work to SuperBenefit is unclear to many people, and that more explanation is needed there. Now that Tiling Agents has been published, Eliezer has begun to explain its relevance to SuperBenefit in various places on this page, but it will take a lot of trial and error for us to discover what is and isn’t clear to people.
As for question #3, we’ve also only just begun to address that issue in detail.
So, MIRI still has a lot of explaining to do, and we’re working on it. But allow me a brief reminder that this gap isn’t unique to MIRI at all. Arguing for the cost-effectiveness of any particular intervention, given the overwhelming importance of the far future, is extremely complicated, whether it be donating to AMF, doing AI risk strategy, spreading rationality, or something else.
E.g. if somebody accepts the overwhelming importance of the far future and is donating to AMF, they have roughly as much explaining to do as MIRI does, if not more.
Yes, the relevance of AI forecasting and IEM to SuperBenefit is already clear to me.
I basically agree with these comments, with a couple of qualifications.
I think it is unique to MIRI in the sense that MIRI should be expected to explain how its research is going to accomplish its mission of making machine intelligence benefit humanity, whereas global health charities should not be expected to explain why improving global health makes the far future go better. This means MIRI has an asymmetrically hard job, but I do think it’s a reasonable division of labor.
I think it makes sense for other people who care about the far future to evaluate how the other strategies you mentioned are expected to affect the far future, and try to find the best ones. There is an overwhelming amount of work to do.
Right. Very few charities are even claiming to be good for the far future. So there’s an asymmetry between MIRI and other charities w.r.t. responsibility to explain plausible effects on the far future. But among parties (including MIRI) who care principally about the far future and are trying to do something about it, there seems to be no such asymmetry, except insofar as one arises for other reasons, e.g. asymmetry in resource use.
Yes.
I agree that arguing for cost-effectiveness in terms of the far future is extremely complicated. Typically people justify their research on other grounds, for instance by identifying an obstacle to progress and showing how their approach might overcome it in a way that previously tried approaches could not. My impression is that one reason for doing this is that it is typically much easier to communicate along these lines, because it brings the discourse toward much more familiar technical questions while still correlating well with progress more generally.
Note that under this paradigm, the main thing MIRI needs to do to justify its work is to explain why the Löb obstacle is insufficiently addressed by other approaches (for instance, statistical learning theory). I would actually be very interested in understanding the relationship of statistics to the Löb obstacle, so I look forward to any write-up that might appear in the future.
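(For reference, since the obstacle is only named above: Löb’s theorem itself is the standard result stated below, while the gloss connecting it to self-modifying agents is a rough summary of the Tiling Agents framing, not a quotation from the paper. For a theory $T$ extending Peano arithmetic, with $\Box_T$ its provability predicate,

\[
T \vdash \Box_T(\varphi) \rightarrow \varphi \quad\Longrightarrow\quad T \vdash \varphi ,
\]

so $T$ cannot prove the soundness schema $\Box_T(\varphi) \rightarrow \varphi$ for any $\varphi$ it does not already prove. This is why an agent reasoning in $T$ cannot, in general, accept “my successor proves this action is safe in $T$” as a license to act, which is the self-trust problem the Tiling Agents work targets.)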