This post is about epistemics, not about safety techniques, which are covered in later parts of the sequence. Machine learning, specifically deep learning, is the dominant paradigm that people believe will lead to AGI. The researchers advancing the machine learning field have proven quite good at doing so, insofar as they have produced rapid capabilities advances. This post sought to give an overview of how they do this, which is in my view extremely useful information! We strongly oppose advancing capabilities in the name of safety, and that is very clear in the rest of this sequence. But it seems especially odd to say that one should not even talk about how capabilities have been advanced.
The amount of capabilities research is simply far greater than the amount of safety research. Thus, to answer the question “what kind of research approaches generally work for shaping machine learning systems?”, it is quite useful to engage with how they have worked in capabilities advancements. In machine learning, theoretical (in the “math proofs” sense of the word) approaches to advancing capabilities have largely not worked. This suggests deep learning is not amenable to these kinds of approaches.
It sounds like you believe that these theoretical approaches will be necessary for AI safety, so no amount of evidence of their inefficacy should persuade us in favor of more iterative, engineering-focused approaches. To put it another way: if iterative engineering practices will never ensure safety, then it does not matter whether they worked for capabilities, and we should be working mainly on theory.
I pretty much agree with the logic of this, but I don’t agree with the premise: I don’t think it’s the case that iterative engineering practices will never ensure safety. The reasons for this are covered in this sequence. Theory, on the other hand, is a lot more ironclad than iterative engineering approaches, if usable theory could actually be produced. Knowledge that usable theory has not really been produced in deep learning suggests to me that it’s unlikely to be produced for safety, either. Thus, to me, an iterative engineering approach appears to be more favorable despite the fact that it leaves more room for error. However, I think that if you believe that iterative engineering approaches will never work, then indeed you should work on theory, despite what should be a very strong prior (based on what is said in this post) that theory will also not work.
This post sought to give an overview of how they do this, which is in my view extremely useful information!
This is what I was trying to question with my comment above: Why do you think this? How am I to use this information? It’s surely true that this is a community that needs to be convinced of the importance of work on safety, as you point out in the next post in the sequence, but how does information about, say, the turnover of ML PhD students help me do that?
Thus, to answer the question “what kind of research approaches generally work for shaping machine learning systems?”, it is quite useful to engage with how they have worked in capabilities advancements. In machine learning, theoretical (in the “math proofs” sense of the word) approaches to advancing capabilities have largely not worked. This suggests deep learning is not amenable to these kinds of approaches.
There is a conflation happening here which undermines your argument: theoretical approaches dominated how machine learning systems were shaped for decades, and you say so at the start of this post. It turned out that automated learning produced better results in terms of capabilities, and it is that success that makes it the continued default. But the former fact surely says a lot more about whether or not theory can “shape machine learning systems” than the latter. Following your argument through, I would instead conclude that implementing theoretical approaches to safety might require us to compromise on capabilities, and this is indeed exactly what I expect: learning systems would have access to much more data if they ignored privacy regulations and other similar ethical boundaries, but safety demands that capability not be the sole consideration shaping AI systems.
Knowledge that usable theory has not really been produced in deep learning suggests to me that it’s unlikely to be produced for safety, either.
This is simply not true. Failure modes identified by purely theoretical arguments have been realised in ML systems. Attacks and pathological behaviour (for image classifiers, say) are regularly constructed in theory before they ever meet real systems. It’s also worth noting that many architecture choices, or efforts to, say, make backpropagation more algorithmically efficient, are driven by theory.
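To make that claim concrete, here is a minimal sketch (my own illustration, not anything from the post) of the fast gradient sign method, an attack derived from a first-order theoretical argument about loss gradients before being demonstrated against deployed image classifiers. The PyTorch model, inputs, and epsilon below are hypothetical stand-ins.

```python
import torch
import torch.nn as nn

def fgsm_perturb(model, x, y, epsilon=0.03):
    """Build an adversarial input from a first-order argument:
    step each pixel in the direction that most increases the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    # The perturbation is just the sign of the input gradient, scaled by epsilon.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

# Toy usage with a hypothetical linear "classifier" on 32x32 RGB inputs.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.rand(1, 3, 32, 32)
y = torch.tensor([3])
x_adv = fgsm_perturb(model, x, y)
print((x_adv - x).abs().max())  # perturbation is bounded by epsilon, yet can flip the prediction
```

The point of the sketch is that the attack exists entirely as a gradient argument; applying it to a real classifier comes afterwards.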
In the end, my attitude is not that “iterative engineering practices will never ensure safety”, but rather that there are plenty of people already doing iterative engineering, and that while it’s great to convince as many of those as possible to be safety-conscious, there would be further benefits to safety if some of their experience could be applied to the theoretical approaches that you’re actively dismissing.
(Not reviewed by Dan Hendrycks.)