This post sought to give an overview of how they do this, which is in my view extremely useful information!
This is what I was trying to question with my comment above: Why do you think this? How am I to use this information? It’s surely true that this is a community that needs to be convinced of the importance of work on safety, as you point out in the next post in the sequence, but how does information about, say, the turnover of ML PhD students help me do that?
Thus to answer the question “what kind of research approaches generally work for shaping machine learning systems?” it is quite useful to engage with how they have worked in capabilities advancements. In machine learning, theoretical (in the “math proofs” sense of the word) approaches to advancing capabilities have largely not worked. This suggests deep learning is not amenable to these kinds of approaches.
There is a conflation here that undermines your argument: theoretical approaches dominated how machine learning systems were shaped for decades, as you note at the start of this post. It turned out that automated learning produced better results in terms of capabilities, and it is that success which makes it the continued default. But the former fact surely says far more about whether theory can “shape machine learning systems” than the latter. Following your argument through, I would instead conclude that implementing theoretical approaches to safety might require us to compromise on capabilities, and that is exactly what I expect: learning systems would have access to much more delicious data if they ignored privacy regulations and similar ethical boundaries, but safety demands that capability not be the sole consideration shaping AI systems.
Knowledge that usable theory has not really been produced in deep learning suggests to me that it’s unlikely to be produced for safety, either.
This is simply not true. Failure modes identified by purely theoretical arguments have been realised in ML systems. Attacks on systems and pathological behaviour (for image classifiers, say) are regularly built in theory before they ever meet real systems. It’s also worth noting that architecture choices, or efforts to, say, make backprop more algorithmically efficient, are driven by theory.
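To make that point concrete, here is a minimal sketch (mine, not anything from the post) of the fast gradient sign method, where the attack direction comes from a first-order argument about the loss rather than from poking at a deployed system. The names `model`, `x`, `y` and `epsilon` are placeholders for a classifier, an input batch, its labels and a perturbation budget.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Fast gradient sign method: the perturbation direction is derived
    analytically from a linear approximation of the loss, i.e. the
    failure mode is 'built in theory' before touching a real system."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step in the direction that maximally increases the loss under an
    # L-infinity budget of size epsilon.
    x_adv = x + epsilon * x.grad.sign()
    # Assumes inputs are scaled to [0, 1]; keep the perturbed image valid.
    return x_adv.clamp(0.0, 1.0).detach()
```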
In the end, my attitude is not that “iterative engineering practices will never ensure safety”, but rather that there are plenty of people already doing iterative engineering, and that while it’s great to convince as many of those as possible to be safety-conscious, there would be further benefits to safety if some of their experience could be applied to the theoretical approaches that you’re actively dismissing.
I can say from first- and second-hand experience that a hard part of supervising a PhD or Master’s student in research (there are many) is taking someone who lies at one end of the bird-frog spectrum and pushing them to acquire the skills they need from the other end. To get to the point of pursuing research in the first place, you’re likely to be either someone technically skilled who can easily work out the fine details of a problem and habitually focuses on examples, or someone with enough appreciation for the overarching ideas to be motivated to build them further; it sounds like you are/were of the latter variety. If you don’t acquire some skills and perspective from the other end, you’ll inevitably drive yourself into a dead end: in the former case, you risk sinking a lot of time into elaborating specific cases while missing a general result that would simplify matters; in the latter, you can work for a long time on a false claim because you lack grounding in verifiable cases.
At the start of a research career, a responsible supervisor must push their student to be independent, but there is a balance to strike between giving space and giving guidance. It seems like your adviser wasn’t paying close enough attention to your work to see that you hadn’t done the basics, which is how you ended up spending so long on this without realising that you didn’t have an ‘empirical’ basis for what you were trying to prove in the first place. The fact that you weren’t getting pushback on your reluctance to read references also seems like a red flag.
All this is to say that one moral of the story could be aimed at PhD supervisors (who, by the way, almost universally get no specific training for that role): just because a student is confident doesn’t mean they have everything it takes to do research, and you need to make sure that they aren’t wasting their time.