I think the preparadigmatic science frame has been overrated by this community compared to case studies of complex engineering like the Apollo program. But I do think it will be increasingly useful as we continue to develop capability evals, and even more so as we become able to usefully measure and iterate on agency, misalignment, control, and other qualities crucial to the value of the future.
That’s very interesting—could you talk a bit more about that? I have a guess about why, but would rather hear it straight than risk poisoning the context.
Why I think it’s overrated? I basically have five reasons:
1. Thomas Kuhn’s ideas are not universally accepted and don’t have clear empirical support apart from the case studies in his book. Someone could change my mind about this by showing me a study that operationalizes “paradigm”, “normal science”, etc. and uses data since the 1960s to either support or improve Kuhn’s original ideas.
2. Terms like “preparadigmatic” often cause misunderstanding or miscommunication here.
3. AI safety has the goal of producing a particular artifact: a superintelligence that’s good for humanity. Much of Kuhn’s writing relates to scientific fields motivated by discovery, like physics, where people can be in complete disagreement about ends (what progress means, what it means to explain something, etc.) without shared frames. But in AI safety we agree much more about ends and are confused about means.
4. In physics you can very often discover some concept like ‘temperature’ such that the world follows very simple, elegant laws in terms of that concept, and Occam’s razor carries you far, perhaps after some difficult math. ML is already very empirical, and I would expect agents to be complex and hard to predict, so I’d guess that future theories of agents will not be as elegant as those of physics; they’ll be more like biology. This means that more of the work will happen after we mostly understand what’s going on at a high level (so researchers know how to communicate) but don’t yet know the exact mechanisms, and so can’t get the properties we want.
5. Until now we haven’t had artificial agents to study, so we don’t have the tools to start developing theories of agency, alignment, etc. that make testable predictions. We do have somewhat capable AIs, though, which has allowed AI interpretability to get off the ground, so I think the Kuhnian view is more applicable to interpretability than to other areas of alignment or to alignment as a whole.
Dunno if this is a complete answer, but Thomas Kwa had a shortform a while back arguing against at least some uses of “preparadigmatic”:
https://www.lesswrong.com/posts/Zr37dY5YPRT6s56jY/thomas-kwa-s-shortform?commentId=mpEfpinZi2wH8H3Hb