Rather than Popper, we’re probably more likely to go with Kuhn and call this “pre-paradigmatic.” Studying something without doing science experiments isn’t the real problem (history departments do fine, as does math, as do engineers designing something new). The problem is that we don’t have a convenient and successful way of packaging the problems and expected solutions (a paradigm).
That said, it’s not like people aren’t trying. Some papers that I think represent good (totally non-sciency) work are Quantilizers, Logical Induction, and Cooperative Inverse Reinforcement Learning. These are all from a while ago, but that’s because I picked things that have stood the test of time.
If you only want more “empirical” work (even though it’s still in simulation) you might be interested in Deep RL From Human Preferences, An Introduction to Circuits, or the MineRL Challenges (which now have winners).
Thanks for your reply. Popper-falsifiable does not mean experiment-based in my book. Math is falsifiable: you can present a counterexample, an error in reasoning, a paradoxical result, etc. Similarly, in history you can often falsify certain claims by providing evidence against them. But you cannot falsify a field where every definition is hand-waved and nothing is specified in detail. I agree that AI Alignment has pre-paradigmatic features in Kuhn's sense. But Kuhn also says that pre-paradigmatic science is rarely rigorous or true, even though it might produce some results that will lead to something interesting in the future.
“Every definition is hand-waved and nothing is specified in detail” is an unfair caricature.
Yeah, but also this is the sort of response that goes better with citations.
Like, people used to make a somewhat hand-wavy argument that AIs trained on goal X might become consequentialists that pursue goal Y, and gave the analogy of the time when humans ‘woke up’ inside of evolution, and now are optimizing for goals different from evolution’s goals, despite having ‘perfect training’ in some sense (and the ability to notice the existence of evolution, and its goals). Then eventually someone wrote Risks from Learned Optimization in Advanced Machine Learning Systems in a way that I think involves substantially less hand-waving and substantially more specification in detail.
Of course there are still parts that remain to be specified in detail, either because no one has written them up yet (Risks from Learned Optimization came from, in part, someone relatively new to the field saying “I don’t think this hand-wavy argument checks out”, looking into it a bunch, being convinced, and then writing it up in detail), or because we don’t know what we’re looking for yet. (We have a somewhat formal definition of ‘corrigibility’, but is it the thing that we actually want in our AI designs? It’s not yet clear.)
In terms of trying to formulate rigorous and consistent definitions, a major goal of the Causal Incentives Working Group is to analyse features of different problems using consistent definitions and a shared framework. In particular, our paper “Path-specific Objectives for Safer Agent Incentives” (AAAI-2022) will go online in about a month, and should serve to organize a handful of papers in AI safety (AIS).
Thanks, this looks very good.