Hi there, my background is in AI research and recently I have discovered some AI Alignment communities centered around here. The more I read about AI Alignment, the more I have a feeling that the whole field is basically a fictional-world-building exercise.
Some problems I have noticed: the basic concepts (e.g. the basic properties of the AI being discussed) are left undefined; the questions answered are built on unrealistic premises about how AI systems might work; mathiness, i.e. using vaguely defined mathematical terms to describe complex problems and then solving them with additional vaguely defined mathematical operations; and a combination of mathematical thinking and hand-wavy reasoning that leads to preferred conclusions.
Maybe I am reading it wrong. How would you steelman the argument that AI Alignment is actually a rigorous field? Do you consider AI Alignment to be scientific? If so, how is it Popper-falsifiable?
Rather than Popper, we’re probably more likely to go with Kuhn and call this “pre-paradigmatic.” Studying something without doing science experiments isn’t the real problem (history departments do fine, as does math, as do engineers designing something new), the problem is that we don’t have a convenient and successful way of packaging the problems and expected solutions (a paradigm).
That said, it’s not like people aren’t trying. Some papers that I think represent good (totally non-sciency) work are Quantilizers, Logical Induction, and Cooperative Inverse Reinforcement Learning. These are all from a while ago, but that’s because I picked things that have stood the test of time.
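To make that concrete, here is a minimal sketch of the quantilizer idea in Python (my own toy illustration, not code from the paper, and the action names are made up): instead of taking the utility-maximizing action, a q-quantilizer samples from the top-q fraction, by base-distribution mass, of actions ranked by utility, which limits how hard it can push into weird corners of action space.

```python
import numpy as np

def quantilize(actions, base_probs, utility, q=0.1, rng=None):
    """Toy q-quantilizer: instead of argmax-ing utility, sample from the
    top-q fraction (by base-distribution mass) of actions ranked by utility.
    Illustrative sketch only; see the Quantilizers paper for the real formalism."""
    rng = np.random.default_rng() if rng is None else rng
    base_probs = np.asarray(base_probs, dtype=float)
    utilities = np.array([utility(a) for a in actions])

    # Rank actions from highest to lowest utility.
    order = np.argsort(-utilities)

    # Keep the highest-utility actions until they cover q of the base mass.
    cum_mass = np.cumsum(base_probs[order])
    top_q = order[: np.searchsorted(cum_mass, q) + 1]

    # Within that set, sample in proportion to the (trusted) base distribution.
    p = base_probs[top_q] / base_probs[top_q].sum()
    return actions[rng.choice(top_q, p=p)]

# Hypothetical usage: a "safe" base distribution over three made-up plans.
actions = ["safe_plan", "risky_plan", "do_nothing"]
base_probs = [0.5, 0.1, 0.4]
utility = {"safe_plan": 1.0, "risky_plan": 10.0, "do_nothing": 0.0}
print(quantilize(actions, base_probs, lambda a: utility[a], q=0.3))
```

The paper's contribution is the theory around this (e.g. bounds relating a quantilizer's expected cost to that of its base distribution), not the sampling trick itself.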
If you only want more “empirical” work (even though it’s still in simulation) you might be interested in Deep RL From Human Preferences, An Introduction to Circuits, or the MineRL Challenges (which now have winners).
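And to give a flavor of that more empirical line, here is a toy sketch of the core mechanism in Deep RL From Human Preferences: fit a reward model to pairwise human comparisons of trajectory segments with a Bradley-Terry style loss, then train a policy against the learned reward. Everything below (architecture, sizes, the random stand-in data) is invented for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Tiny reward model: predicts a scalar reward per observation and
    sums it over a trajectory segment. Purely illustrative."""
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, segment):            # segment: (T, obs_dim)
        return self.net(segment).sum()     # total predicted reward

def preference_loss(model, seg_a, seg_b, human_prefers_a):
    """Bradley-Terry style loss: P(human prefers A) is the softmax of the
    two segments' summed predicted rewards."""
    logits = torch.stack([model(seg_a), model(seg_b)]).unsqueeze(0)  # (1, 2)
    target = torch.tensor([0 if human_prefers_a else 1])
    return F.cross_entropy(logits, target)

# Hypothetical usage, with random tensors standing in for real rollouts.
model = RewardModel(obs_dim=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
seg_a, seg_b = torch.randn(20, 8), torch.randn(20, 8)
loss = preference_loss(model, seg_a, seg_b, human_prefers_a=True)
opt.zero_grad(); loss.backward(); opt.step()
```

The paper itself is about showing this works at scale (Atari, MuJoCo) from a modest number of human comparisons; the code above is just the skeleton of the loss.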
Thanks for your reply. Popper-falsifiable does not mean experiment-based in my books. Math is falsifiable: you can present a counterexample, an error in reasoning, a paradoxical result, etc. Similarly, in history you can often falsify claims by providing evidence against them. But you cannot falsify a field where every definition is hand-waved and nothing is specified in detail. I agree that AI Alignment has pre-paradigmatic features as far as Kuhn goes. But Kuhn also says that pre-paradigmatic science is rarely rigorous or true, even though it might produce some results that lead to something interesting in the future.
“Every definition is hand-waved and nothing is specified in detail” is an unfair caricature.
Yeah, but also this is the sort of response that goes better with citations.
Like, people used to make a somewhat hand-wavy argument that AIs trained on goal X might become consequentialists that pursue goal Y, and gave the analogy of the time when humans ‘woke up’ inside of evolution and now optimize for goals different from evolution’s goals, despite having ‘perfect training’ in some sense (and the ability to notice the existence of evolution, and its goals). Then eventually someone wrote Risks from Learned Optimization in Advanced Machine Learning Systems in a way that I think involves substantially less hand-waving and substantially more specification in detail.
Of course there are still parts that remain to be specified in detail, either because no one has written it up yet (Risks from Learned Optimization came from, in part, someone relatively new to the field saying “I don’t think this hand-wavy argument checks out”, looking into it a bunch, being convinced, and then writing it up in detail), or because we don’t know what we’re looking for yet. (We have a somewhat formal definition of ‘corrigibility’, but is it the thing that we actually want in our AI designs? It’s not yet clear.)
In terms of trying to formulate rigorous and consistent definitions, a major goal of the Causal Incentives Working Group is to analyse features of different problems using consistent definitions and a shared framework. In particular, our paper “Path-specific Objectives for Safer Agent Incentives” (AAAI-2022) will go online in about a month, and should serve to organize a handful of papers in AIS.
Thanks, this looks very good.
The objects in question (super-intelligent AIs) don’t currently exist, so we don’t have access to real examples of them to study. One might still want to study them because it seems like there’s a high chance they will exist. So indirect access seems necessary, e.g. conceptual analysis, mathematics, hand-wavy reasoning (specifically, reasoning that’s hand-wavy about some things but tries to be non-hand-wavy about at least some other things), reasoning by analogy with non-super-intelligent things like humans, animals, evolution, or contemporary machine learning (on which we can do more rigorous reasoning and experiments). This is unfortunate but seems unavoidable. Do you see a way to study super-intelligent AI more rigorously or scientifically?
The field of AI alignment is definitely not a rigorous scientific field, but nor is it anything like a fictional-world-building exercise. It is a crash program to address an existential risk that appears to have a decent chance of happening, and soon in the timescale of civilization, let alone species.
By its very nature it should not be a scientific field in the Popperian sense. By the time we have any experimental data on how any artificial general superintelligence behaves, the field is irrelevant. If we could be sure that it couldn’t happen soon, we could take more time to map out the field and start the likely centuries-long process of making it more rigorous.
So I answer your question by rejecting it. You have presented a false dichotomy.
In my experience, doing something poorly takes more time than doing it properly.
There are multiple questions here: is AGI an existential threat? If so, how can we safely make and use AGI? And if that is not possible, how can we prevent it from being made?
There are strong arguments that the answer to the first question is yes. See, for example, everything that Eliezer has said on the subject. Many others agree; some disagree. Read and judge.
What can be done to avoid catastrophe? The recent dialogues with Eliezer posted here indicate that he has no confidence in most of the work that has been done on this. The people who are doing it presumably disagree. Since AGI has not yet been created, the work is necessarily theoretical. Evidence here consists of mathematical frameworks, arguments, and counterexamples.
I’d rather call it proto-science, not pseudo-science. Currently it’s alchemy before chemistry was a thing.
There is a real field somewhere adjacent to the discussions held here, and people are actively searching for it. AGI is coming; you can argue about the timeline, but not the event (well, unless humanity destroys itself with something else first). And the artificial systems we have now often show unexpected and difficult-to-predict properties. So the task “how can we increase the complexity and capabilities of AI systems, possibly to the point of AGI, while simultaneously decreasing unpredictable and unexpected side effects?” is perfectly reasonable.
The problem is that the current understanding of these systems, and the entire framework, is at the level of Ptolemaic astronomy. A lot of things discussed at this moment will be discarded, but some grains of gold will become new science.
TBH I have a lot of MAJOR questions about the current discourse; it’s plagued by misunderstandings of what is possible in artificial intelligence systems and how, but I don’t think the work should stop. The only way we can find the solution is by working on it, even if 99% of the work turns out to be meaningless in the end.
There is a huge diversity in posts on AI alignment on this forum. I’d agree that some of them are pseudo-scientific, but many more posts fall into one of the following categories:
1. authors follow the scientific method of some discipline, or use multidisciplinary methods,
2. authors admit outright that they are in a somewhat pre-scientific state, i.e. they do not have a method/paradigm yet that they have any confidence in, or
3. authors are talking about their gut feelings of what might be true, and again freely admit this.
Arguably, posts of type 2 and 3 above are not scientific, but as they do not pretend to be, we can hardly call them pseudo-scientific.
That being said, this forum is arguably a community, but its participants do not cohere into anything as self-consistent as a single scientific or even pseudo-scientific field.
In a scientific or pseudo-scientific field, the participants would at least agree somewhat on what the basic questions and methods are, and would agree somewhat on which main questions are open and which have been closed. On this forum, there is no such agreement. Notably, there are plenty of people here who make a big deal out of distrusting not just their own paradigms, but also those used by everybody else, including of course those used by ‘mainstream’ AI research.
If there is any internally coherent field this forum resembles, it is the field of philosophy, where you can score points by claiming to have a superior lack of knowledge, compared to all these other deep thinkers.
It’s a mixed bag. A lot of near-term work is scientific, in that theories are proposed and experiments are run to test them; but from what I can tell that work is also incredibly myopic and specific to the details of present-day algorithms, and whether any of it will generalize to systems further down the road is exceedingly unclear.
The early writings of Bostrom and Yudkowsky I would classify as a mix of scientifically informed futurology and philosophy. As with science fiction, they are laying out what might happen. There is no science of psychohistory, and while there are better and worse ways of forecasting the future (see “Superforecasting”), forecasting how future technology will play out is essentially impossible, because future technology depends on knowledge we by definition don’t have right now. Still, the work has value even if it is not scientific, by alerting us to what might happen. It is scientifically informed because, at the very least, the futures they describe don’t violate any laws of physics. That sort of futurology work I think is very valuable because it explores the landscape of possible futures, so we can identify the futures we don’t want and take steps to avoid them, even if the probability of any given future scenario is not clear.
A lot of the other work is pre-paradigmatic, as others have mentioned, but that doesn’t make it pseudoscience. Falsifiability is the key to demarcation. The work that borders on pseudoscience revolves heavily around the construction of what I call “free-floating” systems. These are theoretical systems that are not tied into existing scientific theory (examples: laws of physics, theory of evolution, theories of cognition, etc.) and also not grounded in enough detail that we can test right now whether the ideas/theories are useful or correct. They aren’t easily falsifiable.

These free-floating sets of ideas tend to be hard for outsiders to learn: they involve a lot of specialized jargon, and sorting wheat from chaff is difficult because the authors don’t subject their work to the rigors of peer review and publication in conferences/journals, which provide valuable signals to outsiders as to what is good or bad (instead we end up with huge lists of Alignment Forum posts and other blog posts and PDFs with no easy way of figuring out what is worth reading). Some of this type of work blends into abstract mathematics.

Safety frameworks like iterated distillation & debate, iterated amplification, and a lot of the MIRI work on self-modifying agents seem pretty free-floating to me (some of these ideas may be testable in some sort of absurdly simple toy environment today, but what these toy models tell us about more general scenarios is hard to say without a more general theory). A lot of the futurology stuff is also free-floating (a hallmark of free-floating stuff is zany large concept maps like here). These free-floating things are not worthless, but they also aren’t scientific.
Finally, there’s much that is philosophy. First, of course, there are debates about ethics. Second, there are debates about how to define heavily used basic terms like intelligence, general vs. narrow intelligence, information, explanation, knowledge, and understanding.
For the two to be similar, there needs to be an equivalent to the laws of physics. Then the cranks would be the people who are ignoring them. But, despite the expenditure of a lot of effort, no specific laws of AGI have been found.
(Of course, AGI is subject to the same general laws as any form of computation).
It is your opinion that despite the expenditure of a lot of effort, no specific laws of AGI have been found. This opinion is common on this forum; it puts you in what could be called the ‘pre-paradigmatic’ camp.
My opinion is that the laws of AGI are the general laws of any form of computation (that we can physically implement), with some extreme values filled in. See my original comment. Plenty of useful work has been done based on this paradigm.
Maybe it’s common now.
During the high rationalist era, the early 2010s, there was supposed to be a theory of AGI based on rationality. The problem was that ideal rationality is uncomputable, so that approach would involve going against what is already known about computation, and would therefore be crankish. (And the claim that any AI is non-ideally rational, whilst defensible for some values of “non-ideally rational”, is not useful, since there are many ways of being non-ideal.)
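To spell out the uncomputability point (my gloss, not a quote from anyone): the usual formalization of ideally rational agency in that tradition is AIXI, whose prior over environments is the Solomonoff prior, roughly

$$M(x) \;=\; \sum_{p\,:\,U(p)\,=\,x*} 2^{-\ell(p)},$$

a sum over all programs $p$ whose output on a universal machine $U$ begins with $x$, weighted by program length $\ell(p)$. Evaluating it requires deciding which programs produce which outputs, which runs into the halting problem, so the ideal can only be approximated, never computed.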
I am not familiar with the specific rationalist theory of AGI developed in the high rationalist era of the early 2010s. I am not a rationalist, but I do like histories of ideas, so I am delighted to learn that such a thing as the high rationalist era of the early 2010s even exists.
If I were to learn more about the actual theory, I suspect that you and I would end up agreeing that the rationalist theory of AGI developed in the high rationalist era was crankish.
Yes. I was trying to avoid the downvote demon by hinting quietly.
PS looks like he winged me.
I agree, I wouldn’t consider AI alignment to be scientific either. How is it a “problem” though?