I often find myself disagreeing with most of the things I read about AI alignment. The closest I probably get to accepting a Berkeley-rationalism or Bostrom-inspired take on AI is something like Nintil’s essay on the subject. But even that seems rather extreme to me, and I think most people who treat AI alignment as a job would view it as too unconcerned a take on the subject.
This might boil down to a reasoning error on my end, but:
I know a lot of people who seem unconcerned about the subject. Including people working in ML with an understanding of the field much better than mine, and people with an ability to reason conceptually much better than mine, and people at the intersection of those two groups. Including some of my favorite authors and researchers.
And I know a lot of people who seem scared to death about the subject. Including people working in ML with an understanding of the field much better than mine, and people with an ability to reason conceptually much better than mine, and people at the intersection of those two groups. Including some of my favorite authors and researchers.
So I came to think that there might be some generators of disagreement around the subject that are a bit more fundamental than simple engineering questions about efficiency and scaling. After reading Nintil’s essay (linked above) and VKR’s most recent essay on the subject, I think I can finally formulate what those might be.
I—Accuracy Of Quantifiable Information
A very good way to classify information is by whether or not it is quantifiable and currently being quantified.
Yearly statistics about crime rates in Brabant, Brussels, are a quantifiable piece of information. The “vibe” I get when walking through Brabant is not a quantifiable piece of information; it’s something internal to me. I may use it to derive the same action or world-model update I would from crime statistics (e.g. this place seems unsafe, I’ll stay away), but it is not the same thing.
But, you say, you could describe the “vibe” you get from walking around there. And I agree, but I couldn’t do this perfectly. I could do it better than most, and some could do it better than me, and it’s hard for me to quantify how meaningful that description is in any way.
But, someone else says, the crime statistics don’t reflect reality: under-reporting, fake reports, and the fuzzy details that get shoved aside to fit a neat category all lead to information loss.
The accuracy and type of information that can be quantified change as technology progresses. In the 17th century, it was hard to quantify a view; one could paint it, but that wouldn’t be very accurate. In the 19th century, cameras existed, but they still couldn’t capture the richness of the eye. In the 20th century, they could record video, but it still lacked the kind of depth a human walking around could get. In the 21st century, we can generate ever-better representations of a landscape changing over time using, e.g., drones flying around and taking videos from different perspectives, focusing on the places where most change seems to be happening.
I think a large point of discontent between people is how accurate this quantifiable information is. At one extreme, someone might say “all science is bullshit, all news is fake, words can’t describe reality, maps are not territories” — that’s the crazy hippie hooked up on shrooms and meditation, searching for god/enlightenment/absolute-meaning. At the other extreme, someone might say “science can describe any process better than you or I could by interacting with it, filtering news will generate a picture of reality that’s more accurate than what we’d derive from our limited n=1 observations, words usually describe things pretty well, maps can be made to represent the territory with such accuracy that the distinction becomes pedantic to make” — that’s the crazy mathematician hooked up on amphetamines and niche inconsistencies in book-length proofs, trying to get a Fields Medal.
Basically no one lives at the extremes, and I think it’s hard to say where everyone falls, because everyone has their own bias as to which types of quantifiable information are or aren’t accurate.
But a large premise of why AI would be dangerous is that it can build much better models of the world, without much physical presence, via access to giant repositories of quantified information (i.e. the internet). So how accurate you think this information is matters, a lot.
II—The Value Of Quantifiable Information
Figuring out protein structures is very valuable, and it can be done using quantifiable information; AIs are much better at it than people. Figuring out how to massage away tightness causing pain in the neck is very valuable, and it can be done using fuzzy tactile information; trained masseurs are much better at it than massage chairs (presumably, even than massage chairs running a very fancy AI).
The question then arises of how valuable quantifiable information is. Or, more importantly, what it is valuable for. Some things, such as solving physics equations, are obviously solved by quantifiable information; other things, such as creating meaningful relationships with people, aren’t. The fuzzier you get with your goal, the less valuable quantifiable information is.
But there is an awkward problem space for which we don’t know the value of quantifiable information. This problem space includes many engineering problems as they relate to creating things in the real world. It also includes most social “problems”, from the easy “how do I get this person to like me” to the hard “how do I create consensus among a nation of 300 million people”.
To think that quantifiable information has near-infinite value in these grey areas looks something like:
One day the AGI is throwing cupcakes at a puppy in a very precisely temperature-controlled room. A few days later, a civil war breaks out in Brazil. Then 2 million people die of an unusually nasty flu, and also it’s mostly the 2 million people who are best at handling emergencies but that won’t be obvious for a while, because of course first responders are exposed more than most. At some point there’s a Buzzfeed article on how, through a series of surprising accidents, a puppy-cupcake meme triggered the civil war in Brazil (Wentworth, 2022)
To think that quantifiable information has near-zero value in these grey areas looks something like:
Drop those fucking books, nerd, grab a coffee with the potential customer and sell our product. I’m the best fucking salesperson in this entire company, and I’ll tell it to you straight, all those hundreds of books on economics, psychology and sociology, utter fucking garbage. If they’d be worth anythin’ those know-it-alls would be making millions, but they ain’t. Midwit assholes like myself are the best at this, because we just talk to people, we use the monkey brain, we encourage them to use theirs, it’s 2% about the words you use, 20% about the posture you have, and 200% about what your eyes tell them. And if you’re thinking “those numbers ain’t adding up” then you’re missing the fucking point.
III—The Value Of Thinking
The other important question, regardless of what information can or will be quantified in the future, and how valuable or accurate it will be, is how much thinking, i.e. better processing, can help you derive extra value from that information.
If most of us are 99% efficient at building relevant models from available information, then AI isn’t scary at all, because 1% extra efficiency is pointless.
If most of us are 0.0…99% efficient, then AI becomes a very scary idea.
I won’t object to the obvious fact that most people are inefficient at modeling most systems. The question is whether this happens because of inherent limitations to cognition, or because most people have no reason to go through the arduous process of doing so for most systems. And, maybe more importantly, whether there are a lot of high-ROI systems in need of modeling, or whether our collective capacity to model exceeds or closely matches the ROI we can get from available information.
Here, again, I think the intuitionist-vs-conceptualizer dividing line is very obvious. An intuitionist skims an article and 3 reddit comments and thinks they understand quantum physics about as well as anybody; a conceptualizer thinks that understanding quantum physics means reading every single article in the field and being able to properly grok every single equation, ideally being able to solve them on one’s own, without the author pointing to the solution.
I’m a bit hesitant about listing this one, because it seems to me that most AI alignment people actually fall towards the intuitionist side of the spectrum, i.e. they are the kind of wannabe polymaths who think they can skim through a field with 200 years’ worth of research and “get” most of the value without dedicating their lives to it. Meanwhile, most “normal” people fall towards the conceptualizer side of the spectrum, in that they are very afraid to “think for themselves” and go through raw data; even if they want to challenge or pursue an idea, they will find someone already challenging or pursuing that idea to do the thinking for them.
Indeed, the whole field of AI alignment as we know it today was formed by very capable intuitionists joining sparse ideas and information from many fields that were, at the time, not that related, rather than by conceptualizers who dedicated their lives to studying the subject.
It might be that what I call an “intuitionist” here, i.e. someone that sparsely parses information and assumes they’ve reached a near-perfect understanding of available knowledge, can have two modes of thinking:
Information mainly consists of different framings repeating the same core ideas. People are afraid to interact with it but, once you put your mind to it, it’s not that hard. It’s silly to think people dedicate their lives to single sub-sub-sub-fields of study; all they are doing is spinning in circles, you need to have the broad picture.
Information is so hard to process that most people, even ones dedicating their lives to studying a subject, are too dumb to do it. I might be smart enough to figure this out better than them, but even I am probably missing out on most of the derivable value, and I couldn’t fathom parsing all the information available to us in a way even vaguely efficient compared to the smartest human, or, worse, the smartest thing with human-like thinking abilities.
IV—The Intelligence Of Systems
We aren’t agents acting independently in the world, we’re part of a super-organism containing 8 billion brains, all working towards different goals, but with some shared objectives.
I haven’t heard many people try to weigh the intelligence of AIs against the intelligence of systems, and presumably that’s because of an assumption that systems, compared to individuals, aren’t that smart. That systems arise more as a compromise for distributing mechanical, rather than thinking, work.
The two extremes here might look something like this:
The first nuclear fission bomb was invented by Oppenheimer and Von Neumann, with the former doing 80% of the work. The US government provided the offices and the manual labor, and some other bright physicists helped hash out the details.
and
The first nuclear fission bomb was invented by the collective intelligence of most of Europe, Canada, the US, and millions of other people around the world. They each contributed their insight in hard-to-see ways toward the completion of the project. It might be fair to say that Oppenheimer did more than 1/5-billionth of the work, but this contribution might be equal to that of a steel-plant laborer in Kentucky who figured out a way for his factory to increase production by 1%, which eventually trickled down the supply chain. Oppenheimer being clever is about as critical to the success as that random laborer and millions of other “side characters” like him being clever.
Nobody takes either of these extremes literally, but people do vary widely on this axis; in part, this can be seen at a political level in how people think desert (credit and reward) should be shared.
How well the intelligence of agents adds up is a very relevant question for AI, even assuming all of the other 3 assumptions turn out to be in favor of the “scary AGI scenario that can maybe only be prevented with alignment research”.
— I’m not a fan of quantifying intelligence but for the sake of argument let’s do it —
If an AI system becomes scary once it’s a few times as “smart” as the smartest human along certain relevant axes, then we already have plenty of reasons to be scared, because AI has been beating people in solo thinking competitions left and right.
If, on the other hand, an AI only gets scary when it’s dozens of times smarter than the added-up intelligence of all members of the US government, or of Google, or of the Chinese army, then that’s a bar that would require hundreds of thousands of times the compute we have right now to cross.
Also, how smart you think human-based systems are is an important question in alignment, because if you do think of a government as a superintelligence, then it’s a pretty good example of how far alignment work can bring you.
V—Conclusion
I don’t mean to convince anyone to rethink their position on the above 4 axes, nor do I think this ontology is set in stone; it’s one way to conceptualize the space of unstated axioms, and I have no reason to think it covers everything or that it’s better than any alternative.
For me though, it was a very interesting framework because it helped me see how, with just a few slightly different and mainly arbitrarily-chosen priors on these topics, I might go from my current position to Eliezer Yudkowsky levels of doomsaying, or to Steven Pinker levels of not-caring. Whereas before, both of their points of view seemed impenetrable to me.
Personally, I find the topic of debating how fast AI will advance and how “influential” and “agentic” it will be to be a red herring in the safety debate. I think the more important question is how useful AI alignment research is.
In the 40s, one could argue that “nuclear fission alignment” research might lead to shielding or controlled explosions that could help guard against, or surgically direct, the effects of fission bombs. The fact that this view would be completely misguided doesn’t seem obvious, given how much woo-woo people, even smart ones, attributed to harnessing the power of nuclei.
In the 40s, one could also argue that better diplomatic relationships and well-stocked bunkers deep underground could, combined, guard fairly well against any “x-risk” from nuclear war. The fact that this is now taken for granted, and nuclear war is hardly viewed as an existential risk, would also have seemed non-obvious at the time; I can see even a mildly progress-minded engineer pointing out how, in 20 more years of progress, the explosions could be powerful enough to crack Earth’s very crust. How the ease of creating bombs would place them in the hands of every nation, and then even in those of small rogue actors, making diplomacy impossible. And how diplomacy is a tool for an age of ancient weapons that won’t apply to governments and armies wielding powers at this scale.
I often see AI alignment research as being the equivalent of “nuclear fission alignment”. And I see the equivalent of diplomacy and bunkers as work that’s not even registering as “AI safety”: constructing less legible and more human systems of governance (e.g. restorative justice), re-writing security-critical applications in Rust or functional languages, and decoupling security-critical software from the internet and from other security-critical software.
Again, I don’t claim I can convince you of this view or that I’ve said anything here that constitutes proof. But I think this view becomes “obvious” if your priors are set at certain points along those 4 axes, just as a view that only AI alignment research stands a chance of staving off doom becomes “obvious” at other points. So figuring out where your priors lie is important insofar as it would direct your decisions in terms of funding, personal precautions, and overall outlook on the negative changes AI might bring.
On the whole, though, I am fairly optimistic about the bunkers-and-diplomacy track, that is to say, treating AI as a hard engineering safety problem rather than a whole different magisterium. But, had such a thing as “nuclear fission alignment” research existed and been popular in the 40s, even if it had been misguided, it may well have led to us having better fission reactors and maybe even to constructing fusion reactors sooner than we (hopefully) will.
I don’t buy this framework where we split information into “quantifiable” and “non-quantifiable” categories. Information is just information. If it turns out general AIs can solve their problems best using myriad hard-coded results from psychology studies, then we will figure out how to get them to do that. If it turns out an AI can solve its problems best by using a raw camcorder and a series of fuzzy evolved heuristics, we will figure out how to get them to do that. Modern ML researchers are already capable of teaching AIs how to understand [edit: do/learn how to do] things humans have only a very rudimentary theoretical foundation for, and which humans have a very hard time describing the fundamentals of in words, like language and vision. Even if they weren’t, the brain doing something means it’s possible to do, and so people will eventually figure out how to build a machine that makes whatever specific inferences you’re concerned with when you reference “non-quantifiable information”.
I do in fact think, in the limit of ThinkOomph, this is not a real problem. But… if the accuracy of online information is actually a literal bottleneck preventing AIs from learning what they need to take over the world, couldn’t an AI just pay someone to gather raw data for them? Sure, the AI has to trust that person, but an intelligent system could reasonably find a way to ensure their trustworthiness. What specifically do you think an AI has to learn outside the internet to build nanomachines?
Again, this is a really bad way of thinking about the problem. Massage chairs are bad because they don’t respond to cues and are too mechanically unsophisticated to make precise movements, not because they’re more “quantitative”. Everything is numbers. If a computer can understand language and video feeds, I don’t know why AIs would be unable to massage well, if that computer actually had access to a robot that could do the same things humans can. The brain is in fact doing something like automatically decoding human emotional cues, ergo it is possible to develop an ML system that does the same thing, even if it turns out that’s harder than getting a Fields Medal for some reason.
Even if this were true (and I really don’t think it is), presumably an AI would not get tired and stop finetuning, in parallel, its expert-physicist-model or expert-marketing-model in the same way a human would decide not to learn about geopolitics, due to constraints on time or energy. So I don’t see why it’s relevant.
Read this post before continuing.
I don’t think you really quite understand what automating intelligence means. In order to create a new person, a woman has to go through 9 months of pregnancy after some very complicated and expensive social rituals, in a manner closely regulated by Earth Governance. An AI can run
fork(); exec();
, until it runs out of the hardware to do so. Generally, training them is the hard part; after that, you can scale them up pretty much at will.
If necessary to wrest control of the future, there will be orders of magnitude more “artificial intelligences” than there are humans. AIs can speed along the production process for GPUs. More importantly, an AI doesn’t have to recruit other AIs with different goals than itself. They can coordinate on things absurdly more effectively than people can, because people have built-in conflicts of interest which they must monitor and control each other for. AIs do not need to be smarter than people in order to beat them; there just need to be enough of them. They will be able to cooperate with one another in a way that we can’t, because humans have no way of editing our own code, or cloning ourselves, or analyzing each other’s behavior in a sandbox environment.
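A toy sketch of the point above (my own illustration, with made-up numbers, not from the comment): copying software is geometric, so the only binding constraint on the population of copies is hardware capacity, not reproduction.

```python
def replication_cycles(start: int, capacity: int) -> tuple[int, int]:
    """Each cycle, every running copy forks one more (the "fork(); exec();"
    point), doubling the population until the hardware runs out.

    Returns (final_count, cycles_taken)."""
    count, cycles = start, 0
    while count * 2 <= capacity:
        count *= 2
        cycles += 1
    return count, cycles

# One seed AI on hardware that can host a million copies saturates it
# in fewer than twenty doubling cycles.
print(replication_cycles(1, 1_000_000))  # → (524288, 19)
```

Contrast with humans: adding one person takes roughly a generation, while adding one copy here takes a single doubling cycle, which is the asymmetry the comment is pointing at.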
But this isn’t my real objection. My real objection is that a superintelligent AI is just unlikely to remain (for very long) the kind of thing you can analogize as a group of people. One giant supercomputer running a number of specialized finetuned copies of a given seed AI isn’t accurately described as a one-agent or multi-agent system. The AI gathers more GPU power and then autoscales… It shuts parts of itself down, modifies them, and reallocates portions of its total thought-power at the will of a grand Overseer. It’s not really constrained to a human body like a person with an identity and objectives distinct from other people’s. It’s just this giant optimizer trying to make the Number Go Up.
To be honest, I vaguely suspect a latent pattern here. You seem to be applying a sort of thinking typified by Hero Licensing to intelligent systems.
I’m psychologizing and you’re not supposed to do that, but I suspect that if you introspect a little bit you’ll realize that you’re assuming Civilizational adequacy in things like, say, cybersecurity, not because you’ve actually inspected the field of cybersecurity for economic efficiency in the face of unexpected superintelligent AI, but because there are all these important Cybersecurity People out there doing important-seeming stuff, and to reveal that you think they’re not actually doing it that well would be a Status Grab. Likewise, you’ve got an instinctual aversion to the idea that ML researchers could create something to outcompete existing institutions; doing otherwise might be construed as Immodest, or a critique of high-status people. Thus why you seem to be a smart person who is nevertheless saying things like “‘one’ AI couldn’t necessarily beat ‘eight billion’ humans”, which wouldn’t make any sense at all if it weren’t being derived from a sub-instinct that says “a small group of AI researchers shouldn’t be allowed to claim that their product can ‘beat’ eight billion humans”.
Well, they are capable of doing them. Humans can do them without understanding them, so I don’t see why AIs would need to understand them.
Sure, I misspoke. Capable of doing them is the important part tho
I don’t disagree on any fundamental level, but don’t underestimate the entropy-accumulation problem in any kind of self-improvement, including scaling. An AI that has not solved some degree of distributed-network inter-being alignment will most likely initially break if scaled far outside its training, and the learning process to correct this doesn’t have to be easy. Being duplicates does not make game theory trivial when you are a very complex agent who can make different mistakes in different contexts. I mean, it certainly helps, and it probably wouldn’t be good for humanity for this to happen, but I don’t think scaling up has the same kind of terrifying danger that self-hyper-distillation ‘foom inwards’ does, because the latter implies very strong denoising at a level we haven’t seen from current machine learning, and as far as I can tell, Eliezer’s predictions are all based on some sort of self-hyper-distillation improvement process. I think your model is solid here, to be clear; I’m not disagreeing with any of your main points at all.
I’m not arguing you should take any position on those axes, I am just suggesting them as potential axes.
I think that falling on one extreme of the spectrum is equivalent to thinking the spectrum doesn’t exist—so yes, I guess people that are very aligned with a MIRI style position on AI wouldn’t even find the spectrum valid or useful. Much like, say, an atheist wouldn’t find a “how much you believe in the power of prayer” spectrum insightful or useful. This was not something I considered while originally writing this, but even with it in mind now, I can’t think of any way I could address it.
As for your object-level arguments against the spectrums I present being valid, and/or against one extreme being nonsensical: I can’t, right now, say anything of much value on those topics that you haven’t probably already considered yourself.
To address your later point, I doubt I fall into that particular fallacy. Rather, I’d say I’m at the opposite end of the spectrum, where I’d consider most people and institutions to be beyond incompetent.
Hence why I’ve reached the conclusion that improving on rationally legible metrics seems low-ROI: otherwise, rationlandia would have arisen and ushered in prosperity and unimaginable power in a seemingly dumb world.
But I think that’s neither here nor there, as I said, I’m really not trying to argue my view here is correct, I’m trying to figure out why wide differences in view in both directions exist.
I don’t understand what you mean? I’m an atheist and am clearly at the bottom of the spectrum. If you disagree with my objections to your axis, can you e.g. clarify what you mean when you say some datum is “non-quantifiable” and why that would prevent an AI from being able to use it decisively better than humans?
There are several things at the extreme of non-quantifiable:
There’s “data” which can be examined in so much detail by human senses (which are intertwined with our thinking) that it would be inefficient to extract even with SF-level machinery. I gave as an example being able to feel another person’s muscles and the tension within (hence the massage chair, but I agree smart massage chairs aren’t that advanced, so it’s a poor analogy). Maybe a better example is “what you can tell from looking into someone’s eyes”
There’s data that is intertwined with our internal experience. So, for example, I can’t tell you the complex matrix of muscular tension I feel, but I can analyze my body and almost subconsciously decide “I need to stretch my left leg”. Similarly, I might not be able to tell you what the perfect sauce is for me, or what patterns of activity it triggers in my brain, or how its molecules bind to my taste buds, but I can keep tasting the sauce and adding stuff and conclude “voila, this is perfect”
There are things beyond data that one can never quantify, like revelations from god or querying the global consciousness or whatever
I myself am pretty convinced there are a lot of things falling under <1> and <2> that are practically impossible to quantify (not fundamentally or theoretically impossible), even provided 1000x better camera, piezo, etc. sensors, and even provided 0.x-nm transistors making perfect use of all 3 dimensions in their packing (so, something like 1000x better GPUs).
I think <3> is false and mainly make fun of the people that believe in it (I’ve taken enough psychedelics not to be able to say this conclusively, but still). However, I still think it will be a generator of disagreement with AI alignment for the vast majority of people.
I can see very good arguments both that 1 and 2 are not that critical and not that hard to quantify, and, obviously, that 3 is a giant hoax. Alas, my positions on those have remained unchanged, hence why I said a discussion around them may be unproductive.
I think it’s pretty clear that more of us should pay attention to generators of generators of disagreement with AI alignment, the generation process itself is worth consideration. It’s really rare to see solid arguments against AI safety such as these, as opposed to total disinterest or vague thought processes, and the fact that it’s been that way for 10-20 years is no guarantee that it will stay that way for even one more year.
I like this way of thinking about how quickly AI will grow smarter, and how much of the world will be amenable to its methods. Is understanding natural language sufficient to take over the world? I would argue yes, but my NLP professor disagrees — he thinks physical embodiment and the accompanying social cues would be very important for achieving superintelligence.
Your first two points make a related argument: that ML requires lots of high quality data, and that our data might not be high quality, or not in the areas it needs to be. A similar question would be whether AI can generalize to performing the various novel long-term planning challenges of a CEO or politician solely by training on short time-horizon tasks like next token prediction. Again, I take seriously the possibility that they could, but it doesn’t seem inconsistent with our evidence to believe that deep learning will only succeed in domains where we have lots of training data and rapid feedback loops.