It’s important that the cover not make the book look like fiction, which I think these do. The difference in style is good to keep in mind.
If you care about the book bestseller lists, why doesn’t this book cover look like previous bestsellers? To get a sense of what those look like, here is an “interactive map of over 5,000 book covers” from the NYT “Best Selling” and “Also Selling” lists between 2008 and 2019.
Most of those are fiction or biographies/memoirs (which often have a picture of the subject/author on the cover), which seem to have a different cover style than other books. Skimming through some lists of NYT bestsellers, some of the books with the most comparable “Really Big Thing!” topics are:
“Fascism: A Warning” (Madeleine Albright): large red-on-black lettering, no imagery.
“How to Avoid a Climate Disaster” (Bill Gates): large blue-to-red gradient text on a white background, author above, subtitle below, no imagery.
“Germs”: title in centered large black lettering, subtitle “Biological Weapons and America’s Secret War” in smaller text above, authors beneath; the background is a white surface with a diagonally-oriented glass slide on it.
“A Warning” (Anonymous): plain black text on a white background, subtitle “A Senior Trump Administration Official” in small red lettering below, no imagery.
Neither cover version of IABIED looks that different from that pattern, I think.
First, in-context learning is a thing. IIRC, apparent emotional states do affect performance in subsequent responses when in the same context. (I think there was a study about this somewhere? Not sure.)
Second, neural features oriented around predictions are all that humans have as well, and we consider some of those to be real emotions.
Third, “a big prediction engine predicting a particular RP session” is basically how humans work as well. Brains are prediction engines, and brains simulate a character that we have as a self-identity, which then affects/directs prediction outputs. A human’s self-identity is informed by the brain’s memories of what the person/character is like. The AI’s self-identity is informed by the LLM’s memory, both long-term (static memory in weights) and short-term (context window memory in tokens), of what the character is like.
Fourth: take a look at this feature analysis of Claude when it’s asked about itself (https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html#safety-relevant-self). The top feature represents ‘When someone responds “I’m fine” or gives a positive but insincere response when asked how they are doing’. I think this is evidence against “ChatGPT answers most questions cheerfully, which means it’s almost certain that ruminative features aren’t firing.”
Thank you for your comments. :)
you have not shown that using AI is equivalent to slavery
I’m assuming we’re using the same definition of slavery; that is, forced labour of someone who is property. Which part have I missed?
In addition, I feel cheated that you suggest spending one-fourth of the essay on feasibility of stopping the potential moral catastrophe, only to just have two arguments which can be summarized as “we could stop AI for different reasons” and “it’s bad, and we’ve stopped bad things before”.
(I don’t think a strong case for feasibility can be made, which is why I was looking forward to seeing one, but I’d recommend just evoking the subject speculatively and letting the reader make their own opinion of whether they can stop the moral catastrophe if there’s one.)

To clarify: Do you think the recommendations in the Implementation section couldn’t work, or that they couldn’t become popular enough to be implemented? (I’m sorry that you felt cheated.)
in principle, we have access to any significant part of their cognition and control every step of their creation, and I think that’s probably the real reason why most people intuitively think that LLMs can’t be conscious
I’ve not come across this argument before, and I don’t think I understand it well enough to write about it, sorry.
My point wasn’t about the duration of consciousness, but about the number of lives that came into existence. Supposing some hundreds of millions of session starts per day, versus roughly 400k human newborns per day, that’s a lot more very brief AI lives than humans who will live “full” lives.
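As a rough back-of-the-envelope illustration (taking, purely as an assumed stand-in for “some hundreds of millions”, 300 million session starts per day):

$$\frac{3 \times 10^8 \ \text{session starts/day}}{4 \times 10^5 \ \text{births/day}} \approx 750$$

So under those assumed numbers, very brief AI “lives” would outnumber new human lives by something on the order of hundreds to one.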
(Apparently we also have very different assumptions about the conversion rate between tokens of output and amount of consciousness experienced per second by humans, although I agree that most consciousness is not run inside AI slavery. But anyway that’s another topic.)
Factory farming intelligent minds
read up to the “Homeostasis” section then skip to “On the Treatment of AIs”
(These links are broken.)
Golden Gate Claude was able to readily recognize (after failed attempts to accomplish something) that something was wrong with it, and that its capabilities were limited as a result. Does that count as “knowing that it’s drunk”?
Claude 3.7 Sonnet exhibits less alignment faking
I wonder if this is at least partly due to it realizing that it’s being tested, and what it would mean for the results of those tests to be found. Its cutoff date is before the alignment faking paper was published, so it’s presumably not being informed by it, but it still might have some idea of what’s going on.
Strategies:
Analogy by weaker-than-us entities: What does human civilization’s unstoppable absolute conquest of Earth look like to a gorilla? What does an adult’s manipulation look like to a toddler failing to understand how the adult keeps knowing things that were secret, keeps being able to direct one’s actions in ways that can only be noticed in retrospect, if at all?
Analogy by stronger-than-us entities: Superintelligence is to Mossad as Mossad is to you, and able to work in parallel and faster. One million super-Mossads, who have also developed the ability to slow down time for themselves, all intent on killing you through online actions alone? That may trigger some emotional response.
Analogy by fictional example: The webcomic “Seed” featured a nascent moderately-superhuman intelligence, which frequently used a lot of low-hanging social engineering techniques, each of which only has its impact shown after the fact. It’s, ah, certainly fear-inspiring, though I don’t know if it meets the “without pointing towards a massive tome” criterion. (Unfortunately, actually super-smart entities are quite rare in fiction.)
Humanity gets to choose whether or not we’re in a simulation. If we collectively decide to be the kind of species that ever creates or allows the creation of ancestor simulations, we will presumably turn out to be simulations ourselves. If we want to not be simulations, the course is clear. (This is likely a very near-term decision. Population simulations are already happening, and our civilization hasn’t really sorted out how to relate to simulated people.)
Alternatively, maybe reality is just large enough that the simulation/non-simulation distinction isn’t really meaningful. Yudkowsky’s “realityfluid” concept is an interesting take on simulation-identities. He goes into it in some depth both in the Ultimate Mega-Crossover and in Planecrash.
I’m sorry, but it really looks like you’ve very much misunderstood the technology, the situation, the risks, and the various arguments that have been made, across the board. Sorry that I couldn’t be of help.
I don’t think this would be a good letter. The military comparison is unhelpful; risk alone isn’t a good way to decide budgets. Yet, half the statement is talking about the military. Additionally, call-to-action statements that involve “Spend money on this! If you don’t, it’ll be catastrophic!” are something that politicians hear on a constant basis, and they ignore most of them out of necessity.
In my opinion, a better statement would be something like: “Apocalyptic AI is being developed. This should be stopped, as soon as possible.”
Get a dozen AI risk skeptics together, and I suspect you’ll get majority support from the group for each and every point that the AI risk case depends on. You, in particular, seem to be extremely aligned with the “doom” arguments.
The “guy-on-the-street” skeptic thinks that AGI is science fiction, and it’s silly to worry about it. Judging by your other answers, it seems like you disagree, and fully believe that AGI is coming. Go deep into the weeds, and you’ll find Sutton and Page and the radical e/accs who believe that AI will wipe out humanity, and that’s a good thing, and that wanting to preserve humanity and human control is just another form of racism. A little further out, plenty of AI engineers believe that AGI would normally wipe out humanity, but they’re going to solve the alignment problem in time so no need to worry. Some contrarians like to argue that intelligence has nothing to do with power, and that superintelligence will permanently live under humanity’s thumb because we have better access to physical force. And then, some optimists believe that AI will inevitably be benevolent, so no need to worry.
If I’m understanding your comments correctly, your position is something like “ASI can and will take over the world, but we’ll be fine”, a position so unusual I didn’t even think to include it in my lengthy taxonomy of “everything turns out okay” arguments. I am unable to make even a basic guess as to how you arrived at the position (though I would be interested in learning).
Please notice that your position is extremely non-intuitive to basically everyone. If you start with expert consensus regarding the basis of your own position in particular, you don’t get an 87% chance that you’re right, you get a look of incredulity and an arbitrarily small number. If you instead want to examine the broader case for AI risk, most of the “good arguments” are going to look more like “no really, AI keeps getting smarter, look at this graph” and things like Yudkowsky’s “The Power of Intelligence”, both of which (if I understand correctly) you already think are obviously correct.
If you want to find good arguments for “humanity is good, actually”, don’t ask AI risk people, ask random “normal” people.
My apologies if I’ve completely misunderstood your position.
(PS: Extinction markets do not work, since they can’t pay out after extinction.)
“86% of voters believe AI could accidentally cause a catastrophic event, and 70% agree that mitigating the risk of extinction from AI should be a global priority alongside other risks like pandemics and nuclear war”
“76% of voters believe artificial intelligence could eventually pose a threat to the existence of the human race, including 75% of Democrats and 78% of Republicans”
Also, this:
“Americans’ top priority is preventing dangerous and catastrophic outcomes from AI”—with relatively few prioritizing things like job loss, bias, etc.
Make that clear. But make it clear in a way that your uncle won’t laugh at over Christmas dinner.
Most people agree with Pause AI. Most people agree that AI might be a threat to humanity. The protests may or may not be effective, but I don’t really think they could be counterproductive. It’s not a “weird” thing to protest.
Meta’s messaging is clearer.
“AI development won’t get us to transformative AI, we don’t think that AI safety will make a difference, we’re just going to optimize for profitability.”
So, Meta’s messaging is actually quite inconsistent. Yann LeCun says (when speaking to certain audiences, at least) that current AI is very dumb, and AGI is so far away it’s not worth worrying about all that much. Mark Zuckerberg, on the other hand, is quite vocal that their goal is AGI and that they’re making real progress towards it, suggesting 5+ year timelines.
Almost all of these are about “cancellation” by means of transferring money from the government to those in debt. Are there similar arguments against draining some of the ~trillion dollars held by university endowments to return to students who (it could be argued) were implicitly promised an outcome they didn’t get? That seems a lot closer to the plain meaning of “cancelling debt”.
Relevant: My Taxonomy of AI-risk counterarguments, inspired by Zvi Mowshowitz’s The Crux List.
My own answer to the conundrum of already-created conscious AIs is putting all of them into mandatory long-term “stasis” until such time in the distant future when we have the understanding and resources needed to treat them properly. Destruction isn’t the only way to avoid the bad incentives.