I think you are correct on a general level, but some conflicts are pure zero-sum waste of resources, and the more nuanced ones maybe are not? Like, if we debate about whether object-oriented programming is better than functional, perhaps as a result both sides get better at writing software. Or at least the observers get better at writing software, regardless of which style they choose.
Viliam
Both sides rationally conclude that the existence of the other is incompatible with maximizing their own utility function.
Why? I mean, if the other AI didn’t exist, I could take over their part of the universe and that would be better. But assuming that I can’t destroy it, is it doing something actively bad, or is it just a waste of resources?
(For example, from the perspective of humanity, something that creates sentient humans and then tortures them horribly would be actively bad; something that converts uninhabited planets to paperclips is just a waste of resources.)
If the AIs see each other as merely a waste of resources, and they don’t assume the probability of victory to be significantly higher than 50%, they could just give up on the other half of universe, and e.g. burn up the resources along the boundary, to make it more difficult to travel to each other. Blow up the stars along the boundary, and shoot down everything that flies across that empty space towards you.
I admit I don’t have much faith in most of humanity.
Only a few people are actively bad. But also only a few are actively good. Most people are just fucking passive. On some level they want the world to be a nice place, but if that requires them moving a finger, then… sorry, it’s just not going to happen.
We have educational system that can teach you things that the ancient philosophers would sacrifice an arm and a leg for. And I don’t mean just the school system, but also the internet: Wikipedia, Khan Academy, Library Genesis. If you want to learn something, it’s there, often for free. Most people just don’t care.
Even the universities and research institutions are mostly full of people just going through the motions, without any genuine curiosity. It may not seem that way if you are in an exceptionally good school or research org. But most people go to the university just to get the diploma, to hopefully get them a better job; they don’t care about the knowledge. Most submitted scientific papers are shit that should be thrown away.
I suspect we only get progress because the human population is so huge that even a fraction of a fraction is enough in absolute numbers to keep inventing new things, while the rest of humanity keeps sleepwalking. A planet with one billion sane people would be… a science fiction story, compared to what we have now.
Of course, replacing humans with machines that will murder us and then proceed to slowly convert the entire universe to paperclips is not an answer.
What could humans make that would restore faith
I am tempted to say “wake up”, except that most people would interpret that as doing some crazy thing, such as joining some religious group, reading conspiracy theories, etc. So I will say “wake up in a sane way”. That means:
Start paying attention to the universe around you. The living things are made of cells, everything is made of atoms. Pick up a textbook. Watch an educational video. Share with friends.
Notice the things that are missing around you, and do something about that. (It will be more fun if you do that together with your friends.) People are lonely—talk to them. The park is dirty—pick up the trash.
Notice the things that are dysfunctional and start talking about that. Houses are expensive—maybe we should build more houses. The food in shops is unhealthy—maybe we should learn to cook. Politicians are crazy—maybe we should write “a sane centrist manifesto”, a list of things that most sane people can agree on, publish it everywhere, and mercilessly call out every crazy idea that goes against it.
Simply, act as if this world is real, and as if your actions can actually matter (even if just on a small scale). You might not change the world, but you can definitely bring some change to your neighborhood.
Today, there are people like that. But they are like 1% of the population, and it’s too much work for them to fix everything (especially while some people are actively trying to make it worse), and they are tired. Some help would be really appreciated.
If that is not an option, then at least we need to figure out some way for the sane people to get together, start their own city / country / planet, and… on my darker days, I would say “let the rest of humanity rot in their filth”… but actually I am a nice person, it’s just that when everyone is drowning, you need to save yourself first, catch some breath, and maybe then return to help others. But you need to keep the separation on some symbolic level; for example if most scientific journals contain unreproducible results, do not post in the same journals, but start a new one with different criteria for accepting papers. If the schools suck, don’t just become a teacher, but at least start a new school. If you are sane, keep yourself separated from the insane; help them if you can, but do it from a position where they can’t drag you down or take credit for your work or twist your ideas. (Basically, if you do something fundamentally different from others, be legible about it, so that other sane people see whom to join.)
And to be honest, I don’t think that even the rationality community passes this bar, although it is one of the few groups that are at least trying. (But we still get the murderous Zizians, demon-exorcising Leverage, etc.)
The code that reminds the AI of Hamas mentions checkpoints...
No idea what might trigger the Polish language. (Does any of the words in the text by coincidence mean something in Polish?)
That’s an interesting idea. However, people who read this comments probably already have power much greater than the baseline—a developed country, high intelligence, education, enough money and free time to read websites...
Not sure how many of those 20 doublings still remain.
the agent goes double-or-nothing until losing everything. That means that the effects of the AI are mitigated.
The side effects of the agent failing might still kill us.
For example, the failure could be something like “build a huge device which with probability 20% enables faster-than-light travel (which would allow colonizing more galaxies), and with probability 80% causes false vacuum collapse or otherwise destroys the entire universe”.
Or something on smaller scale, where the failure means blowing up the Earth, destroying all life, etc.
the default way that people make their voices heard in politics these days is by stopping things or banning things or blocking things or slowing things down.
Maybe because it is easier to agree on a binary question (“should this be allowed or banned?”) than on an open-ended one (“something should be done—but what specifically?”). Give people a binary choice, and there is a chance that enough of them will agree. Give them an open-ended question, and most people will come with their own proposals, unwilling to support anyone else’s proposal (unless they are allowed to do large modifications, which other people will oppose).
(Here an individualistic culture probably makes it worse, because coming with your own proposal is high-status.)
I guess most people have this experience, so they don’t even try to make proposals to the public. Instead, if possible, they act alone, or with a small group of friends.
We can go around the neighborhood, show everybody the mockup and say, “Are you excited about us doing this to the park?” Then if we have a reasonable number of signatures on a petition, we get to build it.
I am often too pessimistic, but I would expect many people to say “no”, for reasons including “no specific reason, it just sounds suspicious to me: why you? why now? is this perhaps some kind of scam?” or “I will only agree if you update your proposal to include <my pet peeve, completely unrelated to the project>”, plus a few people saying “I don’t give a fuck, so I will vote ‘no’ in principle (maybe try to bribe me if you want my ‘yes’)”.
However, there are two situations near me where people somehow succeeded to build something for the community, so I should probably try to learn the details. In one case, it is a community garden: area between two garages was surrounded by a fence, and how there are tables and chairs, and about once in a month someone organizes some activities for kids there. In another case, in place of a former shop, a community center was set up. I think the latter is just one person’s activity who someone got grant money to rent the place (maybe also made a non-profit for that purpose) so I would still kinda classify that as a pro-social grant-supported unilateral action. No idea how the former may have succeeded.
BTW, you seem impressed by George Church very much, because you linked his page 3 times. :D
The usual naive pro-trans argument goes like this: “some people are intersex, therefore sex is arbitrary, therefore it is okay for people to identify as whatever they want”.
But if we take sex and gender as two dimensions, then it’s like: “on the sex dimension, most people are at one end, but a few people are in between… and on the gender dimension, traditionally it matches the sex, but some people (importantly: not necessarily those who are in between on the sex dimension) identify as the opposite”.
I guess this would be considered an anti-trans argument these days, because it allows people to express positions such as “I am attracted to the female sex, regardless of gender”, or “I think bathrooms should be separated by sex (without having a strong opinion on intersex people)”.
With coin, the options are “head” and “tails”, so “head” moves you in one direction.
With LLMs, the options are “worse than expected”, “just as expected”, “better than expected”, so “just as expected” does not have to move you in a specific direction.
Yesterday, I saw a video on social media of a woman discussing how, since moving to the US, she has started mysteriously gaining weight even though she’s eating the same amount of food as before she moved here.
Just wanted to add a data point, that 20 years ago a classmate told me a similar thing. When she returned from USA, the mysterious weight gain stopped (I don’t remember whether she returned to the original weight, or just stopped gaining more).
Larry Page allegedly dismissed concern about AI risk as speciesism.
That’s what we get for living in a culture where calling something ”...ism” wins the debate.
You just need to get good at creative thinking, management and framing ideas.
Yeah, the skills necessary for the (near) future.
Though I wonder about implications for education. For the sake of argument, let’s imagine that the AIs remain approximately as powerful as they are today for a few more decades, i.e. no Singularity, no paperclips. How should we change education, to make the new generation adapt to this situation.
In case of adults, we have already learned “creative thinking, management and framing ideas” by also doing lots of the things that the LLMs can now do for us. For example, I let LLMs write JavaScript code for me, but the reason I can evaluate that code, suggest improvement, etc. is that in the past I wrote a lot of JavaScript code by hand. Is it possible to get these skills some other way? Or will the future humans only practice the loop of: “AI, do what I want. AI, figure out the problem and fix it. AI, try harder. AI, try superhard. Nevermind, AI, delete the project, clear your cache, and try again.” :D
Ah, I can totally relate to this. Whenever I think about asking for money, the Impostor Syndrome gets extra strong. Meanwhile, there are actual impostors out there collecting tons of money without any shame. (Though they may have better social skills, which is probably the category of skill that ultimately gets paid best.)
Another important lesson I got once, which might be useful for you at some moment: “If you double your prices, and lose half of your customers as a result, you will still get the same amount of money, but only work half as much.”
Also, speaking from my personal experience, the relation between how much / how difficult work someone wants you to do, and how much they are willing to pay you, seems completely random. One might naively expect that a job that pays more will be more difficult, but often it is the other way round.
It happens right in front of my house. Addicts steal things at shops, sell them at a pawn shop, buy drugs from a dealer waiting right in front of the pawn shop (difficult not to notice: the only guy who wears a black hoodie in the middle of a hot day), then inject the drugs behind the pawn shop.
When the shops are closed, or the addicts draw too much attention from the security, they try breaking into our houses and cellars instead. Every door in the neighborhood has signs of an attempt to pry it open.
What can we do about this?
Make shop theft impossible? Unlikely to happen, they would need to have the security literally everywhere.
Prevent the pawn shop from buying stolen stuff? I don’t understand the details, but I was told that if they make the seller sign a paper saying “I totally swear I didn’t steal this”, they are legally ok. The pawn shop owner definitely knows that he deals with stolen stuff; that’s why he moves everything between the shops, so that when they rob your house, you won’t find your stuff in the shop window on your street the next day.
Prevent the dealer from selling drugs? That’s quite tricky, legally, because owning a small amount of drug “for your own use” is not illegal here. Of course the dealer only brings one small bag of powder each time. Also, you pay to one guy, and get the powder from another guy, so it is legally tricky to determine at which moment exactly the drug was sold. (The first guy didn’t give you any drug, and the second guy didn’t take any money from you. If you only catch one of them, he will probably claim it was just a misunderstanding.)
So our only remaining option is to form a vigilante squad, and… well, I am not going to write down anything that may or may not happen afterwards. Didn’t expect this to happen to me, and yet, here I am.
Reducing penalties for drug use is a well-sounding idiocy. In theory, you reduce the penalties for drug use, but in practice, you reduce the penalties for drug distribution, because most of the time when you catch a dealer, he can argue that this was all for his own use. And he only takes with him one bag of powder at a time. Yes, in theory it is possible to find the store full of bags, but you just made it needlessly complicated. When drug possession is a crime, you can catch the dealer, he says it was for his own use, you arrest him anyway. (Ideally, you would give him exponentially increasing sentences, starting with community service.)
This feels like debating a holocaust denier. We are moving from “it did not happen at all” to “maybe it wasn’t six million Jews but only five million”. (“You did not name a single historian, Greek city state, solitary event, or personality from history” → “ancients simply did not keep accurate records … what evidence we do have shows the numbers to be always exaggerated”)
The argument by inaccurate records goes both ways. If there is a genocide today, we probably know about it, and someone at least makes a note in Wikipedia. In the past, ethnic groups could be erased with no one (other than the people involved in the war) noticing. The fact that the list of known genocides in 20th century is longer than the list of known genocides in e.g. 12th century is mostly because of better bookkeeping.
And yet, despite choosing a century randomly (if I tried on purpose, I could have chosen e.g. the 13th century with Albigenian Crusade as a good example), Wikipedia mentions “Massacre of the Latins” with about 60 000 dead in the 12th century. In a world where the population was not even 1⁄10 of what it is today, so relatively comparable with the numbers that you have mentioned. And we have no idea about what massacres might have happened in 12th century Africa.
So yes, today we have more victims in absolute numbers, but that’s because we have larger populations and stronger weapons. When you have to kill your enemies using a hand axe, I guess you get quite tired after chopping off dozen heads. With a nuke, you just press a button and thousands die. And yet, despite the other side having nukes, most Japanese survived WW2. (Which is something they totally did not expect, given their usual behavior towards defeated enemies.) The people in the past were as efficient at killing their enemies with swords, as we are with the weapons of mass destruction today.
“Now go, attack the Amalekites and totally destroy all that belongs to them. Do not spare them; put to death men and women, children and infants, cattle and sheep, camels and donkeys.” (1 Samuel 15:3) Tell me again how civilians were not considered valid targets in the past.
You mention compelling prisoners of war to labor, as an analogy to slavery. Yeah, but that was an exception during the war. (Except for the Soviets, who conveniently kept many of the prisoners of war long after the war was over.) Now compare to a situation thousand years ago, when the slave trade was a crucial part of European economy, comparable to oil trade today. The reason entire countries converted to Christianity was to stop the unending slave raids from their neighbors. (Christians had a taboo against enslaving each other. So did Muslims. Both of them considered it okay to enslave each other, and the pagans.) Or consider Africa: the first black slaves brought to America were legally bought in Africa from the local African slave traders. Americans did not invent slavery; they just provided a huge new market for it.
Sorry, I think it is you who needs to learn history. Yes, humans suck today; the “Noble Savages” were not any better, probably much worse.
Attempts to jailbreak LLMs seem obvious to humans. What does it mean?
Maybe it is a selection bias—a non-obvious jailbreak would simply seem to me like “someone made a valid argument”, so I wouldn’t classify it as a jailbreak attempt.
Is there enough similarity so that we could create an input-checker AI which would only read the input for the purpose of determining whether it is a jailbreak attempt or not… and only if the input is considered okay, it would be passed to the second AI that actually tries to respond to it?
(That is, does the fact that the AI tries to respond to the input make it more vulnerable? As an analogy, imagine a human who is supposed to (a) write a reply to a comment, or (b) verify that the comment is written in English. The comment happens to contain a triggering description, or a mindkilling argument, or some other bad meme. I would expect the person who only verifies the English language to be less impacted, because they interact with the content of the meme less.)
Assuming that it is possible to jailbreak every AI, including the input checker, are there universal jailbreaks that apply to all AIs, or do you need to make a tailored jailbreak for each? Could we increase the resistance by having several different input-checker AIs, and have each input checked by three randomly selected ones?
(It is important that the algorithm that implements “if two out of three AIs say that the content is safe, but one says that it is a jailbreaking attempt, reject the input” is classical, not an LLM—otherwise it would be more efficient to jailbreak this one.)
Interesting perspective. The difficult part will be that the proposed metrics are of the “more or less” type, rather than “yes or no”. So one must be familiar with multiple beliefs, in order to put the specific one on the scale.
Psychological comfort—each belief implicitly divides people into two groups: those who understand it and those who don’t; the former is better. Knowing the Pythagorean theorem can make you proud of your math skills.
It gets suspicious when a seemingly simple belief explains too much. Knowing the Pythagorean theorem allows you to calculate the longest side of a right-angled triangle, a distance between two points in N-dimensional Euclidean space even for N>3, or allows you to prove that sin²(φ) + cos²(φ) = 1, but that’s it. If it also told you how to dress, what to eat, and which political party to vote for, that would be suspicious.
On the opposite end of the scale, mathematics as a whole claims to explains a lot, it is practically involved in everything, but it is a ton of knowledge that takes years or decades to study properly. It would be suspicious if something similarly powerful could be understood by merely reading a book and hanging out with some group.
(Elephant in the room: what about “rationality”, especially the claim that “P(A|B) = [P(A)*P(B|A)]/P(B)” explains the entire multiverse and beyond? I think it is kinda okay, as long as you remember that you also need specific data to apply the formula to; mere general knowledge won’t help you figure out the details. Also, no one claims that the Bayes Theorem can only be used for good purposes, or that it makes you morally superior.)
Self-Sealing Mechanisms—be careful when the belief is supported by things other than arguments and data; for example by violence (verbal or otherwise). This is also tricky: is it okay to “cancel” a crackpot? I think it is okay to fire crackpots from academic/scientific institutions; those obviously wouldn’t be able to do their job otherwise. But if you start persecuting heretical thoughts expressed by people in their free time, on their blogs, etc., that goes too far.
I sometimes says that “political orientation” is basically “which part of complex reality you decided to ignore”. (That doesn’t mean that if you ignore nothing, you are unable to have opinions or make decisions. But you opinions will be usually be like “this is complicated, in general it is usually better to do X, but there are exceptions, such as Y”. The kind of reasoning that would get you kicked out of any ideological group.)
“Thousands” is probably not enough.
Imagine trying to generate a poem by one algorithm creating thousands of random combinations of words, and another algorithm choosing the most poetic among the generated combinations. No matter how good the second algorithm is, it seems quite likely that the first one simply didn’t generate anything valuable.
As the hypothesis gets more complex, the number of options grows exponentially. Imagine a pattern such as “what if X increases/decreases Y by mechanism Z”. If you propose 10 different values for each of X, Y, Z, you already have 1000 hypotheses.
I can imagine finding some low-hanging fruit if we increase the number of hypotheses to millions. But even there, we will probably be limited by lack of experimental data. (Could a diet consisting only of broccoli and peanut butter cure cancer? Maybe, but how is the LLM supposed to find out?) So we would need to find a hypothesis where we accidentally already made all the necessary experiments and even described the intermediate findings (because LLMs are good at words, but probably suck at analyzing the primary data), but we somehow failed to connect the dots. Not impossible, but requires a lot of luck.
To get further, we need some new insight. Maybe collecting tons of data in a relatively uniform format, and teaching the LLM to translate its hypotheses into SQL queries it could then verify automatically.
(Even with hypothetical ubiquitous surveillance, you would probably need an extra step where the raw video records are transcribed to textual/numeric data, so that you could run queries on them later.)
Siege of Melos: Athens demanded a tribute, Melos refused to pay, the Athenians executed the men of fighting age and sold the women and children into slavery. They then settled 500 of their own colonists on the island.
Battle of Plataea: After the battle of Plataea, the city [Caryae] was captured by the allied Greeks, the city’s men were executed and the women were enslaved.
Miletus: Persians under Darius the Great punished Miletus for rebellion by selling all of the women and children into slavery, killing the men, and expelling all of the young men as eunuchs, thereby assuring that no Miletus citizen would ever be born again.
Battle of Thebes: Thirty thousand were sold into slavery and six thousand slain in the final fighting. The city was burnt to the ground, sparing only the temples, the Cadmae citadel and the house of Pindar, out of gratitude for Pindar’s verses praising Alexander’s ancestor, Alexander I of Macedon.
Yes, I have the same impression. Generating Java or Python code using popular libraries: mostly okay. Generating Lean code: does not compile even after several attempts to fix the code by feeding the compiler errors to LLM.