I attended an AI pause protest recently and thought I’d write up what my experience was like for people considering going to future ones.
I hadn’t been to a protest ever before and didn’t know what to expect. I will probably attend more in the future.
Some things that happened:
There were about 20ish people protesting. I arrived a bit after the protest had begun and it was very easy and quick to get oriented. It wasn’t awkward at all (and I’m normally pretty socially anxious and awkward). The organisers had flyers printed out to give away and there were some extra signs I could hold up.
I held up a sign for some of the protest and tried handing out flyers the rest of the time. I told people passing by that we were talking about the danger from AI and asked if they’d like a flyer. Most of them declined, but a substantial minority accepted one.
I got the sense that a lot of people who picked up a flyer weren’t just doing it to be polite. For example, I had multiple people walking by mention to me that they agreed with the protest. A person in a group of friends who walked by looked at the flyer and mentioned to their friends that they thought it was cool someone was talking about this.
There were also people who got flyers who misunderstood or didn’t really care for what we were talking about. For example, a mother pointed at the flyer and told her child “see, this is why you should spend less time on your phone.”
I think giving out the flyers was a good thing overall. Some people seemed genuinely interested, and others, even those who declined, were pretty polite. It felt like a wholesome experience. If I had planned more for the protest, I would have liked to print my own flyers. I also considered adding contact details to the flyers in case people wanted to talk about the content; it would have been interesting to get a better sense of what people actually thought.
During the protest, a person was using a megaphone to talk about AI risk and there were chants and a bit of singing at the end. I really liked the bit at the end, it felt a bit emotional for me in a good way and I gave away a large fraction of the flyers near the end when more people stopped by to see what was going on.
I overheard some people talk about wanting to debate us. I was sad I didn’t get the chance to properly talk to them (plausibly I could have started a conversation while they were waiting for the pedestrian crossing lights to turn green). I think at a future protest, I would like to have a “debate me” or “ask me questions” sign to be able to talk to people in more depth rather than just superficially.
It’s hard to give people a pitch for AI risk in a minute.
I feel more positive about AI pause advocacy after the protest, though I do feel uneasy about not having control over the PauseAI website and the flyers. Their messaging still feels roughly close to my views, though.
I liked that there were a variety of signs at the protest, representing a wider spectrum of views than just the most doomy ones. Something about having people there with whom I would probably disagree a lot made it feel nicer.
Lots more people are worried about job loss than extinction and want to hear about that. The economist in me will not stop giving them an optimistic picture of AI and employment before telling them about extinction. This is hard to do when you only have a couple of minutes but it feels good being honest about my actual views.
Things I wish I’d known in advance:
It’s pretty fun talking to strangers! A person who was there briefly asked about AI risk; I suggested some podcast episodes to him, and he invited me to a Halloween party. It was cool!
I did have some control over when I was photographed and could choose to not be in photos that might be on Twitter if I didn’t feel comfortable with that yet.
I could make my own signs or flyers that represented my views accurately (though it’s still good to keep the wording on signs short).
Reflections on bay area visit
GPT-4 generated TL;DR (mostly endorsed but eh):
The beliefs of prominent AI safety researchers may not be as well-founded as expected, and people should be cautious about taking their beliefs too seriously.
There is a tendency for people to overestimate their own knowledge and confidence in their expertise.
Social status plays a significant role in the community, with some individuals treated like “popular kids.”
Important decisions are often made in casual social settings, such as lunches and parties.
Geographical separation of communities can be helpful for idea spread and independent thought.
The community has a tendency to engage in off-the-cuff technical discussions, which can be both enjoyable and miscalibrated.
Shared influences, such as Eliezer’s Sequences and HPMOR, foster unique and enjoyable conversations.
The community is more socially awkward and tolerant of weirdness than other settings, leading to more direct communication.
I was recently in Berkeley and interacted a bunch with the longtermist EA / AI safety community there. Some thoughts on that:
I changed my mind about how much I should trust the beliefs of prominent AI safety researchers. It seems like they have thought less deeply about things to arrive at their current beliefs and are less intimidatingly intelligent and wise than I would have expected. The problem isn’t that they’re overestimating their capabilities and how much they know but that some newer people take the more senior people’s beliefs and intuitions more seriously than they should.
I noticed that many people knew a lot about their own specific area and not as much about others’ work as I would have expected. This observation makes me more likely to point out when I think someone is missing something instead of assuming they’ve read the same things I have and so already accounted for the thing I was going to say.
It seemed like more people were overconfident about the things they knew. I’m not sure that is necessarily bad for the community in general; I suspect pursuing fruitful research directions often means looking overconfident to others because you trust your intuitions and illegible models over others’ reasoning. However, from the outside, it did look like people made confident claims about technical topics that weren’t very rigorous and that I suspect would fall apart if they were asked to actually clarify things further. I sometimes heard claims like “I’m the only person who understands X”, where X was some hot topic related to AI safety, followed by a vague description of X that wasn’t very compelling on its own.
What position or status someone has in the community doesn’t track their actual competence or expertise as much as I would have expected and is very affected by how and when they got involved in the community.
Social status is a big thing, though more noticeable in settings where there are many very junior people and some senior researchers. I also got the impression that senior people were underestimating how seriously people took the things they said, such as off-the-cuff casual remarks about someone’s abilities, criticism of someone’s ideas, and random hot takes they hadn’t thought about for too long. (It feels weird to call them “senior” people when everyone’s basically roughly the same age.)
In some ways, it felt like a mild throwback to high school, with there being “popular kids” that people wanted to be around, and with gossip about those people’s personal lives being quite prevalent.
Important decisions are made in very casual social settings, like over lunch or at random parties. Multiple people mentioned they primarily go to parties or social events for professional reasons; things just seem more serious/“impactful”. It sometimes felt like I was being constantly evaluated, especially on intelligence, even while just trying to have enjoyable social interactions, though in the end I did find social environments that did not feel this way, or possibly I just stopped being as anxious about it.
It possibly made it more difficult for me to switch off the part of my brain that thinks constantly about AI existential risk.
I think it is probably quite helpful to have multiple communities separated geographically to allow ideas to spread. I think my being a clueless outsider with limited knowledge of what various people thought of various other people’s work made it easier for me to form my own independent impressions.
Good parts
The good parts were that it was easier to have more technical conversations that assumed lots of context, even at random parties, which is something I sometimes enjoy and now miss. Though I wish a greater proportion of them had been about fun mathy things in general rather than just things directly relevant to AI safety.
It also felt like people stated their off-the-cuff takes on technical topics (eg: random areas of biology) a lot more than usual. This was a bit weird for me in the beginning when I was experiencing deep imposter syndrome because I felt like they knew a lot about the thing they were talking about. Once I realised they did not, this was a fun social activity to participate in. Though I think some people take it too far and are miscalibrated about how correct their armchair thinking is on topics they don’t have actual expertise in.
I also really enjoyed hanging out with people who had been influenced by some of the same things I had been influenced by such as Eliezer’s Sequences and HPMOR. It felt like there were some fun conversations that happened there as a result that I wouldn’t be able to have with most people.
There was also noticeably slightly more social awkwardness in general, which was great for me as someone who doesn’t have the most elite social skills in normal settings. It felt like people were more tolerant of some forms of weirdness. It also felt like once I got back home, I was noticeably more direct in the way I communicated (a friend mentioned this) as a result of the bay area culture. I also previously thought some bay area people were a bit rude and unapproachable, having only read their interactions on the internet, but I think this was largely just because it’s difficult to convey tone via text, especially when you’re arguing with someone. People were more friendly, approachable, and empathetic in real life than I assumed, and now I view the interactions I have with them online somewhat differently.
I was having an EA conversation with some uni group organisers recently, and it was terrifying to me that, in response to FTX, a substantial portion of them wanted to do PR for EA (implied, for example, by supporting putting out messages of the form “EA doesn’t condone fraud” on their uni group’s social media accounts), and also that a couple of them seemed to be running a naive version of consequentialism that endorsed committing fraud/breaking promises if the calculations worked out in favour of doing that for the greater good. Most interesting was that one group organiser was in both camps at once.
I think it is bad vibes that these uni students feel so emotionally compelled to defend EA, the ideology and community, from attack, and this seems plausibly really harmful for their own thinking.
I had this idea in my head of university group organisers modifying what they say to newcomers to be more positive about EA ideas, but I thought this was a scary concern I was mostly making up. After some interactions with uni group organisers outside my bubble, though, it feels more important to me. People explicitly mentioned policing what they said to newcomers in order to not turn them off or give them reasons to doubt EA, and tips like “don’t criticise new people’s ideas in your first interactions with them as an EA community builder, in order to be welcoming” were mentioned.
All this to say: I think some rationality ideas I consider pretty crucial for people trying to do EA uni group organising to be exposed to are not having the reach they should.
a naive version of consequentialism that endorsed committing fraud/breaking promises if the calculations worked out in favour of doing that for the greater good.
It’s called utilitarianism!
How self-aware was the group organizer of being in both camps?
All this to say: I think some rationality ideas I consider pretty crucial for people trying to do EA uni group organising to be exposed to are not having the reach they should.
It might be that they are rational at maximizing utility. It can be useful for someone who is okay with fraud to publicly create an image that they aren’t.
You would expect that people who are okay with fraud are also okay with creating a false impression that they aren’t okay with fraud.
You’re right. When I said “some rationality ideas”, I meant concepts that have been discussed here on LessWrong before, like Eliezer’s Ends Don’t Justify Means (Among Humans) post and Paul Christiano’s Integrity for Consequentialists post, among other things. The above group organiser doesn’t have to agree with those posts, but in this case I found it surprising that they just hadn’t been exposed to the ideas around running on corrupted hardware, and certainly hadn’t reflected on those and related ideas that seem pretty crucial to me.
My own view is that in our world, basically every time a smart person, even a well-meaning smart EA (like myself :p), does the rough calculations and they come out in favour of lying where a typical honest person wouldn’t or in favour of breaking promises or committing an act that hurts a lot of people in the short term for the “greater good”, almost certainly their calculations are misguided and they should aim for honesty and integrity instead.
Interesting bet on AI progress (with actual money) made in 1968:
1968 – Scottish chess champion David Levy makes a 500-pound bet with AI pioneers John McCarthy and Donald Michie that no computer program would win a chess match against him within 10 years.
1978 – David Levy wins the bet made 10 years earlier, defeating Chess 4.7 in a six-game match by a score of 4½–1½. The computer’s victory in game four is the first defeat of a human master in a tournament.
In 1973, Levy wrote:
“Clearly, I shall win my … bet in 1978, and I would still win if the period were to be extended for another ten years. Prompted by the lack of conceptual progress over more than two decades, I am tempted to speculate that a computer program will not gain the title of International Master before the turn of the century and that the idea of an electronic world champion belongs only in the pages of a science fiction book.”
After winning the bet:
“I had proved that my 1968 assessment had been correct, but on the other hand my opponent in this match was very, very much stronger than I had thought possible when I started the bet.” He observed, “Now nothing would surprise me (very much).”
In 1996, Popular Science asked Levy about Garry Kasparov’s impending match against Deep Blue. Levy confidently stated that ”...Kasparov can take the match 6 to 0 if he wants to. ‘I’m positive, I’d stake my life on it.’” In fact, Kasparov lost the first game, and won the match by a score of only 4–2. The following year, he lost their historic rematch 2.5–3.5.
So it seems like he very much underestimated progress in computer chess despite winning the original bet.
https://en.wikipedia.org/wiki/David_Levy_(chess_player)
I thought I didn’t get angry much in response to people making specific claims. I did some introspection about times in the recent past when I got angry, defensive, or withdrew from a conversation in response to claims that the other person made.
I think these are the mechanisms that made me feel that way:
They were very confident about their claim. Partly I felt annoyed because I didn’t feel like there was anything that would change their mind, and partly because it felt like they didn’t have enough status to make very confident claims like that. This is more about the confidence in their body language and tone than their stated confidence in the claim, though both matter.
Credentialism: them being unwilling to explain things and taking it as a given that they were correct because I didn’t have the specific experiences or credentials that they had, without mentioning what specifically about gaining that experience would help me understand their argument.
Not letting me speak and interrupting quickly to take down the fuzzy strawman version of what I meant rather than letting me take my time to explain my argument.
Morality: I felt like one of my cherished values was being threatened.
The other person was relatively smart and powerful, at least within the specific situation. If they were dumb or not powerful, I would have just found the conversation amusing instead.
The other person assumed I was dumb or naive, perhaps because they had met other people with the same position as me and those people came across as not knowledgeable.
The other person getting worked up, for example, raising their voice or showing other signs of being irritated, offended, or angry while acting as if I was the emotional/offended one. This one particularly stings because of gender stereotypes. I think I’m more calm and reasonable and less easily offended than most people. I’ve had a few conversations with men where it felt like they were just really bad at noticing when they were getting angry or emotional themselves and kept pointing out that I was being emotional despite me remaining pretty calm (and perhaps even a little indifferent to the actual content of the conversation before the conversation moved to them being annoyed at me for being emotional).
The other person’s thinking is very black-and-white, thinking in terms of a very clear good and evil and not being open to nuance. Sort of a similar mechanism to the first thing.
Some examples of claims that recently triggered me. They’re not so important themselves so I’ll just point at the rough thing rather than list out actual claims.
AI killing all humans would be good because thermodynamics god/laws of physics good
Animals feel pain but this doesn’t mean we should care about them
We are quite far from getting AGI
Women as a whole are less rational than men are
Palestine/Israel stuff
Doing the above exercise was helpful because it helped me generate ideas for things to try if I’m in situations like that in the future. But it feels like the most important thing is just to get better at noticing what I’m feeling in the conversation and, if I’m feeling bad or uncomfortable, to think about whether the conversation is useful to me at all and, if so, for what reason. And if not, to make a conscious decision to leave the conversation.
Reasons the conversation could be useful to me:
I change their mind
I figure out what is true
I get a greater understanding of why they believe what they believe
Enjoyment of the social interaction itself
I want to impress the other person with my intelligence or knowledge
Things to try will differ depending on why I feel like having the conversation.
I am constantly flipping back and forth between “I have terrible social skills” and “People only think I am smart and competent because I have charmed them with my awesome social skills”.
Due to lurking online in the rationalist community as a teenager, I had a lot of anxiety about intelligence, and it affected my life. In particular, it made me avoid testing my fit for different things because I thought I wasn’t smart and didn’t want to confirm it to myself. And it made me more anxious about giving up on things that were a bad fit because I thought not being good at a thing was evidence of my not being smart enough. Things I would tell teenage me if I could go back in time:
It is valuable to explore the anxiety you feel around doing an IQ test. Things like learning about and hanging out with successful, cool people working on impressive projects who weren’t child prodigies will be helpful for exploring those anxious feelings and becoming comfortable with not being the smartest person in the room.
There are just some things that you won’t be good at because of lacking some sort of natural and mostly unchangeable quality that is correlated with raw intelligence. However, it still seems worth it to try things that would give you stronger evidence of what you are good at and what you are not—an IQ score is some evidence but not as good as actually trying the thing.
For a lot of things, including very intellectual pursuits like programming or doing certain kinds of research, you are probably underestimating the role of getting good mentorship and just practising and getting lots of feedback compared to just being naturally talented. Lots of people much smarter than you are going to be less good at it because they lack experience and motivation. Lots of people less smart than you are going to be much better than you because they will have experience and motivation.
Part of the anxiety you feel about your intelligence is because you’re consuming the thoughts of people who are older and more knowledgeable and attributing their being good at coming up with novel ideas and understanding concepts quickly to just raw intelligence. You will become a more impressive and interesting person yourself as you get older and more knowledgeable, and difficult concepts will take less time to understand compared to now.
For sanity reasons, I tried writing down my thoughts quickly about AI alignment and where they actually come from. I know a bunch about various parts of various alignment agendas from reading posts and attending talks and I sometimes can hold my own in conversations about them. However, a lot of my actual thoughts about how I expect things to turn out are super fuzzy, sound dumb, and are very much based on deferring to people and being very confused.
Why is AI existential risk a big deal
Big models seem competent at various things. I currently predict that in 2023 I will see more impressive advances in AI than in 2022. I am confused about how exactly strategic awareness will arise or be implemented in AI systems, but it does feel like there are strong incentives to make that and other important capabilities happen. Just because these systems can appear competent (for example, ChatGPT for the most part gives good answers to questions in the way OpenAI would want) doesn’t mean they will be aimed at the right thing. I am not sure how agency or aims would actually arise or be implemented in future systems that share similarities with GPT, but I expect whatever they are aimed at to not be conducive to human survival and flourishing. I expect things that visibly hurt the model’s apparent competence (eg: it outputs violent text when OpenAI doesn’t want it to, or fails to bring you coffee if that’s what you want) to get papered over, while deeper questions about whether it actually values what humans would value are ones AGI labs are too confused to ask and resolve. It seems really hard, naively, to point our systems at the right things as they gain more agency and strategic awareness, instead of pursuing superficial fixes that just make it look like the model is doing what we want.
A lot of my thoughts are also influenced by discussions of deceptive alignment. The arguments make sense to me. However, because of the lack of empirical evidence, I am relying heavily on my intuitions about powerful AI systems being consequentialists (because that’s good for pursuing goals), on people who have thought about this more continuing to be concerned, and on my not having encountered arguments against this being a thing. Under my model, deceptive alignment is a thing that is _expected_ to happen, not just a worrying possibility, but I feel like thinking and reading more on what future AI systems will look like concretely could change how plausible I find this (in either direction).
Forecasting AI
I feel like people often talk about timelines in confusing ways. Intuitively, it feels to me like we are ~7-30 years away (with lots of uncertainty), but I don’t know how to gain more evidence to become more confident. I am further confused because a lot of people rely on the bioanchors model as a starting point, whereas to me it is not much evidence for anything besides “it is not implausible we get transformative AI this century”. I expect thinking forward from existing AI capabilities to shift my intuitions more, but this feels weird and too inside-viewy in a way I don’t feel I have the license to be, because I don’t expect to be able to come up with good explanations for why AGI will come sooner or later.
About takeoff: when I read Paul on slow takeoff, most of his arguments seem convincing to me. When I read Eliezer on fast takeoff, most of his arguments seem convincing to me. I expect this is one topic where if I just started writing my thoughts on what I think about various arguments I’ve read, this would be useful for me in helping generate some insights and shifting my views.
How hard is alignment
I sometimes hang around with people who are working on alignment and normally give a p(doom) of around 25%. They say this is because of their intuitions about how hard a problem like this is to solve and their thoughts on what sorts of things AGI labs will be willing to try and implement. I think my own p(doom) would be higher than that because I assign higher credence to the problem just being _much_ more difficult than our ability to solve it, for technical as well as coordination reasons. This depends on other considerations, such as what exactly progress in AI capabilities will look like.
However, the position of alignment actually not being that difficult also sometimes sounds convincing to me depending on what I am reading or whose talks I am attending. I have some intuitions about why OpenAI’s alignment plan wouldn’t work out but I, unfortunately, haven’t thought about this hard enough to explain exactly why while someone red-teams my answers. So I don’t know, doesn’t seem _that_ implausible to me that we could just get lucky with how hard alignment is.
My current model of many people working on alignment is that they’re trying to solve problems that seem like they would be helpful to solve (eg: mechanistic interpretability, model evaluations, etc.) but don’t have an alignment _plan_. This is as expected since having an alignment plan means we have something that we think would solve the problem if it works out. I think Paul Christiano does have a plan in mind. I am currently trying to understand whatever ARC has put out because other people I talk to think Paul is very smart and defer to him a lot.
Things I want to do to figure out my thoughts on things:
Understand the goals of people working on mechanistic interpretability enough to have thoughts on how I expect mechanistic interpretability to progress and how useful I expect it to be
Read ARC’s stuff and form thoughts on whether, if work on the agenda goes really well, it would solve the alignment problem
Think harder about what progress in AI capabilities looks like and what capabilities come in what order
Figure out why Daniel Kokotajlo relies a bunch on the bioanchors model for _his_ forecasts
Understand John Wentworth’s The Plan to figure out if it is any good
Red-team OpenAI’s safety plan without deferring to other people
Read the 2021 MIRI conversations properly
Write my thoughts more clearly as I do the above
I think the above will be useful. I expect for decision-making reasons, I will continue to act based on deferring to people who seem reasonable and whom other people in my circles who seem more knowledgeable and smarter than me defer to.
A lot of reasonable AI alignment ideas are only going to be relevant post-singularity, and changing understanding of timelines keeps reshuffling them out of current relevance. It turns out LLM human imitations can very likely be very capable on their own, without a separate AGI needed to build them up. AI alignment is so poorly understood that any capable AIs that are not just carefully chosen human imitations are going to be more dangerous; that seems like a solid bet for the next few years before LLMs go full AGI. So alignment concerns about such AIs (that are not just human imitations) should be post-singularity concerns for the human imitations to worry about.
Oh, and it also feels like some other things could be even more important for me to think about, but I forgot to mention them because I rarely have conversations with people about them, so they feel less salient. Things such as s-risks, governance questions that remain even if we solve the technical challenge of alignment, and what conclusions I can draw from the fact that lots of people in alignment disagree pretty deeply with each other.
“If a factory is torn down but the rationality which produced it is left standing, then that rationality will simply produce another factory. If a revolution destroys a government, but the systematic patterns of thought that produced that government are left intact, then those patterns will repeat themselves. . . . There’s so much talk about the system. And so little understanding.”
One way people can help is by stating their beliefs about AI, and their confidence in those beliefs, to the friends, family members, and acquaintances they talk to.
Currently, a bunch of people are coming across things in the news about humanity going extinct if AI progress continues as it has and no more alignment research happens. I would expect many of them not to think seriously about it, because it’s really hard to shake out of the “business as usual” frame. Most of your friends and family members probably know you’re a reasonable, thoughtful person, and it seems helpful to make people feel comfortable engaging with the arguments in a serious way instead of filing them away in some part of their brain that doesn’t affect their actions or predictions about the future in any way.
I have talked to my dad about how I feel very uncertain about making it to 40, that (with lots of uncertainty) I currently expect not to unless there’s coordination to slow AI development or a lot more effort towards AI alignment. He is new to this so had a bunch of questions but said he didn’t find it weird and now thinks it is scary. It was interesting noticing the inferential distance, since he initially had confusions like “If the AI gets consciousness, won’t it want to help other conscious beings?” and “It feels weird to be so against change, humanity will adapt” but I think he gets it now.
I think sharing sincerely the things you believe with more people is good.
I wasn’t expecting the development endgame to be much different, though it’s a bit early. At least it’s LLMs and not Atari-playing RL agents. Also, I’m much less certain about the inevitability of boundary-norm-ignoring optimizers now, in a world that’s not too dog-eat-dog at the top. This makes precise value targeting less crucial for mere survival, though most of the Future is still lost without it.
So the news is good. I’m personally down to 70% probability of extinction, mostly from the first AGIs failing to prevent the world from getting destroyed by their research output, since it isn’t looking like they are going to be superintelligent out of the box. I’m no longer expecting the first AGIs to intentionally destroy the world, unless users are allowed to explicitly and successfully wish for it to be destroyed, which bizarrely seems like a significant portion of the risk.
I think there will probably be even more discussion of AI x-risk in the media in the near future. My own media consumption is quite filtered, but, for example, the last time I was in an Uber, the news channel on the radio mentioned Geoffrey Hinton thinking AI might kill us all. And it isn’t a distant problem for my parents the way climate change is, because they use ChatGPT and are both impressed and concerned by it. They’ll probably form thoughts on it anyway, and I’d prefer to be around to respond to their confusion and concerns.
It also seems plausible that there is more AI panic and anxiety amongst some fraction of the general public in the near future. And I’d prefer the people I love are eased into it rather than feeling panicked and anxious all at once and not knowing how to deal with it.
It’s also useful for me to get a pulse on how people outside my social group (which is mostly heavily filtered as well) respond to AI x-risk arguments. For example, I didn’t know before which ideas that seemed obvious to me (that being more intelligent doesn’t mean you have nice values, why humans care about the things we care about, that if something much smarter than us aims to take over it will succeed quickly, etc.) were completely new to my parents or to friends who are not rationalist-adjacent(-adjacent).
I also think being honest with people close to me is more compassionate and good but that by itself wouldn’t compel me to actively discuss AI x-risk with them.
When we compare results from PaLM 540B to our own identically trained 62B and 8B model variants, improvements are typically log-linear. This alone suggests that we have not yet reached the apex point of the scaling curve. However, on a number of benchmarks, improvements are actually discontinuous, meaning that the improvements from 8B to 62B are very modest, but then jump immensely when scaling to 540B. This suggests that certain capabilities of language models only emerge when trained at sufficient scale, and there are additional capabilities that could emerge from future generations of models.
Examples of tasks with discontinuous improvement were english_proverbs (guess which proverb from a list best describes a text passage, which requires a very high level of abstract thinking) and logical_sequence (order a set of “things” such as months, actions, numbers, or letters into their logical ordering).
Eg of a logical_sequence task:
Input: Which of the following lists is correctly ordered chronologically? (a) drink water, feel thirsty, seal water bottle, open water bottle (b) feel thirsty, open water bottle, drink water, seal water bottle (c) seal water bottle, open water bottle, drink water, feel thirsty
From Ray Kurzweil’s predictions for 2019 (written in 1999):
On Politics and Society
People are beginning to have relationships with automated personalities as companions, teachers, caretakers, and lovers. Automated personalities are superior to humans in some ways, such as having very reliable memories and, if desired, predictable (and programmable) personalities. They are not yet regarded as equal to humans in the subtlety of their personalities, although there is disagreement on this point.
An undercurrent of concern is developing with regard to the influence of machine intelligence. There continue to be differences between human and machine intelligence, but the advantages of human intelligence are becoming more difficult to identify and articulate. Computer intelligence is thoroughly interwoven into the mechanisms of civilization and is designed to be outwardly subservient to apparent human control. On the one hand, human transactions and decisions require by law a human agent of responsibility, even if fully initiated by machine intelligence. On the other hand, few decisions are made without significant involvement and consultation with machine-based intelligence.
Public and private spaces are routinely monitored by machine intelligence to prevent interpersonal violence. People attempt to protect their privacy with near-unbreakable encryption technologies, but privacy continues to be a major political and social issue with each individual’s practically every move stored in a database somewhere.
The existence of the human underclass continues as an issue. While there is sufficient prosperity to provide basic necessities (secure housing and food, among others) without significant strain to the economy, old controversies persist regarding issues of responsibility and opportunity. The issue is complicated by the growing component of most employment’s being concerned with the employee’s own learning and skill acquisition. In other words, the difference between those “productively” engaged and those who are not is not always clear.
On The Arts
Virtual artists in all of the arts are emerging and are taken seriously. These cybernetic visual artists, musicians, and authors are usually affiliated with humans or organizations (which in turn are comprised of collaborations of humans and machines) that have contributed to their knowledge base and techniques. However, interest in the output of these creative machines has gone beyond the mere novelty of machines being creative.
Visual, musical, and literary art created by human artists typically involve a collaboration between human and machine intelligence.
The type of artistic and entertainment product in greatest demand (as measured by revenue generated) continues to be virtual-experience software, which ranges from simulations of “real” experiences to abstract environments with little or no corollary in the physical world.
On Philosophy:
There are prevalent reports of computers passing the Turing Test, although these instances do not meet the criteria (with regard to the sophistication of the human judge, the length of time for the interviews, etcetera) established by knowledgeable observers. There is a consensus that computers have not yet passed a valid Turing Test, but there is growing controversy on this point.
The subjective experience of computer-based intelligence is seriously discussed, although the rights of machine intelligence have not yet entered mainstream debate. Machine intelligence is still largely the product of a collaboration between humans and machines, and has been programmed to maintain a subservient relationship to the species that created it.
Paul Graham’s essay on What You Can’t Say is very practical. The tests/exercises he recommends for learning true, controversial things were useful to me.
Even if trying the following tests yields statements that aren’t immediately useful, I think the act of noticing where you disagree with someone or something more powerful is good practice. I think similar mental muscles get used when noticing when you disagree or are confused about a commonly-held assumption in a research field or when noticing important ideas that others are neglecting.
The different exercises he suggests (copied or paraphrased according to how I internalised them):
The conformist test: asking yourself the classic “Do I have any opinions that I would be reluctant to express in front of a group of my peers?”
What do people get in trouble for: look out for what things other people say that get them in trouble. Ask yourself if you think that thing or some version of it is true.
Heresy: Take a label (eg: “sexist”) and try to think of some ideas that would be called that. This is useful because ideas wouldn’t come to mind in random order but the plausible ones will (plausibly) come to mind first. Then for each one, ask if it might be true.
Time and space: compare present ideas against those of different past cultures and see what you get. Also, look at ideas from other present-day cultures that differ from your own.
Prigs: The exercise is to picture someone who has seen a lot (“Imagine a kind of latter-day Conrad character who has worked for a time as a mercenary in Africa, for a time as a doctor in Nepal, for a time as the manager of a nightclub in Miami”). Imagine comparing what’s inside this guy’s head with what’s inside the head of a well-behaved sixteen-year-old girl from the suburbs. What does he think that would shock her?
Look at the mechanisms: look at how taboos are created. How do moral fashions arise and why are they adopted? What groups are powerful but nervous, and what ideas would they like to suppress? What ideas were tarnished by association when they ended up on the losing side of a recent struggle? If a self-consciously cool person wanted to differentiate himself from preceding fashions, which of their ideas would he tend to reject? What are conventional-minded people afraid of saying?
I also liked the tip that if something is being attacked as “x-ist” or “y-ic” rather than being criticised for being incorrect or false, that is a red flag. And this is the case for many things that are heretical but true.
Lastly, I think the advice on being strategic and not very openly saying things that might get you into trouble is good.
It’s interesting that I feel attuned to social status in EA/EA-adjacent settings. I have been in settings before where people had more status by the standards of the general public (eg: having political power, being extremely wealthy), and status didn’t feel like a salient thing to me in those contexts. My initial guess for what makes EA settings different is that I don’t feel particularly threatened by damage to how people perceive my political power or wealth, whereas in EA settings being perceived as highly intelligent feels more important, and I do feel more anxious about people not thinking I’m smart. I also have status anxiety in situations with heavily credentialed academics, but less so, and I suspect that the more a community cares about credentials relative to general intelligence, the less status anxiety I would have in it (even though I am at a lower percentile for credentials than for intelligence), because less of my self-worth is tied up in how credentialed people perceive me to be.
I also think it is interesting to note how noticing and subconsciously caring about social status is not a constant. I notice myself acting more status-blind when I am feeling more secure.
The Art of Gathering book is useful for folks who organise events. The suggestions line up with my experiences of the best gatherings (workshops, retreats, hangouts, etc.) I have attended. You can get most of the value from reading a summary. Here’s mine:
decide why you’re really gathering: think less about the what and more about the why; commit to a bold, sharp, specific purpose
close doors: be willing to exclude people
don’t be a chill host: act with generous authority instead of following a hands-off approach to hosting; make guests do something even if this makes them slightly uncomfortable at times
create a temporary alternate world: design your gathering as a world that will only exist once; have explicit rules
never start a funeral with logistics: prime your guests and honour them on arrival; start with a gesture or activity that encapsulates the gathering’s purpose
keep your best self out of my gathering: encourage guests to share honestly and authentically (and lead by example)
cause good controversy
accept that there is an end: end gatherings well instead of letting them fizzle out
The volunteer’s dilemma game models a situation in which each player can either make a small sacrifice that benefits everybody or instead wait in the hope of benefiting from someone else’s sacrifice.
An example: you see a person dying and can decide whether or not to call an ambulance. You would prefer that someone else call, but if no one else would, you would strongly prefer calling over not calling and the person dying. So if it is just you watching the person die, you would call the ambulance given these payoffs.
There are as many pure-strategy Nash equilibria in this game as there are players: a “player x calls the ambulance, everyone else does not” equilibrium for every x. There’s also a symmetric mixed-strategy Nash equilibrium where every player has the same probability p of calling the ambulance.
The fun part is that as the number of bystanders goes up, not only does your own equilibrium probability of calling go down, but the combined probability that anyone at all calls the ambulance also goes down. The person is more likely to die if there are more observers around, assuming everyone plays the equilibrium strategy.
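As a sanity check on that claim, here is a minimal sketch of the equilibrium algebra in Python, under an assumed standard parameterisation that isn’t spelled out above: everyone gets a benefit b if at least one person calls, and whoever calls also pays a cost c, with 0 < c < b.

```python
# Symmetric mixed-strategy equilibrium of the volunteer's dilemma.
# Assumed payoffs (illustrative only): benefit b to everyone if at least one
# person calls; a caller additionally pays cost c, with 0 < c < b.
# A lone bystander (n = 1) simply calls, since b - c > 0.
b, c = 10.0, 1.0

print(" n   P(I call)   P(anyone calls)")
for n in range(2, 11):
    # Indifference between calling (b - c) and waiting (b * P(someone else calls)):
    #   b - c = b * (1 - (1 - p)**(n - 1))  =>  (1 - p)**(n - 1) = c / b
    p = 1 - (c / b) ** (1 / (n - 1))
    p_anyone = 1 - (c / b) ** (n / (n - 1))  # = 1 - (1 - p)**n
    print(f"{n:2d}   {p:9.3f}   {p_anyone:15.3f}")
```

With these illustrative numbers, each individual’s calling probability falls from 0.9 with two bystanders to about 0.23 with ten, and the probability that anyone calls at all drifts down from 0.99 towards 0.9 as the crowd grows.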
One implication of this is that if you have a team of people, unless you try to assign specific individuals to take charge of specific tasks, you might end up in situations where the probability of tasks happening at all decreases as you add more people (everyone feels like it’s less their responsibility to take care of any particular ball).
Game theory has many paradoxical models in which a player prefers having worse information, not a result of wishful thinking, escapism, or blissful ignorance, but of cold rationality. Coarse information can have a number of advantages.
(a) It may permit a player to engage in trade because other players do not fear his superior information.
(b) It may give a player a stronger strategic position because he usually has a strong position and is better off not knowing that in a particular realization of the game his position is weak.
Or, (c) as in the more traditional economics of uncertainty, poor information may permit players to insure each other.
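To make point (a) concrete, here is a minimal simulation sketch under assumed numbers that are not from the quoted text: an asset’s quality v is uniform on [0, 100], worth v to the seller and 1.5v to the buyer, and the buyer makes a take-it-or-leave-it price offer.

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.uniform(0, 100, size=200_000)  # quality: worth v to seller, 1.5*v to buyer

# Symmetric ignorance: both sides know only the distribution (E[v] = 50).
# Any price between 50 and 75 is mutually beneficial in expectation; take p = 60.
p = 60
print(f"Uninformed seller, price {p}: seller gain {p - v.mean():+.1f}, "
      f"buyer gain {(1.5 * v - p).mean():+.1f}")

# Informed seller: accepts an offer p only when v <= p, so the buyer ends up
# trading for exactly the low-quality assets (adverse selection).
for p in (20, 40, 60, 80, 100):
    accept = v <= p
    buyer_gain = np.where(accept, 1.5 * v - p, 0.0).mean()
    print(f"Informed seller, offer {p:3d}: trade prob {accept.mean():.2f}, "
          f"buyer expected gain {buyer_gain:+.2f}")
```

Under these assumptions, every offer to the privately informed seller has negative expected value for the buyer (analytically -p²/400), so the buyer refuses to trade and all gains from trade are lost, whereas with symmetric ignorance trade happens and both sides gain in expectation. The informed seller would genuinely prefer that the buyer not fear their superior information, which is the sense in which coarse information “permits trade”.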
Rational players never wish to have less information. Strategic options are provably non-negative in value in standard games.
Against opponents who model the player, the player may wish that their opponents not believe they have these options or information. Against more powerful modelers, the best way to keep opponents from knowing is to actually not have those capabilities. But the value comes through opponent manipulation, not from any direct advantage of having fewer options or less knowledge.
It’s important to keep this in mind, in order to avoid over-valuing ignorance. It’s very rarely the best way to manipulate your opponents in the real world.
I attended an AI pause protest recently and thought I’d write up what my experience was like for people considering going to future ones.
I hadn’t been to a protest ever before and didn’t know what to expect. I will probably attend more in the future.
Some things that happened:
There were about 20ish people protesting. I arrived a bit after the protest had begun and it was very easy and quick to get oriented. It wasn’t awkward at all (and I’m normally pretty socially anxious and awkward). The organisers had flyers printed out to give away and there were some extra signs I could hold up.
I held up a sign for some of the protest and tried handing out flyers the rest of the time. I told people who passed by that we were talking about the danger from AI and if they’d like a flyer. Most of them declined but a substantial minority accepted the flyer.
I got the sense that a lot of people who picked up a flyer weren’t just doing it to be polite. For example, I had multiple people walking by mention to me that they agreed with the protest. A person in a group of friends who walked by looked at the flyer and mentioned to their friends that they thought it was cool someone was talking about this.
There were also people who got flyers who misunderstood or didn’t really care for what we were talking about. For example, a mother pointed at the flyer and told her child “see, this is why you should spend less time on your phone.”
I think giving out the flyers was a good thing overall. Some people seemed genuinely interested. Others, even those who rejected it, were pretty polite. Felt like a wholesome experience. If I had planned more for the protest, I think I would have liked to print my own flyers, I also considered adding contact details to the flyers in case people wanted to talk about the content. It would have been interesting to get a better sense of what people actually thought.
During the protest, a person was using a megaphone to talk about AI risk and there were chants and a bit of singing at the end. I really liked the bit at the end, it felt a bit emotional for me in a good way and I gave away a large fraction of the flyers near the end when more people stopped by to see what was going on.
I overheard some people talk about wanting to debate us. I was sad I didn’t get the chance to properly talk to them (plausibly I could have started a conversation while they were waiting for the pedestrian crossing lights to turn green). I think at a future protest, I would like to have a “debate me” or “ask me questions” sign to be able to talk to people in more depth rather than just superficially.
It’s hard to give people a pitch for AI risk in a minute
I feel more positive about AI pause advocacy after the protest, though I do feel uneasy because of not having total control of the pause AI website and the flyers. It still feels roughly close to my views though.
I liked that there were a variety of signs at the protest, representing a wider spectrum of views than just the most doomy ones. Something about there being multiple people with whom I would probably disagree a lot with being there made it feel nicer.
Lots more people are worried about job loss than extinction and want to hear about that. The economist in me will not stop giving them an optimistic picture of AI and employment before telling them about extinction. This is hard to do when you only have a couple of minutes but it feels good being honest about my actual views.
Things I wish I’d known in advance:
It’s pretty fun talking to strangers! A person who was there briefly asked about AI risk, I suggested podcast episodes to him, and he invited me to a Halloween party. It was cool!
I did have some control over when I was photographed and could choose to not be in photos that might be on Twitter if I didn’t feel comfortable with that yet.
I could make my own signs or flyers that represented my views accurately (though it’s still good to have the signs not have many words)
Reflections on bay area visit
GPT-4 generated TL;DR (mostly endorsed but eh):
The beliefs of prominent AI safety researchers may not be as well-founded as expected, and people should be cautious about taking their beliefs too seriously.
There is a tendency for people to overestimate their own knowledge and confidence in their expertise.
Social status plays a significant role in the community, with some individuals treated like “popular kids.”
Important decisions are often made in casual social settings, such as lunches and parties.
Geographical separation of communities can be helpful for idea spread and independent thought.
The community has a tendency to engage in off-the-cuff technical discussions, which can be both enjoyable and miscalibrated.
Shared influences, such as Eliezer’s Sequences and HPMOR, foster unique and enjoyable conversations.
The community is more socially awkward and tolerant of weirdness than other settings, leading to more direct communication.
I was recently in Berkeley and interacted a bunch with the longtermist EA / AI safety community there. Some thoughts on that:
I changed my mind about how much I should trust the beliefs of prominent AI safety researchers. It seems like they have thought less deeply about things to arrive at their current beliefs and are less intimidatingly intelligent and wise than I would have expected. The problem isn’t that they’re overestimating their capabilities and how much they know but that some newer people take the more senior people’s beliefs and intuitions more seriously than they should.
I noticed that many people knew a lot about their own specific area and not as much about others’ work as I would have expected. This observation makes me more likely to point out when I think someone is missing something instead of assuming they’ve read the same things I have and so already accounted for the thing I was going to say.
It seemed like more people were overconfident about the things they knew. I’m not sure if that is necessarily bad in general for the community; I suspect pursuing fruitful research directions often means looking overconfident to others because you trust your intuitions and illegible models over others’ reasoning. However, from the outside, it did look like people made confident claims about technical topics that weren’t very rigorous and that I suspect would fall apart when asked to actually clarify things further. I sometimes heard claims like “I’m the only person who understands X” where X was some hot topic related to AI safety followed by some vague description about X which wasn’t very compelling on its own.
What position or status someone has in the community doesn’t track their actual competence or expertise as much as I would have expected and is very affected by how and when they got involved in the community.
Social status is a big thing, though more noticeable in settings where there are many very junior people and some senior researchers. I also got the impression that senior people were underestimating how seriously people took the things they said, such as off-the-cuff casual remarks about someone’s abilities, criticism of someone’s ideas, and random hot takes they hadn’t thought about for too long. (It feels weird to call them “senior” people when everyone’s basically roughly the same age.)
In some ways, it felt like a mild throwback to high school with there being “popular kids” that people wanted to be around, and also because of how prevalent gossiping about the personal lives of those people is.
Important decisions are made in very casual social settings like over lunch or at random parties. Multiple people mentioned they primarily go to parties or social events for professional reasons. Things just seem more serious/“impactful”. It sometimes felt like I was being constantly evaluated especially on intelligence even while trying to just have enjoyable social interactions, though I did manage to find social environments in the end that did not feel this way, or possibly I just stopped being anxious about that as much.
It possibly made it more difficult for me to switch off the part of my brain that thinks constantly about AI existential risk.
I think it is probably quite helpful to have multiple communities separated geographically to allow ideas to spread. I think my being a clueless outsider with limited knowledge of what various people thought of various other people’s work made it easier for me to form my own independent impressions.
Good parts
The good parts were that it was easier to have more technical conversations that assumed lots of context even while at random parties which is sometimes enjoyable for me and something I now miss. Though I wish a greater proportion of them had been about fun mathy things in general rather than just things directly relevant to AI safety.
It also felt like people stated their off-the-cuff takes on technical topics (eg: random areas of biology) a lot more than usual. This was a bit weird for me in the beginning when I was experiencing deep imposter syndrome because I felt like they knew a lot about the thing they were talking about. Once I realised they did not, this was a fun social activity to participate in. Though I think some people take it too far and are miscalibrated about how correct their armchair thinking is on topics they don’t have actual expertise in.
I also really enjoyed hanging out with people who had been influenced by some of the same things I had been influenced by such as Eliezer’s Sequences and HPMOR. It felt like there were some fun conversations that happened there as a result that I wouldn’t be able to have with most people.
There was also noticeably slightly more social awkwardness in general which was great for me as someone who doesn’t have the most elite social skills in normal settings. It felt like people were more tolerant of some forms of weirdness. It also felt like once I got back home, I was noticeably more direct in the way I communicated (a friend mentioned this) as a result of the bay area culture. I also previously thought some bay area people were a bit rude and unapproachable, having only read their interactions on the internet but I think this was largely just caused by it being difficult to convey tone via text, especially when you’re arguing with someone. People were more friendly, approachable, and empathetic in real life than I assumed and now I view the interactions I have with them online somewhat differently.
Was having an EA conversation with some uni group organisers recently and it was terrifying to me that a substantial portion of them, in response to FTX, wanted to do PR for EA (implied in for eg supporting putting out messages of the form “EA doesn’t condone fraud” on their uni group’s social media accounts) and also that a couple of them seem to be running a naive version of consequentialism that endorsed committing fraud/breaking promises if the calculations worked out in favour of doing that for the greater good. Most interesting was that one group organiser was in both camps at once.
I think it is bad vibes that these uni students feel so emotionally compelled to defend EA, the ideology and community, from attack, and this seems plausibly really harmful for their own thinking.
I had this idea in my head of university group organisers modifying what they say to newcomers to be more positive about EA ideas, but I thought this was a scary concern I was mostly making up. After some interactions with uni group organisers outside my bubble, it feels more important to me. People explicitly mentioned policing what they said to newcomers in order to not turn them off or give them reasons to doubt EA, and tips like “don’t criticise new people’s ideas in your first interactions with them as an EA community builder, in order to be welcoming” were mentioned.
All this to say: I think some rationality ideas I consider pretty crucial for people trying to do EA uni group organising to be exposed to are not having the reach they should.
It’s called utilitarianism!
How self-aware was the group organizer of being in both camps?
It might be that they are rational at maximizing utility. It can be useful for someone who is okay with fraud to publicly create an image that they aren’t.
You would expect that people who are okay with fraud are also okay with creating a false impression of them appearing to be not okay with fraud.
You’re right. When I said “some rationality ideas”, I meant concepts that have been discussed here on LessWrong before, like Eliezer’s Ends Don’t Justify Means (Among Humans) post and Paul Christiano’s Integrity for Consequentialists post, among other things. The above group organiser doesn’t have to agree with those things, but in this case I found it surprising that they just hadn’t been exposed to the ideas around running on corrupted hardware, and certainly hadn’t reflected on that and related ideas that seem pretty crucial to me.
My own view is that in our world, basically every time a smart person, even a well-meaning smart EA (like myself :p), does the rough calculations and they come out in favour of lying where a typical honest person wouldn’t or in favour of breaking promises or committing an act that hurts a lot of people in the short term for the “greater good”, almost certainly their calculations are misguided and they should aim for honesty and integrity instead.
Interesting bet on AI progress (with actual money) made in 1968:
1968 – Scottish chess champion David Levy makes a 500 pound bet with AI pioneers John McCarthy and Donald Michie that no computer program would win a chess match against him within 10 years.
1978 – David Levy wins the bet made 10 years earlier, defeating Chess 4.7 in a six-game match by a score of 4½–1½. The computer’s victory in game four is the first defeat of a human master in a tournament.
In 1973, Levy wrote:
After winning the bet:
So it seems like he very much underestimated progress in chess despite winning the original bet.
https://en.wikipedia.org/wiki/David_Levy_(chess_player)
I thought I didn’t get angry much in response to people making specific claims. I did some introspection about times in the recent past when I got angry, defensive, or withdrew from a conversation in response to claims that the other person made.
After some introspection, I think these are the mechanisms that made me feel that way:
They were very confident about their claim. Partly I felt annoyance because I didn’t feel like there was anything that would change their mind, and partly because it felt like they didn’t have enough status to make very confident claims like that. This is more linked to confidence in body language and tone than to their confidence in their own claims, though both matter.
Credentialism: them being unwilling to explain things and taking it as a given that they were correct because I didn’t have the specific experiences or credentials they had, without mentioning what specifically about gaining that experience would help me understand their argument.
Not letting me speak and interrupting quickly to take down the fuzzy strawman version of what I meant rather than letting me take my time to explain my argument.
Morality: I felt like one of my cherished values was being threatened.
The other person was relatively smart and powerful, at least within the specific situation. If they were dumb or not powerful, I would have just found the conversation amusing instead.
The other person assumed I was dumb or naive, perhaps because they had met other people with the same position as me and those people came across as not knowledgeable.
The other person getting worked up, for example, raising their voice or showing other signs of being irritated, offended, or angry while acting as if I was the emotional/offended one. This one particularly stings because of gender stereotypes. I think I’m more calm and reasonable and less easily offended than most people. I’ve had a few conversations with men where it felt like they were just really bad at noticing when they were getting angry or emotional themselves and kept pointing out that I was being emotional despite me remaining pretty calm (and perhaps even a little indifferent to the actual content of the conversation before the conversation moved to them being annoyed at me for being emotional).
The other person’s thinking is very black-and-white, thinking in terms of a very clear good and evil and not being open to nuance. Sort of a similar mechanism to the first thing.
Some examples of claims that recently triggered me. They’re not so important themselves so I’ll just point at the rough thing rather than list out actual claims.
AI killing all humans would be good because thermodynamics god/laws of physics good
Animals feel pain but this doesn’t mean we should care about them
We are quite far from getting AGI
Women as a whole are less rational than men are
Palestine/Israel stuff
Doing the above exercise was helpful because it helped me generate ideas for things to try if I’m in situations like that in the future. But it feels like the most important thing is to just get better at noticing what I’m feeling in the conversation and, if I’m feeling bad and uncomfortable, to think about whether the conversation is useful to me at all and, if so, for what reason. And if not, to make a conscious decision to leave the conversation.
Reasons the conversation could be useful to me:
I change their mind
I figure out what is true
I get a greater understanding of why they believe what they believe
Enjoyment of the social interaction itself
I want to impress the other person with my intelligence or knowledge
Things to try will differ depending on why I feel like having the conversation.
Thanks for the post. I don’t know if you saw this one: “Thank you for triggering me”, but it might be of interest. Cheers!
I am constantly flipping back and forth between “I have terrible social skills” and “People only think I am smart and competent because I have charmed them with my awesome social skills”.
What a coincidence, that the true version is always the one that happens to be self-limiting at the moment!
I wonder if there are people who have it the other way round.
Due to lurking online in the rationalist community as a teenager, I had so much anxiety about intelligence, and it affected my life. In particular, it made me avoid testing my fit for different things because I thought I wasn’t smart and didn’t want to confirm it to myself. And it made me more anxious about giving up on things that were a bad fit, because I thought not being good at a thing was evidence of my not being smart enough. Things I would tell teenage me if I could go back in time:
It is valuable to explore the anxiety you feel around doing an IQ test. Things like learning about and hanging out with successful, cool people working on impressive projects who weren’t child prodigies will be helpful for exploring those anxious feelings and becoming comfortable with not being the smartest person in the room.
There are just some things that you won’t be good at because of lacking some sort of natural and mostly unchangeable quality that is correlated with raw intelligence. However, it still seems worth it to try things that would give you stronger evidence of what you are good at and what you are not—an IQ score is some evidence but not as good as actually trying the thing.
For a lot of things, including very intellectual pursuits like programming or doing certain kinds of research, you are probably underestimating the role of getting good mentorship and just practising and getting lots of feedback compared to just being naturally talented. Lots of people much smarter than you are going to be less good at it because they lack experience and motivation. Lots of people less smart than you are going to be much better than you because they will have experience and motivation.
Part of the anxiety you feel about your intelligence is because you’re consuming the thoughts of people who are older and more knowledgeable and attributing their being good at coming up with novel ideas and understanding concepts quickly to just raw intelligence. You will become a more impressive and interesting person yourself as you get older and more knowledgeable, and difficult concepts will take less time to understand compared to now.
For sanity reasons, I tried writing down my thoughts quickly about AI alignment and where they actually come from. I know a bunch about various parts of various alignment agendas from reading posts and attending talks and I sometimes can hold my own in conversations about them. However, a lot of my actual thoughts about how I expect things to turn out are super fuzzy, sound dumb, and are very much based on deferring to people and being very confused.
Why is AI existential risk a big deal
Big models seem competent at various things. I currently predict that in 2023, I will see more impressive advances in AI than in 2022. I am confused about how exactly strategic awareness will arise or be implemented in AI systems, but it does feel like there are strong incentives to make that and other important capabilities happen. Just because these systems can appear competent (for example, ChatGPT for the most part gives good answers to questions in a way that OpenAI would want) doesn’t mean that they will be aimed at the right thing. I am not sure how agency or aims would actually arise or be implemented in future systems that share similarities with GPT, but I expect whatever they are aimed at to not be conducive to human survival and flourishing. I expect things that are actually bad for the model’s seeming competent (eg: it outputs violent text when OpenAI doesn’t want it to, or fails to bring you coffee if that’s what you want) to get papered over, while the deeper question of whether it actually values what humans would value is something AGI labs are too confused to ask and resolve. It seems really hard, naively, to point our systems, as they get more agency and strategic awareness, at the right things, instead of pursuing superficial fixes that just make it look like the model is doing what we want.
A lot of my thoughts are also influenced by discussions on deceptive alignment. The arguments make sense to me. However, because of the lack of empirical evidence, I am relying heavily on my intuition that powerful AI systems will be consequentialists (because that’s good for pursuing goals), on people who have thought about this more continuing to be concerned, and on my not having encountered arguments against this being a thing. Under my model, deceptive alignment is a thing that is _expected_ to happen, not just a worrying possibility, but I feel like thinking and reading more about what future AI systems will look like concretely could change how plausible I find this (in either direction).
Forecasting AI
I feel like people often talk about timelines in confusing ways. Intuitively, it feels to me like we are ~7-30 years away (with lots of uncertainty) but I don’t know how to gain more evidence to become more confident. I am further confused because a lot of people rely on the bioanchors model as a starting point whereas to me it is not much evidence for anything besides “it is not implausible we get transformative AI this century”. I expect thinking forward from existing AI capabilities to shift my intuitions more but this feels weird and too inside-viewy in a way I feel like I don’t have the license to be because I don’t expect to be able to come up with good explanations for why AGI is soon or later.
About takeoff: when I read Paul on slow takeoff, most of his arguments seem convincing to me. When I read Eliezer on fast takeoff, most of his arguments seem convincing to me. I expect this is one topic where if I just started writing my thoughts on what I think about various arguments I’ve read, this would be useful for me in helping generate some insights and shifting my views.
How hard is alignment
I sometimes hang around with people who are working on alignment and normally give p(doom) around ~25%. They say this is because of their intuitions about how hard a problem like this is to solve and their thoughts around what sorts of things AGI labs will be willing to try and implement. I think my own p(doom) would be higher than that because I assign higher credence to the problem just being _much_ more difficult than our ability to solve it for technical as well as coordination reasons. This depends on other considerations such as what exactly progress in AI capabilities will look like.
However, the position of alignment actually not being that difficult also sometimes sounds convincing to me depending on what I am reading or whose talks I am attending. I have some intuitions about why OpenAI’s alignment plan wouldn’t work out but I, unfortunately, haven’t thought about this hard enough to explain exactly why while someone red-teams my answers. So I don’t know, doesn’t seem _that_ implausible to me that we could just get lucky with how hard alignment is.
My current model of many people working on alignment is that they’re trying to solve problems that seem like they would be helpful to solve (eg: mechanistic interpretability, model evaluations, etc.) but don’t have an alignment _plan_. This is as expected since having an alignment plan means we have something that we think would solve the problem if it works out. I think Paul Christiano does have a plan in mind. I am currently trying to understand whatever ARC has put out because other people I talk to think Paul is very smart and defer to him a lot.
Things I want to do to figure out my thoughts on things:
Understand the goals of people working on mechanistic interpretability enough to have thoughts on how I expect mechanistic interpretability to progress and how useful I expect it to be
Read ARC’s stuff and form thoughts on whether, if work on the agenda goes really well, it would solve the alignment problem
Think harder about what progress in AI capabilities looks like and what capabilities come in what order
Figure out why Daniel Kokotajlo relies a bunch on the bioanchors model for _his_ forecasts
Understand John Wentworth’s The Plan to figure out if it is any good
Red-team OpenAI’s safety plan without deferring to other people
Read the 2021 MIRI conversations properly
Write my thoughts more clearly as I do the above
I think the above will be useful. I expect for decision-making reasons, I will continue to act based on deferring to people who seem reasonable and whom other people in my circles who seem more knowledgeable and smarter than me defer to.
A lot of reasonable AI alignment ideas are only going to be relevant post-singularity, and changing understanding of timelines keeps reshuffling them out of current relevance. It turns out LLM human imitations very likely can be very capable on their own, without a separate AGI needed to build them up. AI alignment is so poorly understood that any capable AIs that are not just carefully chosen human imitations are going to be more dangerous; that seems like a solid bet for the next few years before LLMs go full AGI. So alignment concerns about such AIs (that are not just human imitations) should be post-singularity concerns for the human imitations to worry about.
How do we get LLM human imitations?
I meant the same thing as masks/simulacra.
Though currently I’m more bullish about the shoggoths, because masks probably fail alignment security, even though their alignment might be quite robust despite the eldritch substrate.
Oh, and it also feels like some other things could be even more important for me to think about, but I forgot to mention them because I rarely have conversations with people about those things, so they feel less salient. Things such as s-risks, governance stuff even if we solve the technical challenge of alignment, and what conclusions I can draw from the fact that lots of people in alignment disagree pretty deeply with each other.
“If a factory is torn down but the rationality which produced it is left standing, then that rationality will simply produce another factory. If a revolution destroys a government, but the systematic patterns of thought that produced that government are left intact, then those patterns will repeat themselves. . . . There’s so much talk about the system. And so little understanding.”
One way people can help is by stating their beliefs on AI and the confidence in those beliefs to their friends, family members, and acquaintances who they talk to.
Currently, a bunch of people are coming across things in the news talking about humanity going extinct if AI progress continues as it has and no more alignment research happens. I would expect many of them not to think seriously about it because it’s really hard to shake out of the “business as usual” frame. Most of your friends and family members probably know you’re a reasonable, thoughtful person, and it seems helpful to make people feel comfortable engaging with the arguments in a serious way instead of filing them away in some part of their brain that doesn’t affect their actions or predictions about the future in any way.
I have talked to my dad about how I feel very uncertain about making it to 40, that (with lots of uncertainty) I currently expect not to unless there’s coordination to slow AI development or a lot more effort towards AI alignment. He is new to this so had a bunch of questions but said he didn’t find it weird and now thinks it is scary. It was interesting noticing the inferential distance, since he initially had confusions like “If the AI gets consciousness, won’t it want to help other conscious beings?” and “It feels weird to be so against change, humanity will adapt” but I think he gets it now.
I think sharing sincerely the things you believe with more people is good.
I wasn’t expecting the development endgame to be much different, though it’s a bit early. At least it’s LLMs and not Atari-playing RL agents. Also, I’m much less certain about inevitability of boundary-norm-ignoring optimizers now, in a world that’s not too dog eat dog at the top. This makes precise value targeting less crucial for mere survival, though most of the Future is still lost without it.
So the news is good. I’m personally down to 70% probability of extinction, mostly first AGIs failing to prevent the world from getting destroyed by their research output, since it isn’t looking like they are going to be superintelligent out of the box. I’m no longer expecting the first AGIs to intentionally destroy the world, unless users are allowed to explicitly and successfully wish for it to be destroyed, which bizarrely seems like a significant portion of the risk.
Do you think it’s worth doing it if you will cause them distress? I find that hard to decide
I think there will probably be even more discussion of AI x-risk in the media in the near future. My own media consumption is quite filtered, but for example, the last time I was in an Uber, the news channel on the radio mentioned Geoffrey Hinton thinking AI might kill us all. And it isn’t a distant problem for my parents the way climate change is, because they use ChatGPT and are both impressed and concerned by it. They’ll probably form thoughts on it anyway, and I’d prefer to be around to respond to their confusion and concerns.
It also seems plausible that there is more AI panic and anxiety amongst some fraction of the general public in the near future. And I’d prefer the people I love are eased into it rather than feeling panicked and anxious all at once and not knowing how to deal with it.
It’s also useful for me to get a pulse on how people outside my social group (which is mostly heavily filtered as well) respond to AI x-risk arguments. For example, I didn’t know before which ideas that seemed obvious to me (that being more intelligent doesn’t mean you have nice values, why humans care about the things we care about, that if something much smarter than us aims to take over it will succeed quickly, etc.) were completely new to my parents or friends who are not rationalist-adjacent(-adjacent).
I also think being honest with people close to me is more compassionate and good but that by itself wouldn’t compel me to actively discuss AI x-risk with them.
Examples of tasks for which there was discontinuous improvement were: english_proverbs (guess which proverb from a list best describes a text passage; requires a very high level of abstract thinking) and logical_sequence (order a set of “things” (months, actions, numbers, letters, etc.) into their logical ordering).
Eg of a logical_sequence task:
Over all 150 tasks [in BIG-bench], 25% of tasks had discontinuity greater than +10%, and 15% of tasks had a discontinuity greater than +20%.
Discontinuity = (actual accuracy for the 540B model) − (log-linear projection from the 8B → 62B accuracies)
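To check I understood the metric, here is a minimal sketch of how I read that calculation: fit a line to accuracy against log(parameter count) through the 8B and 62B points, extrapolate to 540B, and subtract the projection from the actual 540B score. The task accuracies below are made-up numbers, and the exact fitting details in the paper may differ.

```python
import math

# Hedged sketch of the discontinuity metric as described above.
# The accuracy values are invented purely for illustration.

def log_linear_projection(acc_8b, acc_62b, target_params=540e9):
    """Extrapolate accuracy to target_params along a line in log(params) space."""
    x_8b, x_62b = math.log(8e9), math.log(62e9)
    slope = (acc_62b - acc_8b) / (x_62b - x_8b)
    return acc_62b + slope * (math.log(target_params) - x_62b)

acc_8b, acc_62b, acc_540b = 0.22, 0.28, 0.61  # made-up task accuracies
projected = log_linear_projection(acc_8b, acc_62b)
discontinuity = acc_540b - projected
print(f"projected={projected:.2f}  actual={acc_540b:.2f}  discontinuity={discontinuity:+.2f}")
```

With these invented numbers the projection lands around 0.34, so the discontinuity would be roughly +27%, the kind of jump the quoted statistics are counting.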
From Ray Kurzweil’s predictions for 2019 (written in 1999):
On Politics and Society
On The Arts
On Philosophy
Paul Graham’s essay on What You Can’t Say is very practical. The tests/exercises he recommends for learning true, controversial things were useful to me.
Even if trying the following tests yields statements that aren’t immediately useful, I think the act of noticing where you disagree with someone or something more powerful is good practice. I think similar mental muscles get used when noticing when you disagree or are confused about a commonly-held assumption in a research field or when noticing important ideas that others are neglecting.
The different exercises he suggests (copied or paraphrased according to how I internalised them):
The conformist test: asking yourself the classic “Do I have any opinions that I would be reluctant to express in front of a group of my peers?”
What do people get in trouble for: look out for what things other people say that get them in trouble. Ask yourself if you think that thing or some version of it is true.
Heresy: Take a label (eg: “sexist”) and try to think of some ideas that would be called that. This is useful because ideas wouldn’t come to mind in random order but the plausible ones will (plausibly) come to mind first. Then for each one, ask if it might be true.
Time and space: compare present ideas against those of different past cultures and see what you get. Also, look at ideas from other present-day cultures that differ from your own.
Prigs: The exercise is to picture someone who has seen a lot (“Imagine a kind of latter-day Conrad character who has worked for a time as a mercenary in Africa, for a time as a doctor in Nepal, for a time as the manager of a nightclub in Miami”). Imagine comparing what’s inside this guy’s head with what’s inside the head of a well-behaved sixteen-year-old girl from the suburbs. What does he think that would shock her?
Look at the mechanisms: look at how taboos are created. How do moral fashions arise and why are they adopted? What groups are powerful but nervous, and what ideas would they like to suppress? What ideas were tarnished by association when they ended up on the losing side of a recent struggle? If a self-consciously cool person wanted to differentiate himself from preceding fashions, which of their ideas would he tend to reject? What are conventional-minded people afraid of saying?
I also liked the tip that if something is being attacked as “x-ist” or “y-ic” rather than being criticised for being incorrect or false, that is a red flag. And this is the case for many things that are heretical but true.
Lastly, I think the advice on being strategic and not very openly saying things that might get you into trouble is good.
It’s interesting that I feel attuned to social status in EA/EA-adjacent settings. I have been in settings before where people had more status according to the standards of the general public (eg: having political power, being extremely wealthy), and status didn’t feel like a salient thing to me in those contexts. My initial guess for what makes EA settings different is that I don’t feel particularly threatened if people’s perception of my political power or wealth takes a hit, whereas in EA settings it feels like being perceived as highly intelligent matters more, and I do feel anxious about people not thinking I’m smart. I also have status anxiety in situations with heavily credentialed academics, but less so, and I suspect that the more a community cares about credentials rather than general intelligence, the less status anxiety I would have in it (even though I am at a lower percentile for credentials than intelligence), because less of my self-worth is tied up in how credentialed people perceive me to be.
I also think it is interesting to note how noticing and subconsciously caring about social status is not a constant. I notice myself acting more status-blind when I am feeling more secure.
The Art of Gathering book is useful for folks who organise events. The suggestions line up with my experiences of the best gatherings (workshops, retreats, hangouts, etc.) I have attended. You can get most of the value from reading a summary. Here’s mine:
decide why you’re really gathering: think less about the what and more about the why; commit to a bold, sharp, specific purpose
close doors: be willing to exclude people
don’t be a chill host: act with generous authority instead of following a hands-off approach to hosting; make guests do something even if this makes them slightly uncomfortable at times
create a temporary alternate world: design your gathering as a world that will only exist once; have explicit rules
never start a funeral with logistics: prime your guests and honour them on arrival; start with a gesture or activity that encapsulates the gathering’s purpose
keep your best self out of my gathering: encourage guests to share honestly and authentically (and lead by example)
cause good controversy
accept that there is an end: end gatherings well instead of letting them fizzle out
The volunteers’ dilemma game models a situation in which each player can either make a small sacrifice that benefits everybody or instead wait in the hope of benefiting from someone else’s sacrifice.
An example could be: you see a person dying and can decide whether or not to call an ambulance. You would prefer that someone else call, but if no one else would, you would strongly prefer calling over not calling and letting the person die. So if it were just you watching, you would call the ambulance given these payoffs.
There are as many pure-strategy Nash equilibria in this game as there are players: one of the form “player x calls the ambulance, everyone else does not” for every x. There is also a symmetric mixed-strategy Nash equilibrium where every player has the same probability p of calling the ambulance.
The fun part is that as the number of bystanders goes up, not only does your own equilibrium probability of calling go down, but the combined probability that anyone at all calls the ambulance also goes down. The person is more likely to die if there are more observers around, assuming everyone is playing optimally.
One implication of this is that if you have a team of people, unless you try to assign specific individuals to take charge of specific tasks, you might end up in situations where the probability of tasks happening at all decreases as you add more people (everyone feels like it’s less their responsibility to take care of any particular ball).
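Here is a minimal sketch of that equilibrium calculation, with made-up payoff numbers (benefit v from the person being saved, cost c of calling, with c < v); the qualitative result doesn’t depend on the exact values.

```python
# Symmetric mixed-strategy equilibrium of the volunteers' dilemma.
# Each bystander is indifferent between calling (payoff v - c) and not
# calling (payoff v * P(someone else calls)), which gives
# (1 - p)^(n - 1) = c / v.

def equilibrium(n, v=10.0, c=4.0):
    """Return (probability I call, probability anyone calls) for n bystanders."""
    if n == 1:
        return 1.0, 1.0  # a lone bystander always calls, since v - c > 0
    p = 1 - (c / v) ** (1 / (n - 1))
    p_anyone = 1 - (1 - p) ** n
    return p, p_anyone

for n in [1, 2, 5, 10, 50]:
    p, p_anyone = equilibrium(n)
    print(f"n={n:2d}  P(I call)={p:.3f}  P(anyone calls)={p_anyone:.3f}")
```

With these numbers, the chance that anyone calls falls from 100% with one bystander to 84% with two and keeps dropping towards 1 − c/v = 60% as the crowd grows.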
Rational players never wish to have less information. Strategic options are provably non-negative in value in standard games.
Against opponents who model the player, the player may wish their opponents not to believe they have these options or this information. Against more powerful modelers, the best way to keep opponents from knowing is to actually not have those capabilities. But the value comes through opponent manipulation, not from any direct advantage of having fewer options or less knowledge.
It’s important to keep this in mind in order to avoid over-valuing ignorance: ignorance is very rarely the best way to manipulate your opponents in the real world.
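To make that concrete, here is a toy entry game (my own example, with invented payoffs) in which the incumbent benefits from visibly removing an option, but only because the entrant observes the commitment and best-responds to it.

```python
# Backward induction in a toy two-stage entry game.
# Payoffs are (entrant, incumbent):
#   stay out           -> (0, 10)
#   enter, accommodate -> (5, 5)
#   enter, fight       -> (-5, -5)

def solve(incumbent_options):
    payoffs = {"accommodate": (5, 5), "fight": (-5, -5)}
    # If entry happens, the incumbent picks its best available response.
    best = max(incumbent_options, key=lambda a: payoffs[a][1])
    # The entrant enters only if that outcome beats staying out.
    if payoffs[best][0] > 0:
        return ("enter", best), payoffs[best]
    return ("stay out", None), (0, 10)

print(solve(["accommodate", "fight"]))  # entrant enters; incumbent ends up with 5
print(solve(["fight"]))                 # entrant stays out; incumbent ends up with 10
```

The incumbent only gains because the entrant can see, and believes, that the accommodate option is gone; if the removal were invisible to the entrant, it would be pure loss.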