Here are some Twitter accounts I’ve found useful to follow (in no particular order): Quintin Pope, Janus @repligate, Neel Nanda, Chris Olah, Jack Clark, Yo Shavit @yonashav, Oliver Habryka, Eliezer Yudkowsky, alex lawsen, David Krueger, Stella Rose Biderman, Michael Nielsen, Ajeya Cotra, Joshua Achiam, Séb Krier, Ian Hogarth, Alex Turner, Nora Belrose, Dan Hendrycks, Daniel Paleka, Lauro Langosco, Epoch AI Research, davidad, Zvi Mowshowitz, Rob Miles
If some of the project ideas are smaller, is it easier for you to handle if they’re added on to just one larger application as extras that might be worth additional funding?
Is your “alignment research experiments I wish someone would run” list shareable :)
Paul Graham’s essay “What You Can’t Say” is very practical. The tests/exercises he recommends for learning true, controversial things were useful to me.
Even if trying the following tests yields statements that aren’t immediately useful, I think the act of noticing where you disagree with someone or something more powerful is good practice. I think similar mental muscles get used when noticing that you disagree with or are confused about a commonly held assumption in a research field, or when noticing important ideas that others are neglecting.
The different exercises he suggests (copied or paraphrased according to how I internalised them):
The conformist test: asking yourself the classic “Do I have any opinions that I would be reluctant to express in front of a group of my peers?”
What do people get in trouble for: look out for the things other people say that get them in trouble. Ask yourself whether you think that thing, or some version of it, is true.
Heresy: Take a label (e.g. “sexist”) and try to think of some ideas that would be called that. This is useful because ideas won’t come to mind in random order; the plausible ones will (plausibly) come to mind first. Then for each one, ask if it might be true.
Time and space: compare present ideas against those of different past cultures and see what you get. Also, look at ideas from other present-day cultures that differ from your own.
Prigs: The exercise is to picture someone who has seen a lot (“Imagine a kind of latter-day Conrad character who has worked for a time as a mercenary in Africa, for a time as a doctor in Nepal, for a time as the manager of a nightclub in Miami”). Imagine comparing what’s inside this guy’s head with what’s inside the head of a well-behaved sixteen-year-old girl from the suburbs. What does he think that would shock her?
Look at the mechanisms: look at how taboos are created. How do moral fashions arise and why are they adopted? What groups are powerful but nervous, and what ideas would they like to suppress? What ideas were tarnished by association when they ended up on the losing side of a recent struggle? If a self-consciously cool person wanted to differentiate himself from preceding fashions, which of their ideas would he tend to reject? What are conventional-minded people afraid of saying?
I also liked the tip that if something is being attacked as “x-ist” or “y-ic” rather than being criticised as incorrect or false, that is a red flag. And this is the case for many things that are heretical but true.
Lastly, I think the advice on being strategic and not very openly saying things that might get you into trouble is good.
Is there an organisation that can hire independent alignment researchers who already have funding, in order to help with visas for a place that has other researchers, perhaps somewhere in the UK? Is there a need for such an organisation?
What are the most promising plans for automating alignment research, as mentioned, for example, in OpenAI’s approach to alignment and by others?
I think there will probably be even more discussion of AI x-risk in the media in the near future. My own media consumption is quite filtered but, for example, the last time I was in an Uber, the news channel on the radio mentioned Geoffrey Hinton thinking AI might kill us all. And it isn’t a distant problem for my parents the way climate change is, because they use ChatGPT and are both impressed and concerned by it. They’ll probably form thoughts on it anyway, and I’d prefer to be around to respond to their confusion and concerns.
It also seems plausible that there is more AI panic and anxiety amongst some fraction of the general public in the near future. And I’d prefer the people I love are eased into it rather than feeling panicked and anxious all at once and not knowing how to deal with it.
It’s also useful for me to get a pulse on how people outside my social group (which is mostly heavily filtered as well) respond to AI x-risk arguments. For example, I didn’t know before which ideas that seemed obvious to me (that being more intelligent doesn’t mean you have nice values, why humans care about the things we care about, that if something much smarter than us aims to take over it will succeed quickly, etc.) were completely new to my parents or friends who are not rationalist-adjacent(-adjacent).
I also think being honest with people close to me is more compassionate and good but that by itself wouldn’t compel me to actively discuss AI x-risk with them.
I think it’s plausible that too much effort is going to interp at the margin.
What’s the counterfactual? Do you think newer people interested in AI safety should be doing other things instead of for example attempting one of the 200+ MI problems suggested by Neel Nanda? What other things?
I’m curious about whether I should change my shortform posting behaviour in response to higher site quality standards. I currently perceive it to be an alright place to post things that are quick and not aiming to be well-written or particularly useful for others to read because it doesn’t clutter up the website the way a post or comment on other people’s posts would.
Why is aliens wanting to put us in a zoo more plausible than the AI wanting to put us in a zoo itself?
Edit: Ah, there are more aliens around so even if the average alien doesn’t care about us, it’s plausible that some of them would?
“And the biggest question for me is not, is AI going to doom the world? Can I work on this in order to save the world? A lot of people expect that would be the question. That’s not at all the question. The question for me is, is there a concrete problem that I can make progress on? Because in science, it’s not sufficient for a problem to be enormously important. It has to be tractable. There has to be a way to make progress. And this was why I kept it at arm’s length for as long as I did.”
I thought this was interesting. But it does feel like with this AI thing we need more people backchaining from the goal of saving humanity instead of only looking forward to see what tractable neat research questions present themselves.
One way people can help is by stating their beliefs on AI and the confidence in those beliefs to their friends, family members, and acquaintances who they talk to.
Currently, a bunch of people are coming across things in the news talking about humanity going extinct if AI progress continues as it has and no more alignment research happens. I would expect many of them to not think seriously about it because it’s really hard to shake out of the “business as usual” frame. Most of your friends and family members probably know you’re a reasonable, thoughtful person, and it seems helpful to make people feel comfortable engaging with the arguments in a serious way instead of filing them away in some part of their brain that doesn’t affect their actions or predictions about the future.
I have talked to my dad about how I feel very uncertain about making it to 40, that (with lots of uncertainty) I currently expect not to unless there’s coordination to slow AI development or a lot more effort towards AI alignment. He is new to this so had a bunch of questions but said he didn’t find it weird and now thinks it is scary. It was interesting noticing the inferential distance, since he initially had confusions like “If the AI gets consciousness, won’t it want to help other conscious beings?” and “It feels weird to be so against change, humanity will adapt” but I think he gets it now.
I think sharing sincerely the things you believe with more people is good.
Hopefully this isn’t too rude to say, but: I am indeed confused how you could be confused
Fwiw, I was also confused and your comment makes a lot more sense now. I think it’s just difficult to convert text into meaning sometimes.
Thanks for posting this. It’s insightful to read other people thinking through career/life planning of this type.
Am curious how you feel about the general state of the alignment community going into the midgame. Are there things you hoped you or the alignment community had more of, or achievable things that would have been nice to have in place by the time the early game ended?
“I have a crazy take that the kind of reasoning that is done in generative modeling has a bunch of things in common with the kind of reasoning that is valuable when developing algorithms for AI alignment”
Cool!!
Wow, the quoted text feels scary to read.
I have met people within effective altruism who seem to be trying to do scary, dark things to their beliefs and motivations, which feels like it’s in the same category: for example, trying to convince themselves they don’t care about anything besides maximising impact or reducing x-risk. In at least one case, the latter was done by thinking a lot about dying due to AI in order to start caring about it more, which, the way they described it, can’t be good for thinking clearly.
From Ray Kurzweil’s predictions for 2019 (written in 1999):
On Politics and Society
People are beginning to have relationships with automated personalities as companions, teachers, caretakers, and lovers. Automated personalities are superior to humans in some ways, such as having very reliable memories and, if desired, predictable (and programmable) personalities. They are not yet regarded as equal to humans in the subtlety of their personalities, although there is disagreement on this point.
An undercurrent of concern is developing with regard to the influence of machine intelligence. There continue to be differences between human and machine intelligence, but the advantages of human intelligence are becoming more difficult to identify and articulate. Computer intelligence is thoroughly interwoven into the mechanisms of civilization and is designed to be outwardly subservient to apparent human control. On the one hand, human transactions and decisions require by law a human agent of responsibility, even if fully initiated by machine intelligence. On the other hand, few decisions are made without significant involvement and consultation with machine‐based intelligence.
Public and private spaces are routinely monitored by machine intelligence to prevent interpersonal violence. People attempt to protect their privacy with near-unbreakable encryption technologies, but privacy continues to be a major political and social issue with each individual’s practically every move stored in a database somewhere.
The existence of the human underclass continues as an issue. While there is sufficient prosperity to provide basic necessities (secure housing and food, among others) without significant strain to the economy, old controversies persist regarding issues of responsibility and opportunity. The issue is complicated by the growing component of most employment’s being concerned with the employee’s own learning and skill acquisition. In other words, the difference between those “productively” engaged and those who are not is not always clear.
On The Arts
Virtual artists in all of the arts are emerging and are taken seriously. These cybernetic visual artists, musicians, and authors are usually affiliated with humans or organizations (which in turn are comprised of collaborations of humans and machines) that have contributed to their knowledge base and techniques. However, interest in the output of these creative machines has gone beyond the mere novelty of machines being creative.
Visual, musical, and literary art created by human artists typically involves a collaboration between human and machine intelligence.
The type of artistic and entertainment product in greatest demand (as measured by revenue generated) continues to be virtual-experience software, which ranges from simulations of “real” experiences to abstract environments with little or no corollary in the physical world.
On Philosophy
There are prevalent reports of computers passing the Turing Test, although these instances do not meet the criteria (with regard to the sophistication of the human judge, the length of time for the interviews, etcetera) established by knowledgeable observers. There is a consensus that computers have not yet passed a valid Turing Test, but there is growing controversy on this point.
The subjective experience of computer‐based intelligence is seriously discussed, although the rights of machine intelligence have not yet entered mainstream debate. Machine intelligence is still largely the product of a collaboration between humans and machines, and has been programmed to maintain a subservient relationship to the species that created it.
There are too many books I want to read but probably won’t get around to reading any time soon. I’m more likely to read them if there’s someone else who’s also reading at a similar pace and I can talk to them about the book. If anyone’s interested in going through any of the following books in June and discussing them together, message me. We can decide on the format later: it could just be reading the book and collaborating on a blog post about it together, or, for more textbook-like things, reading a couple of selected chapters a week and going over the difficult bits in a video call, or just having a Discord server where we spontaneously post thoughts we have while reading (in a “thinking out loud” way).
Thinking in Systems: A Primer
Visual Complex Analysis
Nanosystems: Molecular Machinery, Manufacturing, and Computation
Adaptation and Natural Selection: A Critique of Some Current Evolutionary Thought
Expert Political Judgment: How Good Is It? How Can We Know?
Superforecasting: The Art and Science of Prediction
The Structure of Scientific Revolutions
Information Theory, Inference, and Learning Algorithms
Writing the Book of the World
Thinking Physics: Understandable Practical Reality
What Is Life? The Physical Aspect of the Living Cell
The Forces of Matter (Michael Faraday)
Explaining Social Behavior: More Nuts and Bolts for the Social Sciences
Conceptual Mathematics: A First Introduction to Categories
And probably many of the things here: https://www.lesswrong.com/posts/bjjbp5i5G8bekJuxv/study-guide. I want to get around to reading some of the books/doing some of the courses mentioned there at some point in the future; I don’t particularly care about the order, so I might be happy to join in on whatever bit of content from there appeals to you.
I might change my mind about what things I’m most excited to read and learn, but I like the idea of pairing up with another LessWrong person to learn and discuss things so reach out if that interests you.
I don’t remember if I put down “inside view” on the form when filling it out, but that does sound like the type of thing I may have done. I think I might have been overly eager at the time to say I had an “inside view” when what I really had was: confusion and disagreements with others’ methods for forecasting, a mostly non-principled weighing of others’ forecasts, and intuitions about AI progress that were maybe overly strong and based as much or more on hanging around a group of people and picking up their beliefs as on evaluating the evidence for myself. It feels really hard to not let the general vibe around me affect the process of thinking through things independently.
Based on the results, I would think more people thinking about this for themselves and writing up their reasoning or even rough intuitions would be good. I suspect my beliefs are more influenced by the people who ranked high in the survey answers than I’d want them to be, because it turns out people around me are deferring to the same few people. Even when I think I have my own view on something, it is very largely affected by the fact that Ajeya said 2040/2050 and Daniel Kokotajlo said 5-7 years, and the vibes have trickled down to me even though I would weigh their forecasts/methodology less if I were coming across them for the first time.
(The timelines question doesn’t feel that important to me for its own sake at the moment but I think it is a useful one to practise figuring out where my beliefs actually come from)
How do we get LLM human imitations?
Other podcasts that have at least some relevant episodes: Hear This Idea, Towards Data Science, The Lunar Society, The Inside View, Machine Learning Street Talk