I still don’t see it, sorry. If I think of deep learning as an approximation of some kind of simplicity prior + updating on empirical evidence, I’m not very surprised that it solves the capacity allocation problem and learns a productive model of the world.[1] The price is that the simplicity prior doesn’t necessarily get rid of scheming. The big extra challenge for heuristic explanations is that you need to do the same capacity allocation in a way that scheming reliably gets explained (even though it’s not relevant for the model’s performance and doesn’t make things classically simpler), while no capacity is spent on explaining other phenomena that are not relevant for the model’s performance. I still don’t see at all how we can get the non-malign prior that can do that.
[1] Though I’m still very surprised that it works in practice.
If you don’t believe in your work, consider looking for other options
I spent 15 months working for ARC Theory. I recently wrote up why I don’t believe in their research. If one reads my posts, I think it should become very clear to the reader that either ARC’s research direction is fundamentally unsound, or I’m still misunderstanding some of the very basics after more than a year of trying to grasp it. In either case, I think it’s pretty clear that it was not productive for me to work there. Throughout writing my posts, I felt an intense shame imagining readers asking the very fair question: “If you think the agenda is so doomed, why did you keep working on it?”[1]
In my first post, I write: “Unfortunately, by the time I left ARC, I became very skeptical of the viability of their agenda.” This is not quite true. I was very skeptical from the beginning, for largely the same reasons I expressed in my posts. But at first I told myself that I should stay a little longer. Either they would manage to convince me that the agenda is sound, or I would demonstrate that it doesn’t work, in which case I would free up the labor of the group of smart people working on the agenda. I think this was initially a somewhat reasonable position, though it was already in large part motivated reasoning.
But half a year after joining, I don’t think this theory of change was very tenable anymore. It was becoming clear that our arguments were going in circles. I couldn’t convince Paul and Mark (the two people thinking the most about the big picture questions), nor could they convince me. Eight months in, two friends visited me in California, and they noticed that I always derailed the conversation when they asked me about my research. I think that should have been an important warning sign: I was ashamed to talk about my research to my friends because I was afraid they would see how crazy it was. I should have quit then, but I stayed for another seven months.
I think this was largely due to cowardice. I’m very bad at coding and all my previous attempts at upskilling in coding went badly.[2] I thought of my main skill as being a mathematician, and I wanted to keep working on AI safety. The few other places one can work as a mathematician in AI safety looked even less promising to me than ARC. I was afraid that if I quit, I wouldn’t find anything else to do.
In retrospect, this fear was unfounded. I realized there were other skills one can develop, not just coding. In my afternoons, I started reading a lot more papers and serious blog posts[3] from various branches of AI safety. After a few months, I felt I had much more context on many topics. I started to think more about what I could do with my non-mathematical skills. When I finally started applying for jobs, I got offers from the European AI Office and UKAISI, and it looked more likely than not that I would get an offer from Redwood.[4]
Other options I considered that looked less promising than the three above, but still better than staying at ARC:
- Team up with some Hungarian coder friends and execute some simple but interesting experiments I had vague plans for.[5]
- Assemble a good curriculum for the prosaic AI safety agendas that I like.
- Apply for a grant-maker job.
- Become a Joe Carlsmith-style general investigator.
- Try to become a journalist or an influential blogger.
- Work on crazy acausal trade stuff.
I still think many of these were good opportunities, and probably there are many others. Of course, different options are good for people with different skill profiles, but I really believe that the world is ripe with opportunities to be useful for people who are generally smart and reasonable and have enough context on AI safety. If you are working on AI safety but don’t really believe that your day-to-day job is going anywhere, remember that having context and being embedded in the AI safety field is a great asset in itself,[6] and consider looking for other projects to work on.
(Important note: ARC was a very good workplace, my coworkers were very nice to me and receptive to my doubts, and I really enjoyed working there except for feeling guilty that my work was not useful. I’m also not accusing the people who continue working at ARC of being cowards in the way I have been. They just have a different assessment of ARC’s chances, or work on lower-level questions than I did, where it can be reasonable to just defer to others on the higher-level questions.)
(As an employee of the European AI Office, it’s important for me to emphasize this point: The views and opinions of the author expressed herein are personal and do not necessarily reflect those of the European Commission or other EU institutions.)
[1] No, really, it felt very bad writing the posts. It felt like describing how I worked for a year on a scheme that was either trying to build perpetuum mobile machines, or trying to build normal cars while I just missed the fact that gasoline exists. Embarrassing either way.
[2] I don’t know why. People keep telling me that it should be easy to upskill, but for some reason it is not.
[3] I particularly recommend Redwood’s blog.
[4] We didn’t fully finish the work trial as I decided that the EU job was better.
[5] Think of things in the style of some of Owain Evans’ papers or experiments on faithful chain of thought.
[6] And having more context and knowledge is relatively easy to further improve by reading for a few months. It’s a young field.