AI Safety Sphere

Research in AI safety is typically shared through two main channels: blog posts and academic publications. Academic publications are reliable, in that they present peer-reviewed information which is generally accurate. However, they lack the community and quick brainstorming that blogging allows for. Conversely, blogging, while dynamic, often suffers from an overload of repetitive and sometimes unreliable ideas. From my observations, both mediums, blogging more so than academia, suffer from one main issue: the scatteredness of ideas. Although people can link to or reference other works, connectivity is not fundamental to either framework. One cannot ‘zoom out’ from a post to see how it fits into the broader landscape. A reader has to be extremely knowledgeable to easily understand how new information or research is relevant to the overarching field. Academia has tried to solve this issue by encouraging writers to explain more broadly how their research is relevant and what gap it is trying to fill.[1] This only puts a band-aid on the problem, and the task becomes harder the more technical or niche a topic is. As Eliezer Yudkowsky writes in “Expecting Short Inferential Distances”:

Oh, and you’d better not drop any hints that you think you’re working a dozen inferential steps away from what the audience knows, or that you think you have special background knowledge not available to them. The audience doesn’t know anything about an evolutionary-psychological argument for a cognitive bias to underestimate inferential distances leading to traffic jams in communication. They’ll just think you’re condescending.[2]

Because the system lacks a compelling way of connecting ideas, the work is offloaded to people willing to distill the information for a broader audience. One notable attempt at this is the Future of Life Institute’s landscape map of AI safety research.[3] As impressive as it is, it suffers from being a static document. There is no easy way to manage community disagreement about how AI safety should be broken down (lacking consensus is something the AI safety community is particularly good at, for better or for worse). Nor is the document constantly updated when new research comes out. With any new paper or blog post published, the audience is still forced to learn about the landscape before they can make sense of the information.

My solution is a collaborative website which puts the connecting of ideas at the core of how research is shared and developed. Instead of a long list of blog posts, why not have a collaborative map showing where new research is needed and all of the inferential steps needed to understand any author’s perspective and priors?

Problem Solving Ideology:

One way to frame research is through the lens of problem solving (e.g. the alignment problem, or the problem of interpretability). And one helpful way to frame a problem is as a goal that we, as agents, are trying to work out how to achieve. Once we have gained a solid understanding of how to achieve that goal, we consider the problem solved. Here is an example:

  • Problem: I need milk but don’t have it (Goal: acquire milk)

  • Proposed solution: Go to the grocery store and get milk

Even though I haven’t necessarily gone out to get milk, I consider this problem to be solved. That is, I don’t need to think much more about how to achieve my goal; I just have to act on my proposed solution. Framing problems in terms of an objective, preferably a measurable one, helps people ideate solutions and structure their thinking.

However, when big problems need solving, it is often easier to break the problem down into smaller, more manageable subproblems. How one decides to break down a big problem, and why, can be encapsulated in a strategy (instead of a solution).

Figure: A problem broken down into subproblems through a tree.

A strategy breaks down a problem in such a way that if all of its subproblems are solved, then the parent problem is solved. For a strategy to be considered successful, it must:

  1. Break down the parent problem into a definite set of subproblems which, if solved, would in turn solve the parent problem (as defined by some logic).

  2. Define subproblems which are each simpler, or very likely easier to solve or think about, than the parent problem. They don’t have to be collectively easier, just individually.

You can imagine how, as the tree gets deeper, problems become easier and easier to solve, hopefully to the point where they don’t need to be broken down further and can simply be solved outright. If all of the leaf problems of a tree get solved, then the solutions cascade up the tree and solve the big problem.
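The cascade can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual implementation: the `Problem` class, its `solution` and `subproblems` fields, and the `is_solved` method are all hypothetical names, and the sketch assumes a single strategy per problem. The two subproblems under the milk goal are also invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Problem:
    """A node in the problem tree (hypothetical structure, for illustration only)."""
    goal: str
    # Set when the problem has been solved outright, without further breakdown.
    solution: Optional[str] = None
    # The subproblems chosen by this problem's (single, assumed) strategy.
    subproblems: List["Problem"] = field(default_factory=list)

    def is_solved(self) -> bool:
        # A problem is solved if it has a direct solution...
        if self.solution is not None:
            return True
        # ...or if it has subproblems and every one of them is solved.
        return bool(self.subproblems) and all(p.is_solved() for p in self.subproblems)


# The milk example, broken down one level further for illustration.
find_store = Problem("Know where to buy milk")
have_money = Problem("Have a way to pay", solution="Bring a debit card")
get_milk = Problem("Acquire milk", subproblems=[find_store, have_money])

solved_before = get_milk.is_solved()  # one leaf is still open
find_store.solution = "Go to the grocery store"
solved_after = get_milk.is_solved()   # every leaf is solved, so the parent is too
```

Nothing is stored on the parent when a leaf is solved. The parent's status is recomputed from its children each time it is asked, which is one simple way to get the cascading behaviour.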

As this is a collaborative tool, people are likely to disagree on how a problem should be broken down. After all, a problem can be broken down in any number of ways. The solution here is to allow people to propose as many different strategies as they want (on the tree page you can navigate between strategies with arrow buttons). Thus, the system still thrives in the face of disagreement. Another issue is redundancy. Strategies, particularly those for the same problem, are likely to overlap in the subproblems they define. To address this, people would be allowed to link to any existing subproblem in the tree if it fits within their strategy.
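One way to picture competing strategies and linked subproblems is to let each problem hold a list of strategies, where each strategy points at subproblem objects that other strategies may point at too. The sketch below is an assumption about how this could be modelled, not a description of the site's code. `Problem`, `Strategy`, `is_solved`, and `is_complete` are hypothetical names, and it assumes that any one complete strategy is enough to solve the parent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Problem:
    goal: str
    solved_outright: bool = False
    # Alternative breakdowns proposed by different contributors.
    strategies: List["Strategy"] = field(default_factory=list)

    def is_solved(self) -> bool:
        # Solved directly, or through at least one complete strategy (assumed rule).
        return self.solved_outright or any(s.is_complete() for s in self.strategies)


@dataclass
class Strategy:
    rationale: str
    subproblems: List[Problem] = field(default_factory=list)

    def is_complete(self) -> bool:
        # Complete only when it names subproblems and all of them are solved.
        return bool(self.subproblems) and all(p.is_solved() for p in self.subproblems)


# Two strategies for the same parent link to the same subproblem object.
shared = Problem("Subproblem used by both strategies")
only_a = Problem("Subproblem used only by strategy A")
only_b = Problem("Subproblem used only by strategy B")
parent = Problem(
    "Parent problem",
    strategies=[
        Strategy("Strategy A", [shared, only_a]),
        Strategy("Strategy B", [shared, only_b]),
    ],
)

solved_before = parent.is_solved()
shared.solved_outright = True  # progress here counts toward both strategies
only_b.solved_outright = True  # strategy B is now complete; strategy A is not
solved_after = parent.is_solved()
```

Because `shared` is a single object referenced from both strategies, work on it is recorded once and is never duplicated. That makes the structure a graph with shared nodes and no longer a strict tree.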

The fact that one can easily see all of the existing strategies being used to solve any given problem is a particular strength of this system. The author of a paper doesn’t exactly have a big incentive to properly explain all of the other existing solutions to the problem they are trying to solve.

Another key feature of the website is the ability for other people to make inline suggestions to a document. On a similar note, although it is not developed yet, people will be able to discuss a node in a comment section. Many of the other features of blog sites, such as voting and karma, could also be implemented. Voting in particular will be necessary for deciding the order in which strategies are shown.
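As a rough illustration of vote-based ordering, here is a minimal sketch of one possible rule: sort strategies by net votes (upvotes minus downvotes) and leave ties in their original order. The scoring rule, the `StrategyVotes` class, and the `rank_strategies` function are assumptions made for this example; the site may end up ranking differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StrategyVotes:
    """Vote tally for one strategy (hypothetical structure)."""
    title: str
    upvotes: int = 0
    downvotes: int = 0

    @property
    def score(self) -> int:
        # Net votes: an assumed scoring rule, chosen for simplicity.
        return self.upvotes - self.downvotes


def rank_strategies(strategies: List[StrategyVotes]) -> List[StrategyVotes]:
    # sorted() is stable, so strategies with equal scores keep their input order.
    return sorted(strategies, key=lambda s: s.score, reverse=True)


ranked = rank_strategies([
    StrategyVotes("Strategy A", upvotes=3, downvotes=1),
    StrategyVotes("Strategy B", upvotes=5),
    StrategyVotes("Strategy C", upvotes=2),
])
titles = [s.title for s in ranked]  # B first, then A and C (tied) in input order
```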

If you want to try it out for yourself, this is the link: https://aisafetysphere.com/.

How you can help:

  1. Try out the site and give feedback

  2. I am in need of people to help further develop the site. Email me at aisafetysphere@gmail.com if you are interested.