Find all Alignment Newsletter resources here. In particular, you can sign up, or look through this spreadsheet of all summaries that have ever been in the newsletter.
The improvements to the newsletter continue! Rob Miles has generously volunteered to make the Alignment Newsletter Podcast. Chances are that the podcast will trail a week behind the emails, unless I manage to get my act together and give Rob a preview of the newsletter in advance.
Highlights
Standards for AI Governance: International Standards to Enable Global Coordination in AI Research & Development (Peter Cihon): This technical report argues that we can have an outsized impact on the future of AI by influencing AI standards so that they help ensure AI systems are safe and beneficial, in addition to making AI deployment more efficient. A standard here could be a product like Tensorflow or Gym, or a process like this list. It’s particularly useful to focus on international standards: since corporations can simply leave a country to escape its national regulations, there is a race to the bottom on the stringency of national standards, and so national standards can’t effect as much change.
It may be particularly valuable to influence existing organizations that set standards because they are very responsive to expert opinion. It is also possible to develop a standard privately, and then “convert” it into an international standard. (This happened with the C programming language and the PDF file format.) Such influence can be used to change the culture around AI development, e.g. to put safety more at the forefront.
Rohin’s opinion: I would guess that the most influential standards are “network standards” like Tensorflow: they make it easier for everyone to develop AI systems. However, the benefit here is in having any standard at all, and so it seems unlikely that such standards could also effect a change in culture that’s unrelated to the efficiency aspect of the standard. That said, the report convinced me that “enforced standards” are also impactful: even if the standard requires active enforcement to prevent organizations from ignoring it, organizations will often choose to comply with the standard in order to get a certification that builds consumer trust in them.
Regulatory Markets for AI Safety (Jack Clark et al): This paper presents an idea on how AI could be regulated: by the introduction of a market of private regulators that themselves are regulated by the government. Companies would be required by law to purchase regulatory services, but could choose which regulator they purchase from. The regulators compete to attract companies, but are all required to meet goals set by the government.
The key benefit of such an approach is that the government now only needs to set goals for regulation (e.g. for self-driving cars, a limit on the rate of accidents) while offloading to private regulators the regulations on processes (e.g. required adversarial training on the vision models employed in self-driving cars). This relieves the burden on government, which is currently too slow-moving to effectively regulate AI. It gets the best of both worlds: as with government regulation, it can optimize for the public good, and as with tech self-regulation, it can have best practices emerge from the researchers who know best (since they can build their own regulatory startups).
Of course, for this to work, it is crucial that the private regulators avoid regulatory capture, and that the market for regulators is competitive and independent.
Rohin’s opinion: This seems very related to the notion of an “enforced standard” in the previous paper, though here it is only necessary to enforce a goal across everyone, and the details of processes can vary across regulators. I especially like the scenario in which regulators emerge “bottom-up” from researchers thinking about potential problems with AI, though I’m not sure how likely it is.
With both this and the previous paper, I can see how they would apply to e.g. self-driving cars and adversarial robustness, but it’s less clear to me how such an approach can help with AI alignment. If we believe that alignment is really hard and we only get one shot at it, then it seems especially difficult to have legible regulations that ensure, without any testing, that we don’t build a misaligned superintelligent AI. Alternatively, if we believe that we will have lots of non-catastrophic experience with aligning AI systems, and can iterate on our processes, then it seems more likely that we could develop useful, legible regulations. (I am more inclined to believe this latter scenario, based on CAIS (AN #40) and other intuitions.) Even in this scenario I don’t yet know what regulations I would place, but it seems likely that with more experience we would be able to develop such regulations.
Technical AI alignment
Technical agendas and prioritization
Overview of AGI Safety Research Agendas (Rohin Shah): The video from my talk at the Beneficial AGI conference has just been released. In this talk, I cover five broad safety-related areas that people are investigating: understanding the future of AI (embedded agency (AN #31), CAIS (AN #40)), limiting the influence of an AI system (boxing (AN #54), impact regularization methods (AN #49)), robustness (verification (AN #19), red teaming), helpful AI systems (ambitious value learning (AN #31), preference learning, Cooperative IRL, corrigibility (AN #35), factored cognition (AN #36), iterated amplification, debate (AN #5)), and interpretability (AN #49). My podcast (AN #54) covers almost all of this and more, so you may want to listen to that instead.
Preventing bad behavior
Self-confirming predictions can be arbitrarily bad and Oracles, sequence predictors, and self-confirming predictions (Stuart Armstrong): Let’s consider an oracle AI system tasked with accurate prediction, with a strong enough world model that it could understand how its prediction will affect the world. In that case, “accurate prediction” means giving a prediction P such that the world ends up satisfying P, given the knowledge that prediction P was made. There need not be a single correct prediction—there could be no correct prediction (imagine predicting what I will say given that I commit to saying something different from what you predict), or there could be many correct predictions (imagine instead that I commit to say whatever you predict). These self-confirming predictions could be arbitrarily bad.
Part of the point of oracles was to have AI systems that don’t try to affect the world, but now the AI system will learn to manipulate us via predictions such that the predictions come true. Imagine for example the self-confirming prediction where the oracle predicts zero profit for a company, which causes the company to shut down.
In order to fix this, we could have counterfactual oracles, which predict what would have happened in a counterfactual where the prediction couldn’t affect the world. In particular, we ask the oracle to predict the future given that the prediction will immediately be erased and never be read by anyone. We can also use this to tell how much the prediction can affect us, by looking at the difference between the unconditional prediction and the prediction conditioned on erasure.
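As a toy illustration of the difference between a self-confirming prediction and a counterfactual oracle (my own sketch, not from the posts; the world model `world_response` and its `baseline` and `sensitivity` parameters are invented for the example): a self-confirming prediction is a fixed point of the world's response to the published prediction, while a counterfactual oracle predicts the outcome conditional on the prediction being erased.

```python
def world_response(prediction, baseline=100.0, sensitivity=0.8):
    """Toy world model: a company's realized profit drifts toward whatever
    profit the oracle publicly predicts (e.g. via investor confidence).
    If the prediction is erased (None), profit is just the baseline."""
    if prediction is None:
        return baseline
    return (1 - sensitivity) * baseline + sensitivity * prediction

def self_confirming_prediction(world, guess=0.0, iters=100):
    """Search for a fixed point p with world(p) == p by repeated simulation.
    Any such p counts as 'accurate'; with sensitivity = 1.0 every value is a
    fixed point, including a ruinous prediction of zero profit."""
    p = guess
    for _ in range(iters):
        p = world(p)
    return p

def counterfactual_oracle(world):
    """Predict the outcome conditional on the prediction never being read,
    so the prediction cannot influence the thing being predicted."""
    return world(None)

print(self_confirming_prediction(world_response))  # approximately 100.0 (the fixed point)
print(counterfactual_oracle(world_response))       # 100.0, unaffected by any feedback loop
```

The gap between the two outputs (here zero, but nonzero for less benign world models) gives a rough measure of how much the prediction itself can affect the world.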
Read more: Good and safe uses of AI Oracles
AI strategy and policy
Google’s brand-new AI ethics board is already falling apart (Kelsey Piper): Google announced an ethics advisory council, which quickly became controversial and was then cancelled. The author makes the point that the council was not well placed to actually advise on ethics: it would only meet four times a year and could only give recommendations. This committee, and similar ones at Facebook and Microsoft, seem to be more about PR than about AI ethics. Instead, an AI ethics council should include both insiders and outsiders, should be able to make formal, specific, detailed recommendations, and should publicly announce whether those recommendations were followed. Key quote: “The brouhaha has convinced me that Google needs an AI ethics board quite badly — but not the kind it seems to want to try to build.”
In a tweetstorm, the author holds OpenAI up as a large organization that is at least trying to engage deeply with AI ethics, as evidenced by their safety and policy team, their charter (AN #2), and GPT-2 (AN #46). They make public, contentful statements that are weird, controversial, and seem bad from a PR perspective. The arguments they make and hear about AI ethics and policy lead to real decisions with consequences.
Rohin’s opinion: I broadly agree with this article: I can’t imagine how a council that meets four times a year could properly provide advice on Google’s AI projects. I’m not sure the solution is more powerful and intensive ethics councils whose primary power is public accountability. I expect that making good decisions about AI ethics requires either a technical background or a long, detailed conversation with someone who has that background, neither of which is possible with the public at large. As a result, an ethics board could struggle to raise a legitimate issue, or could cause outrage about an issue that, on closer examination, is not an issue at all. I would feel better about a board with some more formal power, such as the ability to launch investigations that could lead to fines, the ability to sue Google, specific whistleblowing affordances, etc. (I have no idea how feasible any of those suggestions are, even assuming Google was okay with them.)
On the tweetstorm about OpenAI, I’m not sure if I’ve said it before in this newsletter, but I generally trust OpenAI to be trying to do the right thing, and this is one of the reasons for that. Of course, I also know and trust many people who work there.
Rationally Speaking #231 - Helen Toner on “Misconceptions about China and artificial intelligence” (Julia Galef and Helen Toner): In this podcast Helen talks about AI policy, China, and the Center for Security and Emerging Technology, where she is the director of strategy. Some of her opinions that stood out to me:
While Baidu is a huge tech company and runs the main search engine in China, it’s a bit misleading to call it the Google of China, since it doesn’t have the same diversity of products that Google does.
While the social credit score story seems overblown, the reporting on the Uighur situation seems to be basically accurate.
Based on a very small sample of AI researchers in China, it seems like Chinese researchers are less interested in thinking about the real-world effects of the technology they’re building, relative to Western researchers.
Since people in government have so little time to think about so many issues, they have simple versions of important ideas. For example, it’s easy to conclude that China must have an intrinsic advantage at data since they have more people and fewer privacy controls. However, there’s a lot of nuance: for example, most of the Internet is in English, which seems like a big advantage for the US.
The incentives in China can be quite different: in at least one case, a chemistry professor’s salary depended on the number of papers published.
A particularly interesting question: “how does it help the US geopolitically if an American company is developing powerful AI?”
When Is It Appropriate to Publish High-Stakes AI Research? (Claire Leibowicz et al): Following the GPT-2 controversy (AN #46), the Partnership on AI held a dinner with OpenAI and other members of the AI community to discuss the tension between the norm of openness and the desire to mitigate potential unintended consequences and misuse risks of AI research. The post discusses some of the relevant considerations and highlights a key conclusion: while there is not yet a consensus on review norms for AI research, there is a consensus that whatever those norms turn out to be, they should be standardized across the AI community.
Rohin’s opinion: I definitely agree that having everyone follow the same review norms is important: it doesn’t do much good to hold back from publishing something problematic if a different group will publish all of the details a few weeks later. However, getting everyone to agree on a change to the existing norms seems incredibly hard to do, though it might be feasible if it was limited to only the largest actors who can engage deeply in the debate of what these norms should be.
Other progress in AI
Unsupervised learning
Unsupervised learning: the curious pupil (Alexander Graves et al) (summarized by Cody): A high-level but well-written explanation of why many believe unsupervised learning will be key to achieving general intelligence, touching on the approaches of GANs and autoregressive models as examples.
Cody’s opinion: This is a clean, clear summary, but one without any real technical depth or detail; this would be a good writeup to hand someone without any machine learning background who wanted to get an intuitive grasp for unsupervised learning as a field.
Evaluating the Unsupervised Learning of Disentangled Representations (Olivier Bachem) (summarized by Cody): This blog post and paper describe a Google-scale comparative study of representation learning methods designed to learn “disentangled” representations, where the axes of the representation are aligned with the true underlying factors generating the data. The paper’s findings are sobering for the field, both theoretically and empirically. Theoretically, they show that in an unsupervised setting it is impossible to recover a disentangled representation without embedding some form of inductive bias into the model. Empirically, they present evidence that variation across random seeds for a given hyperparameter setting (in particular, regularization strength) matters as much as or more than variation across that hyperparameter’s values. Finally, they run experiments that call into question whether disentangled representations actually support transfer learning, and whether a representation can even be identified as disentangled without a metric that relies on ground-truth factors of variation, which makes evaluation difficult in the many realistic settings where such factors aren’t available.
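To make the evaluation point concrete, here is a rough sketch of a mutual-information-gap style disentanglement score (my own simplification for illustration, not the paper's code; the function name and the `latents`/`factors` inputs are assumptions). The key observation is that the score cannot be computed at all without access to the ground-truth factors.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def mig_style_score(latents, factors):
    """Simplified, unnormalized mutual-information-gap style score.

    latents: (n_samples, n_latents) learned representation
    factors: (n_samples, n_factors) ground-truth generative factors

    For each ground-truth factor, compute the gap in estimated mutual
    information between the most and second-most informative latent
    dimension; a large gap suggests each factor is captured by a single
    latent. Note the dependence on `factors`: without ground truth,
    the score is undefined.
    """
    gaps = []
    for k in range(factors.shape[1]):
        mi = mutual_info_regression(latents, factors[:, k])  # MI per latent dim
        top_two = np.sort(mi)[::-1][:2]
        gaps.append(top_two[0] - top_two[1])
    return float(np.mean(gaps))
```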
Cody’s opinion: This strikes me as a really valuable injection of empirical realism, of the kind that tends to be good for research fields to have periodically, even if it can be a bit painful or frustrating. I appreciate in particular the effort and clarity that this paper puts into articulating the implicit assumptions of how disentanglement can be used or evaluated, and trying to test those assumptions under more real-world settings, such as the one where you don’t have any ground truth factors of variation, since the real world doesn’t tend to just hand out the Correct factorized model of itself.
Robots that Learn to Use Improvised Tools (Annie Xie et al)
Re Regulatory markets for AI safety: You say that the proposal doesn’t seem likely to work if “alignment is really hard and we only get one shot at it” (i.e. unbounded maximiser with discontinuous takeoff). Do you expect that status-quo government regulation would do any better, or just that any regulation wouldn’t be helpful in such a scenario? My intuition is that even if alignment is really hard, regulation could be helpful e.g. by reducing races to the bottom, and I’d rather have a more informed group (like people from a policy and technical safety team at a top lab) implementing it instead of a less-informed government agency. I’m also not sure what you mean by legible regulation.
I agree that regulation could be helpful by reducing races to the bottom; I think what I was getting at here (which I might be wrong about, as it was several months ago) was that it is hard to design regulations that directly attack the technical problem. Consider for example the case of car manufacturing. You could have two types of regulations:
Regulations that provide direct evidence of safety: For example, you could require that all car designs be put through a battery of safety tests, e.g. crashing them into a wall and ensuring that the airbags deploy.
Regulations that provide evidence of thinking about safety: For example, you could require that all car designs have at least 5 person-years of safety analysis done by people with a degree in Automotive Safety (which is probably not an actual field but in theory could be one).
IIRC, the regulatory markets paper seemed to place most of its optimism in the first kind of regulation, or at least that’s how I interpreted it. That kind of regulation seems particularly hard in the one-shot alignment case. The second kind of regulation seems much more feasible in all scenarios, and preventing races to the bottom is an example of that kind of regulation.
I’m not sure what I meant by legible regulation—probably I was just emphasizing the fact that for regulations to be good, they need to be sufficiently clear and understood by companies so that they can actually be in compliance with them. Again, for regulations of the first kind this seems pretty hard to do.