Big fan of this but, like most of us, I knew all this already. What I want to know is, how effective is/was this when not preaching to the choir? What happens when someone who doesn’t understand MIRI’s mission starts to read this? I’d like to think it helps them grok what is going on reasonably often, but I could be fooling myself, and that question is ultimately the test of how vital this really is.
Huh, I’m surprised to hear you say you already knew it. I did not know this already. This is the post where I properly understood that Eliezer et al. are interested in decision theory and tiling agents and so on, not because those are direct failure modes they expect in future systems, but because they highlight confusions that are in want of basic theory to describe them, and that this basic theory will hopefully help make AGI alignable. Like I think I’d heard the words once or twice before then, but I didn’t really get it.
(It’s important that Embedded Agency came out too, which was entirely framed around this “list of confusions in want of better concepts / basic theory”, so I had some more concrete things to pin this to.)
FYI I also didn’t learn much from this post. (But the places I did learn it from were random comments buried in threads that didn’t make it easy for people to learn.)
Fair, but I expect I’ve also read those comments buried in random threads. Like, Nate said it here three years ago on the EA Forum.
The main case for [the problems we tackle in MIRI’s agent foundations research] is that we expect them to help in a gestalt way with many different known failure modes (and, plausibly, unknown ones). E.g., ‘developing a basic understanding of counterfactual reasoning improves our ability to understand the first AGI systems in a general way, and if we understand AGI better it’s likelier we can build systems to address deception, edge instantiation, goal instability, and a number of other problems’.
I had a mental model of directly working on problems, but before Eliezer’s post I didn’t have an alternative mental model to move probability mass toward. I just funnelled probability mass away from “MIRI is working on direct problems they foresee in AI systems” to “I don’t understand why MIRI is doing what it’s doing”. Nowadays I have a clearer pointer to what technical research looks like when you’re trying to get less confused and get better concepts.
This sounds weirdly dumb to say in retrospect, because ‘get less confused and get better concepts’ is one of the primary ways I think about trying to understand the world these days. I guess the general concepts have permeated a lot of LW/rationality discussion. But at the time I had a concept-shaped hole in my understanding of AI alignment research, and after reading this post I had a much clearer sense of that concept.
For what it’s worth, I was just learning about the basics of MIRI’s research when this came out, and reading it made me less convinced of the value of MIRI’s research agenda. That’s not necessarily a major problem, since the expected change in belief after encountering a given post should be 0, and I already had a lot of trust in MIRI. However, I found this post by Jessica Taylor vastly clearer and more persuasive (it was written before “Rocket Alignment”, but I read “Rocket Alignment” first). In particular, I would expect AI researchers to be much more competent than the portrayal of spaceplane engineers in the post, and it wasn’t clear to me why the analogy should be strong Bayesian evidence for MIRI being correct.