Huh, I’m surprised to hear you say you already knew it. I did not know this already. This is the post where I properly understood that Eliezer et al. are interested in decision theory and tiling agents and so on, not because those are direct failure modes they expect in future systems, but because they highlight confusions in want of basic theory to describe them, and that this basic theory will hopefully help make AGI alignable. Like, I think I’d heard the words once or twice before then, but I didn’t really get it.
(It’s important that Embedded Agency came out too, which was entirely framed around this “list of confusions in want of better concepts / basic theory”, so I had some more concrete things to pin this to.)
FYI I also didn’t learn much from this post. (But the places I did learn it from were random comments buried in threads, which didn’t make it easy for people to learn.)
Fair, but I expect I’ve also read those comments buried in random threads. Like, Nate said it here three years ago on the EA Forum.
The main case for [the problems we tackle in MIRI’s agent foundations research] is that we expect them to help in a gestalt way with many different known failure modes (and, plausibly, unknown ones). E.g., ‘developing a basic understanding of counterfactual reasoning improves our ability to understand the first AGI systems in a general way, and if we understand AGI better it’s likelier we can build systems to address deception, edge instantiation, goal instability, and a number of other problems’.
I had a mental model of working directly on problems, but before Eliezer’s post I didn’t have an alternative mental model to move probability mass toward. I just funnelled probability mass away from “MIRI is working on direct problems they foresee in AI systems” to “I don’t understand why MIRI is doing what it’s doing”. Nowadays I have a clearer pointer to what technical research looks like when you’re trying to get less confused and get better concepts.
This sounds weirdly dumb to say in retrospect, because ‘get less confused and get better concepts’ is one of the primary ways I think about trying to understand the world these days. I guess the general concepts have permeated a lot of LW/rationality discussion. But at the time I had a concept-shaped hole in how I discussed AI alignment research, and after reading this post I had a much clearer sense of that concept.