David Scott Krueger (formerly: capybaralet) comments on AI Alignment Open Thread August 2019

David Scott Krueger (formerly: capybaralet) 9 Aug 2019 16:00 UTC
LW: 5 AF: 3
AF
Interesting. Your crux seems good; I think it’s a crux for us. I expect things to play out more like Eliezer predicts here: https://www.facebook.com/jefftk/posts/886930452142?comment_id=886983450932&comment_tracking=%7B%22tn%22%3A%22R%22%7D&hc_location=ufi
I also predict that there will be types of failure we will not notice, or will misinterpret. It seems fairly likely to me that proto-AGI (i.e. AI that could autonomously learn to become AGI within <~10yrs of acting in the real world) is deployed and creates proto-AGI subagents, some of which we don’t become aware of (e.g. because of accidental/incidental/deliberate steganography) and/or are unable to keep track of. And then those continue to survive and reproduce, etc… I guess this only seems plausible if the proto-AGI has a hospitable environment (like the internet, human brains/memes) and/or means of reproduction in the real world.
A very similar problem would be a form of longer-term “seeding”, where an AI (at any stage) with a sufficiently advanced model of the world and long horizons discovers strategies for increasing the chances (“at the margin”) that its values dominate in the long-term future. With my limited knowledge of physics, I imagine there might be ways of doing this just by beaming signals into space in a way calculated to influence/spur the development of life/culture in other parts of the galaxy.
I notice a lot of what I said above makes less sense if you think of AIs as having a similar skill profile to humans, but I think we agree that AIs might be much more advanced than people in some respects while still falling short of AGI because of weaknesses in other areas.
That observation also cuts against the argument you make about warning signs, I think, as it suggests that we might significantly underestimate an AI’s (e.g. vastly superhuman) skill in some areas, if it still fails at some things we think are easy. To pull an example (not meant to be realistic) out of a hat: we might have AIs that can’t carry on a conversation, but can implement a very sophisticated covert world domination strategy.
- Aleksi Liimatainen 9 Aug 2019 16:22 UTC
  3 points
  Parent
  It seems fairly likely to me that proto-AGI (i.e. AI that could autonomously learn to become AGI within <~10yrs of acting in the real world) is deployed and creates proto-AGI subagents, some of which we don’t become aware of (e.g. because of accidental/incidental/deliberate steganography) and/or are unable to keep track of. And then those continue to survive and reproduce, etc…
  Now I’m wondering if it makes sense to model past or present cognitive-cultural information processes in a similar fashion. Memetic and cultural evolution are a thing, and any agentlike processes that spawn could piggyback on our existing general intelligence architecture.
  - David Scott Krueger (formerly: capybaralet) 15 Aug 2019 4:33 UTC
    3 points
    Parent
    Yeah, I think it totally does! (and that’s a very interesting / “trippy” line of thought :D)
    However, it does seem to me somewhat unlikely, since it does require fairly advanced intelligence, and I don’t think evolution is likely to have produced such advanced intelligence with us being totally unaware, whereas I think something about the way we train AI is more strongly selecting for “savant-like” intelligence, which is sort of what I’m imagining here. I can’t think of why I have that intuition OTTMH.
- Rohin Shah 9 Aug 2019 22:08 UTC
  LW: 2 AF: 1
  AF Parent
  That observation also cuts against the argument you make about warning signs, I think, as it suggests that we might significantly underestimate an AI’s (e.g. vastly superhuman) skill in some areas, if it still fails at some things we think are easy.
  Nobody denies that AI is really good at extracting patterns out of statistical data (e.g. image classification, speech-to-text, and so on), even though AI is absolutely terrible at many “easy” things. This, and the linked comment from Eliezer, seem to be drastically underselling the competence of AI researchers. (I could imagine it happening with strong enough competitive pressures though.)
  I also predict that there will be types of failure we will not notice, or will misinterpret. [...]
  All of this assumes some very good long-term planning capabilities. I expect long-term planning to be one of the last capabilities that AI systems get. If I thought they would get them early, I’d be more worried about scenarios like these.
  - David Scott Krueger (formerly: capybaralet) 15 Aug 2019 4:29 UTC
    LW: 9 AF: 3
    AF Parent
    So I don’t take EY’s post as being about AI researchers’ competence, so much as their incentives and levels of rationality and paranoia. It does include significant competitive pressures, which seems realistic to me.
    I don’t think I’m underestimating AI researchers, either, but for a different reason… let me elaborate a bit: I think there are waaaaaay too many skills for us to hope to have a reasonable sense of what an AI is actually good at. By skills I’m imagining something more like options, or having accurate generalized value functions (GVFs), than tasks.
    Regarding long-term planning, I’d factor this into 2 components:
    1) having a good planning algorithm
    2) having a good world model
    I think the way long-term planning works is that you do short-term planning in a good hierarchical world model. I think AIs will have vastly superhuman planning algorithms (arguably, they already do), so the real bottleneck is the world-model.
    I don’t think it’s necessary to have a very “complete” world-model (i.e. enough knowledge to look smart to a person) in order to find “steganographic” long-term strategies like the ones I’m imagining.
    I also don’t think it’s even necessary to have anything that looks very much like a world-model. The AI can just have a few good GVFs… (i.e. be some sort of savant).