tom4everitt comments on Modeling AGI Safety Frameworks with Causal Influence Diagrams

tom4everitt 28 Jun 2019 10:36 UTC
LW: 6 AF: 4
AF
Hey Charlie,
Thanks for your comment! Some replies:
sometimes one makes different choices in how to chop an AI’s operation up into causally linked boxes, which can lead to an apples-and-oranges problem when comparing diagrams (for example, the diagrams you use for CIRL and IDA are very different choppings-up of the algorithms)
There is definitely a modeling choice involved in deciding how much to “pack” into each node. Indeed, most of the diagrams have been through a few iterations of splitting and combining nodes. The aim has been to focus on the key dynamics of each framework.
As for the difference between the CIRL and IDA diagrams, this is a direct effect of the different levels at which the frameworks are specified. CIRL is a high-level framework, roughly saying “somehow you infer the human preferences from their actions”. IDA, in contrast, provides a reasonably detailed supervised learning criterion. So I think the frameworks themselves are already like apples and oranges; it’s not just the diagrams. (And this is something you notice when drawing the diagrams.)
But I am skeptical that there’s a one-size-fits-all solution, and instead think that diagram usage should be tailored to the particular point it’s intended to make.
We don’t want to claim that CIDs are the one-and-only diagram to always use, but as you mentioned above, they do allow considerable flexibility in which aspects to highlight.
I actually have a draft sitting around of how one might represent value learning schemes with a hierarchical diagram of information flow.
Interesting. A while back I was looking at information flow diagrams myself, and was surprised to discover how hard it was to make them formally precise (there seems to be no formal semantics for them). In contrast, causal graphs and CIDs have formal semantics, which is quite useful.
For hierarchical representations, there are networks of influence diagrams: https://arxiv.org/abs/1401.3426
- Charlie Steiner 30 Jun 2019 17:35 UTC
  LW: 1 AF: 1
  AF
  All good points.
  The paper you linked was interesting—the graphical model is part of an AI design that actually models other agents using that graph. That might be useful if you’re coding a simple game-playing agent, but I think you’d agree that you’re using CIDs in a more communicative / metaphorical way?