I think the existing approach, even with easy improvements, can't capture many important incentives, so you wouldn't want to rely on it as an actual assurance. For example, suppose agent A is predicting the world and agent B is optimizing A's predictions about B's actions: then we want to say the system has an incentive to manipulate the world, but it doesn't seem easy to incorporate that into this kind of formalism.
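To make the setup concrete, here's a rough sketch of the influence structure I have in mind, written as a plain directed graph rather than in any CID formalism; the node names and edges are my own illustrative assumptions, not anyone's proposed model.

```python
# Minimal sketch (assumed structure, not a CID library) of the two-agent setup:
# A predicts the world, B is rewarded based on A's predictions about B's actions.
import networkx as nx

g = nx.DiGraph()
g.add_edges_from([
    ("World", "A_prediction"),      # A observes the world to form its prediction
    ("B_action", "World"),          # B's action affects the world
    ("B_action", "A_prediction"),   # A is (in part) predicting B's actions
    ("A_prediction", "B_utility"),  # B is scored on A's prediction about B
])

# The worry: B's action influences B's utility *through* the world
# (B_action -> World -> A_prediction -> B_utility), which looks like the
# structural signature of an incentive to manipulate the world.
for path in nx.all_simple_paths(g, "B_action", "B_utility"):
    print(" -> ".join(path))
```

The path that routes through the World node is the one that single-agent incentive analysis on B's diagram alone doesn't obviously flag, since the manipulation happens via A's predictive model.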
This is what multi-agent incentives are for (i.e. incentive analysis in multi-agent CIDs). We're still working on these, as there are a range of subtleties, but I'm pretty confident we'll have a good account of them.