Very nice! I think work in this general direction is more or less what's needed if we want to survive.
I just wanted to probe a bit on turning these methods into governance proposals. Do you see ways of creating databases/tests for objective measurement, or how do you see this being used in policy and the real world?
(Obviously, I get that understanding AI better means less doom, but I'm curious about your thoughts on the final implementation step.)
With ideal objective detection methods, the inner alignment problem is solved (or partially solved with non-ideal detection methods), and governance would then be needed to regulate which objectives are allowed to be instilled in an AI (i.e., the government does something like outer alignment regulation).
Ideal objective oversight essentially allows an overseer to instill whatever objectives it wants the AI to have. Therefore, if the overseer includes the government, the government can influence which target outcomes the AI pursues.
So practically, this means governance policies would require the government to have access to the objective detection results, either directly or indirectly through the AI labs.
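As a purely illustrative sketch of what that access could look like (everything here is hypothetical, not an existing standard: the field names, the `ObjectiveDetectionReport` class, and the regulator's allow-list are all assumptions), a lab-to-regulator report from an objective detection method might be a small structured record that can be checked against a list of permitted objectives and filed in a shared registry:

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json

# Hypothetical allow-list of objectives a regulator permits labs to instill.
ALLOWED_OBJECTIVES = {"helpfulness", "honesty", "harmlessness"}

@dataclass
class ObjectiveDetectionReport:
    """Hypothetical record a lab submits after running an objective detection method."""
    model_id: str                   # which model was audited
    detection_method: str           # which detection method produced this result
    detected_objectives: list[str]  # objectives the method attributed to the model
    confidence: float               # method's confidence in the attribution, 0..1
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def compliant(self) -> bool:
        """Check every detected objective against the regulator's allow-list."""
        return all(obj in ALLOWED_OBJECTIVES for obj in self.detected_objectives)

    def to_json(self) -> str:
        """Serialize for submission to a shared database/registry."""
        return json.dumps(asdict(self), indent=2)

# Example: a lab files a report; the regulator checks compliance.
report = ObjectiveDetectionReport(
    model_id="lab-model-v3",
    detection_method="hypothetical-objective-probe",
    detected_objectives=["helpfulness", "honesty"],
    confidence=0.87,
)
print(report.compliant())  # True: all detected objectives are on the allow-list
print(report.to_json())
```

The point of the sketch is just that "government access to detection results" could be as lightweight as a standardized report format plus an automated compliance check, whether submitted directly or relayed through the labs.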