Intuitively, of course, we’d like AIXI-atomic to discover the composition of nuclei, shift its models to use nuclear physics, and refine the ‘carbon atoms’ mentioned in its utility function to mean ‘atoms with nuclei containing six protons’.
But we didn’t actually specify that when constructing the agent (and saying how to do it in general is, so far as we know, hard; in fact it’s the whole ontology identification problem). We constrained the hypothesis space to contain only universes running on the classical physics that the programmers knew about. So what happens instead?
Probably the ‘simplest atomic hypothesis that fits the facts’ will be an enormous atom-based computer, simulating nuclear physics and quantum physics in order to create a simulated non-classical universe whose outputs are ultimately hooked up to AIXI’s webcam. From our perspective this hypothesis seems silly, but if you restrict the hypothesis space to only classical atomic universes, that’s what ends up being the computationally simplest hypothesis that predicts, in detail, the results of nuclear and quantum experiments.
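The flavor of this can be shown with a toy sketch (not AIXI, and nothing like the real hypothesis space; all names here are illustrative). Suppose the true process is a simple rule outside the allowed class, and the allowed class contains only explicit lookup tables over recent observations, standing in for "classical atoms arranged into a computer." The restricted class can still fit the data, but only by a table that in effect emulates the true rule:

```python
# Toy illustration: a predictor restricted to a hypothesis class that
# excludes the true generative rule can still fit the data, but only via
# a hypothesis that emulates that rule.

# True process (standing in for non-classical physics): next bit is the
# XOR of the last two bits.
def true_process(history):
    return history[-1] ^ history[-2]

# Restricted "classical" class: predict the next bit from the last k bits
# via an explicit lookup table. Description length grows with the table,
# not with the elegance of the underlying rule.
def best_fitting_k(history, max_k=4):
    """Return the smallest k whose lookup table is consistent with history."""
    for k in range(1, max_k + 1):
        table = {}
        consistent = True
        for i in range(k, len(history)):
            key = tuple(history[i - k:i])
            if table.setdefault(key, history[i]) != history[i]:
                consistent = False
                break
        if consistent:
            return k, table
    return None

# Generate observations from the true (disallowed) rule.
history = [0, 1]
for _ in range(20):
    history.append(true_process(history))

k, table = best_fitting_k(history)
# The winning "classical" hypothesis is a table that simply encodes the
# input/output behavior of the XOR rule, i.e. an emulator of it.
print(f"simplest table hypothesis uses k={k} with {len(table)} entries")
```

The analogy is loose, but the structural point carries over: when the true dynamics are excluded from the hypothesis space, the best admissible hypothesis is whatever in-class construction reproduces those dynamics' outputs, and its complexity reflects the cost of emulation rather than the simplicity of the true rule.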
Proposal: AI systems correctly learn human values, but later change their world-model/ontology without porting the values to the new ontology (or port them incorrectly). See Rescuing the utility function, Ontology identification problem: Main, and Ontology identification problem: Technical tutorial.