In fact, there’s not much difference between saying “the human uses a Bayes net” and “the human uses some arbitrary function F, and we worry the AI will figure out F and then use it to lie to us”.
I think there isn’t much difference between the two in this case. I was reading the Bayes net example as just an illustration of the point that any AI powerful enough to pose a risk would be able to model humans with high fidelity. For that matter, I think the AI Bayes net was also illustrative: realistically, the AI could learn other methods of reasoning about the world, perhaps including Bayes nets in some form.
Or am I actually wrong and it’s okay for a “builder” solution to assume we have access to the human Bayes net?
I think we can’t naively assume this, but if you can figure out a competitive and trustworthy way to get it (e.g., with AI assistance), then it’s fair game.