Thank you! I was thinking about counterfactual Oracles for some time and totally missed the case with multiple/future counterfactual Oracles. Now I feel kinda dumb about it.
Have you considered logical counterfactuals? Something like "D: for all Oracles that have the utility function U(answer | true(D)), the answer accessible to humans is NaN"?
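One way to write that sentence out more explicitly (my notation, just a sketch of how I read the self-reference — none of this is from the original thread):

```latex
% Sketch: D is a self-referential sentence quantifying over Oracles O.
% It says that any Oracle whose utility function conditions on D being
% true outputs NaN as its human-accessible answer.
\[
D \;\equiv\; \forall O.\ \Bigl(\mathrm{util}(O) = U(\mathrm{answer} \mid \mathrm{true}(D))
\;\rightarrow\; \mathrm{answer}_O = \mathrm{NaN}\Bigr)
\]
```

The intent, as I understand it, is that D's truth is evaluated under a logical counterfactual rather than a physical one, so the Oracle cannot make D true or false by acting in the world.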
Some scattered thoughts on the topic:
We shouldn't consider the problem "how do we persuade a putative Oracle to stay inside the box?" solved. If we just take a powerful optimizer and tell it to optimize for the truth of a particular formula, then in addition to self-fulfilling prophecies we can get plain old instrumental convergence, where the AI gathers knowledge and computing resources to give the most correct possible answer.
I have a distaste for design decisions that impair the cognitive abilities of an AI, because they are unnatural and just begging to be broken. I prefer weird utility functions to weird cognition.
I don’t believe we considered logical counterfactuals as such, but it seems to me that those would be quite comparable to the counterfactual of replacing an oracle with a simpler system.