Thanks for reading!
Yes, you can think of it as having a non-corrigible complicated utility function. The relevant utility function is the ‘aggregated utilities’ defined in section 2. I think ‘corrigible’ vs ‘non-corrigible’ is slightly verbal, since it depends on how you define ‘utility’, but the non-verbal question is whether the resulting AI is safer.
Good idea, this is on my agenda!
Looking forward to reading up on geometric rationality in detail. On a quick first pass, it looks like geometric rationality is a bit different because it involves deviating from the axioms of VNM rationality by using random sampling. By contrast, utility aggregation is consistent with VNM rationality, because it just replaces the ordinary utility function with the aggregated utility.
Thanks for taking the time to work through this carefully! I’m looking forward to reading and engaging with the articles you’ve linked to. I’ll make sure to implement the specific description-improvement suggestions in the final draft.
I wish I had more to say about the effort metric! So far, the only concrete ideas I’ve come up with are (i) measure how much compute each action performs; or (ii) decompose each action into a series of basic actions, and measure the number of basic actions necessary to perform the action. But both ideas are sketchy.
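To make idea (ii) slightly more concrete, here’s a toy sketch of what "count the basic actions" could look like, with a composite action represented as a tree whose leaves are basic actions. All names here (`Action`, `effort`, the example actions) are hypothetical illustrations, not part of any proposal in the post:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Action:
    """A hypothetical action: composite if it has sub-actions, basic otherwise."""
    name: str
    sub_actions: List["Action"] = field(default_factory=list)

def effort(action: Action) -> int:
    """Idea (ii) as a recursive count: a basic action costs one unit of
    effort, and a composite action costs the sum over its decomposition."""
    if not action.sub_actions:
        return 1
    return sum(effort(sub) for sub in action.sub_actions)

# Example: a composite action that decomposes into three basic actions.
make_tea = Action("make tea", [
    Action("boil water"),
    Action("steep tea"),
    Action("pour cup"),
])
```

Of course, this just pushes the sketchiness into choosing the decomposition: whether an action counts as "basic" depends on the granularity you pick, which is part of why the idea still feels underspecified.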