“If you or anyone else could point to a specific function in my code that we don’t know how to compute, I’d be very interested to hear that.”
From the comments in main():
“Given a set of brain models, associate them with the decision algorithms they implement.”
“Then map each brain to its rational self’s values (understood extensionally i.e. cashing out the meaning of their mental concepts in terms of the world events they refer to).”
Are you assuming that you have whole brain emulations of a few mature human beings? And then the “decision algorithms” and “rational… values” are defined in terms of how those emulations respond to various sequences of inputs?
Yeah, more or less. In the abstract, I “suppose that unlimited computation and a complete low-level causal model of the world and the adult human brains in it are available.” I’ve tended to imagine this as an oracle that just has a causal model of the actual world and the brains in it. But whole brain emulations would likely also suffice.
In the code, the causal models of the world and brains in it would be passed as parameters to the metaethical_ai_u function in main. The world w and each element of the set bs would be an instance of the causal_markov_model class.
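To make the shape of that top level concrete, here is a toy sketch. It reuses the names from the text (`metaethical_ai_u`, `causal_markov_model`, `w`, `bs`), but the fields and the stubbed body are purely illustrative assumptions, not the actual implementation:

```python
class causal_markov_model:
    """Toy stand-in (not the real class): a causal model given by a set of
    variables, a parent map, and per-variable structural equations."""
    def __init__(self, variables, parents, transition):
        self.variables = variables    # state variables of the system
        self.parents = parents        # parents[v]: variables v directly depends on
        self.transition = transition  # transition[v]: parent values -> distribution over v

def metaethical_ai_u(w, bs):
    """Toy top-level signature: w models the world, bs is the set of brain
    models. The real function would return a utility function; this stub
    only checks the types described in the text."""
    assert isinstance(w, causal_markov_model)
    assert all(isinstance(b, causal_markov_model) for b in bs)
    # 1. associate each brain with the decision algorithm it implements
    # 2. map each decision algorithm to its rational self's values
    # 3. aggregate those values into a single utility function
    ...
```

The point is only the calling convention: one causal model for the world, a set of causal models for the brains, everything else derived from those.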
Each brain gets associated with an instance of the decision_algorithm class by calling the class function implemented_by. A decision algorithm models the brain in higher-level concepts like credences and preferences, as opposed to bare causal states. And yeah, in determining both the decision algorithm implemented by a brain and its rational values, we look at its responses to all possible inputs.
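As a heavily simplified illustration of that association, suppose (contrary to the real setting) that both the brain and each candidate decision algorithm can be summarized as functions from input sequences to outputs, over a finite input space. Then "implements" collapses to behavioral match; the names and finiteness here are assumptions for the sketch:

```python
def implemented_by(brain_response, candidates, input_space):
    """Toy stand-in for decision_algorithm.implemented_by: return the
    first candidate algorithm whose output matches the brain's response
    on every input sequence in a (here, finite) input space."""
    for name, algorithm in candidates:
        if all(algorithm(seq) == brain_response(seq) for seq in input_space):
            return name
    return None  # no candidate reproduces the brain's behavior
```

The real criteria are much richer than exact behavioral match, as the closing paragraph notes: the chosen explanation should also be coherent, instrumentally rational, and parsimonious.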
For implementation, we aim for isomorphic, coherent, instrumentally rational, and parsimonious explanations. For rational values, we aggregate the values of the agent's possible continuations, weighting more heavily those that better satisfy the agent's own higher-order decision criteria without introducing too much unrelated distortion of its values.
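The aggregation idea can be sketched numerically, under the loud assumption that hypothetical helper functions already score each continuation on criteria satisfaction and unrelated distortion (none of these names come from the real code):

```python
def rational_values(continuations, criteria_score, distortion, values_of):
    """Toy weighted aggregation: each continuation's values count more
    the better it satisfies the agent's own higher-order decision
    criteria, discounted by how much unrelated distortion it introduces.
    All four parameters are hypothetical stand-ins for the sketch."""
    weighted, total = {}, 0.0
    for c in continuations:
        # weight trades off criteria satisfaction against distortion
        w = criteria_score(c) / (1.0 + distortion(c))
        total += w
        for outcome, v in values_of(c).items():
            weighted[outcome] = weighted.get(outcome, 0.0) + w * v
    # normalize by total weight (assumes at least one positive weight)
    return {outcome: v / total for outcome, v in weighted.items()}
```

So a continuation that satisfies the agent's higher-order criteria well but warps unrelated values gets pulled back toward a smaller share of the aggregate.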