I’ve looked over that list, but the problem is that it essentially consists of a list of items to catch you up to the state of the discussion as it was a year ago, along with a list of general mathematics texts.
I’m pretty well acquainted with mathematical logic; the main item on the list that I’m particularly weak in would be category theory, and I’m not sure why category theory is on the list. I’ve a couple of ideas about the potential use of category theory in, maybe, knowledge representation or something along those lines, but I have no clue how it could be brought to bear on the friendliness content problem.
The book list is somewhat obsolete (the list of LW posts is not), but I’m not ready to make the next iteration. The state of decision theory hasn’t changed much since then.
Roughly, the central mystery seems to be the idea of acausal control. It feels like it might even be useful for inferring friendliness content, along the lines of what I described here. But we don’t understand that idea. It first appeared more or less explicitly in UDT, with its magical mathematical intuition module, and became more concrete in ADT, where proofs are used instead (at the cost of making it useless wherever complete proofs can’t be expected, which is almost always the case outside very simple thought experiments).
The problem is this: given an action-definition and a utility-definition, the agent can find a function between their sets of possible values and use it as a “utility function”. Other “utility functions” are correct as well; the agent just isn’t capable of finding them, and somehow that is a good thing, since it is why the approach works (see this post). What makes some of the functions “better” than others? Can we generalize this to inference of dependencies between facts other than action and utility-value? What particular properties of agents constructed in one of the standard ways allow them to be controlled by some dependencies but not others? What kinds of “facts” are relevant? What constitutes a “fact”? (In ADT, a “fact” is an axiomatic definition of a structure, which refers to some particular class of structures and not to other structures; decision theory then considers ways in which some of these “facts” can control other “facts”, that is, make the structures defined by certain definitions be a certain way, given control over other structures that contain the agent’s action.)
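To make the puzzle a bit more concrete, here is a toy sketch in Python. It is not ADT itself: proof search is replaced by simply evaluating a world program with each action substituted in, and every name in it (`world`, `find_dependence`, `consistent_with_facts`, the payoff numbers) is a hypothetical chosen for illustration. It only shows the shape of the problem: the agent finds one action-to-utility map and acts on it, yet a different map that agrees with it at the action actually taken is just as "correct" when read as a set of material implications.

```python
# Toy stand-in for the setup described above. "Proof search" is replaced by
# directly evaluating the world program; all names and numbers are hypothetical.

ACTIONS = [1, 2]  # two possible values of the action-definition


def world(action):
    """Utility-definition: the utility value as a function of the action."""
    return {1: 1_000_000, 2: 1_000}[action]


def find_dependence(actions, world_fn):
    """Map each possible action to the utility the agent can establish for it."""
    return {a: world_fn(a) for a in actions}


def agent(actions, world_fn):
    """Choose the action whose established utility is highest."""
    dependence = find_dependence(actions, world_fn)
    chosen = max(actions, key=lambda a: dependence[a])
    return chosen, dependence


def consistent_with_facts(candidate, actual_action, actual_utility):
    """Check a candidate map read as material implications (A = a -> U = u).

    Only the claim about the action actually taken can be false; claims
    about actions that are not taken hold vacuously.
    """
    return candidate[actual_action] == actual_utility


chosen, found = agent(ACTIONS, world)
actual_utility = world(chosen)

# A different map that agrees with `found` only at the action actually taken.
other = {1: 1_000_000, 2: 5_000_000}
```

Here both `found` and `other` pass `consistent_with_facts`, so nothing in the bare facts singles out `found`. But an agent that maximized over `other` would take action 2, at which point `other` would stop being consistent with what happens. That is one way to see why it matters which dependence the agent is able to find, without yet saying what makes that dependence "better".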
It feels like mathematics is the discipline for clarifying questions like this (and it’s perhaps not useful to prioritize its areas, though some emphasis on foundations seems right). An important milestone would be to produce a useful problem statement for clarifying this idea of acausal dependence, one that can be communicated at least to mathematicians on LW.
He might have changed his mind since then, but in case you missed it: Recommended Reading for Friendly AI Research