Of course you want to fail as quickly as you can, though you and I do seem to have slightly different intuitions about what is likely to end up being useful for friendliness content. Or rather, I have a slightly broader set of things that I think have a decent chance of being useful.
(A lot of stuff seems potentially relevant only until you’ve studied the problem for a few years and learned that mostly it’s actually not.)
I expect a lot of actually relevant stuff doesn’t seem relevant until you’ve studied it in connection with the problem for a few years. But maybe you don’t get that far, because it didn’t seem relevant :(
Friendly AI is a monster problem partly because nearly everything any human experiences, believes, wants to believe or has any opinion at all on, is potentially relevant. You could be forgiven for thinking maybe there isn’t a well-defined problem buried under all that mess after all. But there may be some useful sub-problems around the edges.
Personally, even if AI-that-goes-FOOM-catastrophically isn’t very likely, I think we shouldn’t even need that reason to study what sort of life and environment would be optimal for humans. It doesn’t have to be about asking dangerous wishes of some technological genie-in-a-bottle. We already have supra-human entities such as governments and corporations making decisions with non-zero existential risk attached, and we probably want them to be a bit friendlier if possible.
Do you have specific examples in mind?
Machine learning (in particular, graphical models), more general AI, philosophy, game theory, algorithmic complexity, cognitive science, and neuroscience seem to be mostly useless (beyond the basics) for attacking the friendliness content problem. Pure mathematics seems potentially useful.
I would really, really like to know: What areas of pure mathematics stand out to you now?
He might have changed his mind since then, but in case you missed it: Recommended Reading for Friendly AI Research
I’ve looked over that list, but the problem is that it essentially consists of a list of items to catch you up to the state of the discussion as it was a year ago, along with a list of general mathematics texts.
I’m pretty well acquainted with mathematical logic; the main item on the list that I’m particularly weak in would be category theory, and I’m not sure why category theory is on the list. I’ve a couple of ideas about the potential use of category theory in, maybe, knowledge representation or something along those lines, but I have no clue how it could be brought to bear on the friendliness content problem.
The book list is somewhat obsolete (the list of LW posts is not), but I’m not ready to make the next iteration. The state of decision theory hasn’t changed much since then.
Roughly, the central mystery seems to be the idea of acausal control. It feels like it might even be useful for inferring friendliness content, along the lines of what I described here. But we don’t understand that idea. It first appeared more or less explicitly in UDT, with its magical mathematical intuition module, and became more concrete in ADT, where proofs are used instead (at the cost of making it useless where complete proofs can’t be expected, which is almost always the case outside very simple thought experiments).
The problem is this: given an action-definition and a utility-definition, an agent can find a function between their sets of possible values and use it as a “utility function”. But other “utility functions” are correct as well; the agent just isn’t capable of finding them, and somehow that is a good thing, which is why it works (see this post). What makes some of these functions “better” than others? Can we generalize this to inference of dependencies between facts other than the action and the utility-value? What particular properties of agents constructed in one of the standard ways allow them to be controlled by some dependencies but not others? What kinds of “facts” are relevant? What constitutes a “fact”? (In ADT, a “fact” is an axiomatic definition of a structure, which refers to some particular class of structures and not to other structures; decision theory then considers ways in which some of these “facts” can control other “facts”, that is, make the structures picked out by certain definitions be a certain way, given control over other structures that contain the agent’s action.)
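To make the shape of this concrete, here is a minimal toy sketch (in Python) of the selection rule an ADT-style agent uses: it looks for provable links of the form “my action is a, therefore utility is u” and takes the action with the best provable utility. All names here are hypothetical, and the “proof system” is replaced by brute-force checking against a tiny hand-written world model, so this only illustrates the selection rule, not ADT itself or the open question of which dependencies the agent ought to be controlled by.

```python
# Toy sketch of proof-based (ADT-style) action selection.
# "provable" is a stand-in for a formal proof search; here it just checks
# a tiny hand-written world model, so this is illustration only.

def world_utility(action):
    # A trivial stand-in world: the "utility-definition" as a function of the
    # agent's action. In ADT this dependence is not given up front; it is
    # exactly what the agent has to discover by finding proofs.
    return {"cooperate": 3, "defect": 1}[action]

def provable(action, utility):
    # Stand-in for: the formal system proves "agent() = action -> U() = utility".
    return world_utility(action) == utility

def choose_action(actions, possible_utilities):
    # Take the action with the highest utility the agent can *prove* it leads to.
    # Other true action->utility links may exist that the agent cannot prove.
    best_action, best_utility = None, float("-inf")
    for action in actions:
        for utility in possible_utilities:
            if provable(action, utility) and utility > best_utility:
                best_action, best_utility = action, utility
    return best_action

if __name__ == "__main__":
    print(choose_action(["cooperate", "defect"], [0, 1, 2, 3]))  # -> "cooperate"
```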
It feels like mathematics is the discipline for clarifying questions like this (and it’s perhaps not useful to prioritize its areas, though some emphasis on foundations seems right). An important milestone would be to produce a useful problem statement for clarifying this idea of acausal dependence, one that can be communicated at least to the mathematicians on LW.
Of the things on your list, I’m most surprised by cognitive science and maybe game theory, unless you’re talking about the fields’ current insights rather than their expected future insights. Even in that case, I’m still somewhat surprised game theory is on the list. I’d love to learn what led you to this belief.
It’s possible I only know the basics, so feel free to say “read more about what the fields actually offer and it’ll be obvious if you’ve been on Less Wrong long enough.”
I agree on most of this, but would you mind explaining why you think neuroscience is “mostly useless”? My intuition is the opposite. Also agreed that pure mathematics seems useful.
Even if we knew everything about brains, right now we lack the conceptual/philosophical insight to turn that data into something useful. Besides, neuroscience is not even primarily concerned with getting such data; it develops its own generalizations that paint a picture of roughly how brains work, but that picture probably won’t be detailed enough to capture the complexity of human (extrapolated) value, even if we knew how to interpret it, which we don’t.
I was also wondering about neuroscience. If we take a CEV approach, wouldn’t neuroscience be useful for actually determining the volitions to be extrapolated?
Agreed, but I would add algorithmic information theory, deep theoretical computer science, and maybe quantum information theory. There are some interesting questions about hypercomputation, getting information from context, and concrete semi-“physical” AI coordination problems. (Also, reversible computing is just trippy as hell. Intuitions, especially “moral” intuitions, gawk at it.) These are of course secondary to the study of updateless-like decision theories.
They do? Why? I haven’t experienced moral trippiness myself. This may be because I haven’t considered the same things you have or because my intuitions are eccentric. (Assume I mean ‘eccentric in a different way to how your moral intuitions are eccentric’ or not depending on whether you prefer to be seen as having typical moral intuitions or atypical ones.)