I guess you could call it that, but that doesn’t necessarily mean it’s a correct question, or that anyone is necessarily thinking about the problem of FAI with that implied conceptual framework.
Mostly unrelated idea: It’d be really cool if someone who’d thought a decent amount about FAI could moderate a single web page where people with some FAI/rationality experience could post (by emailing the moderator or whatever) somewhat cogent advice about how whoever’s reading the site could perhaps make a small amount of progress towards FAI conceptual development. Restricting it to advice would keep each contributor’s section from bloating into a jargon-filled description of their personal approach/project. Being somewhat elitist/selective about allowed contributors/contributions would be important. Advice shouldn’t just be LW-obvious applause lights. The contributors should assume (because a notice at the top says so) that their audience is, or can easily become without guidance, pretty damn familiar with FAI dreams and their patterns of failure, and thus doesn’t need those arguments repeated. Basically, the advice should be novel relative to the easily accessible web, though it’s okay to emphasize e.g. specific ways of doing analysis found in LOGI. But ultimately such restrictions are just hypotheses about optimal tone. If the moderator is selective about contributors then it’d probably naturally self-optimize.
Such a site sounds pretty easy to set up. It’s just an HTML document with a description, lots of external links, and book suggestions at the top, and neat sections below. Potential hard parts: seducing people (e.g. Mitchell Porter, Wei Dai) to seed it, and choosing a moderator who’s willing to be choosy about what gets published and to implement edits according to some sane editing policy. (And maybe some other moderators with access too.)
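To make the “just an HTML document” point concrete, here is a purely illustrative sketch of such a page; every title, section, and link below is a placeholder, not a proposal:

```html
<!-- Illustrative sketch only: all names, sections, and links are placeholders. -->
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Advice on FAI conceptual development</title>
</head>
<body>
  <h1>Advice on FAI conceptual development</h1>
  <p>Notice: contributors assume the reader is already familiar with FAI
     dreams and their common patterns of failure; those arguments are not
     repeated here.</p>

  <h2>Background reading</h2>
  <ul>
    <li><a href="#">External link or book suggestion (placeholder)</a></li>
  </ul>

  <h2>Contributor advice</h2>
  <h3>Contributor A (placeholder)</h3>
  <p>Short, concrete advice on making progress, edited by the moderator
     under a stated policy.</p>
</body>
</html>
```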
I guess it’s possible that the LW wiki is sort of almost okay, but really, I don’t like it. It’s not a URL I can just type into my address bar, it requires extra moderation which is socially and technically awkward, the LW wiki is not about FAI, and in general it doesn’t have the clean simplicity that is both attractive and expandable in many ways.
I’m not sure how to stably point people at it, but it’d be easy to link to when someone professes interest in learning more about FAI stuff. Also, it’s probable that a fair bit of benefit would come from current FAI-interested folk getting a chance to learn from each other, and depending on the site structure (like whether or not a paragraph or another page just about the current research interests of all contributors is a good idea) it could easily provide an affordance for people to actually bother constructively criticizing others’ approaches and emphases. I suspect that lukeprog’s future efforts could be sharpened by Vladimir Nesov’s advice, as a completely speculative example. And I’d like to have a better idea of what Mitchell Porter thinks I might be missing, as a non-speculative example.
What do you think, Luke? Worth an experiment?
Do you think a LW subreddit devoted to FAI could work? If not, then we probably aren’t ready for the site you suggest, and the default venue for such dialogues should continue to be LW Discussion.
Probably not. There are too many things that can’t be said about FAI on an SIAI-affiliated blog for political reasons. It would be lame.
What if the subreddit were an actual Reddit subreddit?
I think a LW subreddit devoted to FAI could potentially be very frustrating. The majority of FAI-related posts that I’ve seen on LW Discussion are pretty bad and get upvoted anyway (though not much). Do you think Discussion is an adequate forum for now?
I should use this opportunity to quit LW for a while.
A new forum devoted to FAI risks rapidly running out of quality material, if it just recruits a few people from LW. It needs outsiders from relevant fields, like AGI, non-SIAI machine ethics, and “decision neuroscience”, to have a chance of sustainability, and these new recruits will be at risk of fleeing the project if it comes packaged with the standard LW eschatology of immortality and a utilitronium cosmos, which will sound simultaneously fanatical and frivolous to someone engaged in hard expert work. I don’t think we’re ready for this; it sounds like at least six months’ work to develop a clear intention for the site, decide who to invite and how to invite them, and otherwise settle into the necessary sobriety of outlook.
Meanwhile, you could make a post like Luke has done, explaining your objective and the proposed ingredients.
Not a project that I have time for right now. But I certainly would like to collaborate with others working on CEV. My hope is to get through my metaethics sequence to get my own thoughts clear and communicate them to others, and also so that we all have a more up-to-date starting point than Eliezer’s 2004 CEV paper.
Sounds good. I sort of feel obligated to point out that CEV is about policy, public relations, and abstract philosophy significantly more than it is about the real problem of FAI. Thus I’m a little worried about what “working on CEV” might look like if the optimization targets aren’t very clear from the start.
Bringing CEV up-to-date, and ideally emphasizing that whatever line of reasoning you are using to object to some imagined CEV scenario, because that line of reasoning is contained within you, CEV will by its very nature also take into account that line of reasoning, sounds more straightforwardly good. (Actually, Steve had some analysis of why even smart people so consistently miss this point (besides the typical diagnosis of ‘insufficient Hofstadter during adolescence syndrome’) which should really go into a future CEV doc. A huge part of the common confusion about CEV is due to people not really noticing or understanding the whole “if you can think of a failure mode, the AI can think of it” thing.)
This assumes that CEV actually works as intended (and the intention was the right one), which would be exactly the question under discussion (hopefully), so in that context you aren’t allowed to make that assumption.
The adequate response is not that it’s “correct by definition” (because it isn’t; it’s a constructed artifact that could well be a wrong thing to construct), but an (abstract) explanation of why it will still make the correct decision under the given circumstances: an explanation of why exactly it’s true that CEV will also take that line of reasoning into account, of why you believe it is in its nature to do so, for example. And it isn’t that simple; say it won’t take that line of reasoning into account if it’s wrong, but then it’s again not clear how it decides what’s wrong.
Right, I am talking about the scenario not covered by your “(hopefully)” clause where people accept for the sake of argument that CEV would work as intended/written but still imagine failure modes. Or subtler cases where you think up something horrible that CEV might do but don’t use your sense of horribleness as evidence against CEV actually doing it (e.g. Rokogate). It seems to me you are talking about people who are afraid CEV wouldn’t be implemented correctly, which is a different group of people that includes basically everyone, no? (I should probably note again that I do not think of CEV as something you’d work on implementing so much as a piece of philosophy and public relations that you should take into account when thinking up FAI research plans. I am definitely not going around saying “CEV is right by definition!”...)
I’m not sure what you mean by the first paragraph. CEV is a plan for friendliness content. That is one of the real problems with FAI, along with the problem of reflective decision theory, the problem of goal stability over self-modification, and others.
Your bolded words do indeed need to be emphasized, but people can rightly worry that the particular line of reasoning that leads them to a failure scenario will not be taken into account if, for example, their brains are not accounted for by CEV, either because nobody with that objection is scanned for their values, or because extrapolated values do not converge cleanly and the value that leads to the supposed failure scenario does not survive a required ‘voting’ process (or whatever) during extrapolation.
More of a partial plan. I would call it a plan once an approximate mechanism for aggregation is specified. Without the aggregation method the outcome is basically undefined.
The ‘people are assholes’ failure mode. :)
My impression and my worry is that calling CEV a ‘plan for Friendliness content’, while true in a sense, gives CEV-as-written too much credit as a stable conceptual framework. My default vision of someone working on CEV, from my intuitive knee-jerk interpretation of your phrasing, is of a person thinking hard for many hours about how to design a really clever meta-level extrapolation process. This would probably be useful work, compared to many other research methods. But I would be kind of surprised if such research ended up being useful at all before the development of a significantly more thorough notion of preference: preferences as bounded computations, approximately embodied computation, overlapping computations, et cetera. I may well be underestimating the amount of creative juice you can get from informal models of something like extrapolation. It could be that you don’t have to get AI-precise to get an abstract theory whose implementation details aren’t necessarily prohibitively arbitrary, complex, or model-breaking. But I don’t think CEV is at the correct level of abstraction to start such reasoning, and I’m worried that the first step of research on it wouldn’t involve an immediate and total conceptual reframing at a more precise/technical level. That said, there is assuredly less technical but still theoretical research to be done on existing systems of morality and moral reasoning, so I am not advocating against all research that isn’t exploring the foundations of computer science or anything.
I should note that the above are my impressions and I intend them as evidence more than advice. Someone who has experience jumping between original research on condensed matter physics and macroscopic complex systems modeling (as an example of a huge set of people) would know a lot more about the right way to tackle such problems.
Your second paragraph is of course valid and worth noting, though it perhaps unfortunately doesn’t describe the folk I’m talking about, who are normally thinking at the level of humanity rather than of individuals. I should have stated that specifically. I should note for posterity that I am incredibly tired and (legally) drugged, and also was in my previous message, so although I feel sane I may not think so upon reflection.
(Deleted this minor comment as no longer relevant, so instead: how do you add line breaks with iOS 4? 20 seconds of Google didn’t help me.)
1. Type a space.
2. Type a letter (doesn’t matter which).
3. Erase the letter.
4. Type another space.
5. Press “return”.
Steps 2 and 3 are to defeat the auto-complete rule that in certain cases turns two consecutive spaces into a period and one space. The other steps are the same as what you would do on a regular computer.
Note that you should only do this if you are typing a poem or something else where you would use the HTML <br> element. Normally you should use paragraph breaks, which you get by pressing “return” twice, so that a blank line is between the paragraphs (same as on a regular computer).
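For concreteness, here is roughly what a standard Markdown renderer produces in the two cases; a sketch of typical Markdown behavior, not a claim about LW’s exact output:

```html
<!-- Case 1: a line ending in two spaces, then a return
     (what the five steps above produce). Markdown source:
       Roses are red[space][space]
       Violets are blue
     typically renders as: -->
<p>Roses are red<br />
Violets are blue</p>

<!-- Case 2: a blank line between paragraphs (pressing "return" twice).
     Markdown source:
       First paragraph.

       Second paragraph.
     typically renders as: -->
<p>First paragraph.</p>
<p>Second paragraph.</p>
```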
The problem is, I don’t think I have a return button? Ah well, it’s not a big deal at all. I might try HTML breaks next time.
What web browser are you using? I have a “return” button in Safari (on an iPhone 3G running iOS 4.2.1).
Won’t work; the LW Markdown implementation doesn’t do raw HTML. (In other words, when I typed “<br>” in my previous comment and this one, I didn’t need to do any escaping to get it to show up rather than turn into a line break.)
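That is, a renderer that escapes raw HTML rather than passing it through leaves the tag visible as literal text; a sketch of that behavior under the assumption of an escaping renderer like the one described, not LW’s verified output:

```html
<!-- Hypothetical comment source:   I might try <br> next time.
     With raw HTML escaped rather than interpreted, it renders as
     visible text (no actual line break): -->
<p>I might try &lt;br&gt; next time.</p>
```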
If you don’t mind some hassle, it would probably work to write your comment in the “Notes” app, then copy and paste it.