“this kind of work may have the unintended consequence of pushing people who would have otherwise worked on hard core problems of x-risk to more prosaic projects, lulling them into a false sense of security when progress is made.”
I think it is more like:
- This kind of work seems likely to one day redirect funding intended for X-risk away from X-risk.
- I know people who would point to this kind of thing to argue that AI can be made safe without the kind of deep decision theory thinking MIRI is interested in. Those people would probably argue against X-risk research regardless, but the more stuff there is that’s difficult for outsiders to distinguish from X-risk relevant research, the more difficulty outsiders have assessing such arguments.
So it isn’t so much that I think people who would work on X-risk would be redirected, as that I think there will be a point where people adjacent to X-risk research will have difficulty telling which people are actually trying to work on X-risk, and also what the state of the X-risk concerns is (I mean to what extent it has been addressed by the research).
As I understand it, this argument can also be applied to any work that doesn’t plausibly one-shot a significant alignment problem, potentially including research by OpenAI and DeepMind. While obviously we’d all prefer one-shots, sometimes research is more incremental (I’m sure this isn’t news to you!). Here, I set out to make progress on one of the Concrete Problems; after doing so, I thought “does this scale? What insights can we take away?”. I had relaxed the problem by assuming a friendly ontology, and I was curious what difficulties (if any) remained.
I agree that research has to be incremental. It should be taken almost for granted that anything currently written about the subject is not anywhere near a real solution even to a sub-problem of safety, unless otherwise stated. If I had to point out one line which caused me to have such a skeptical reaction to your post, it would be:
> I’m fairly confident that whitelisting contributes meaningfully to short- to mid-term AI safety,
If instead this had been presented as “here’s something interesting which doesn’t work”, I would not have made the objection I made. IE, what’s important is not any contribution to near- or medium-term AI safety, but rather exploration of the landscape of low-impact RL, which may eventually contribute to reducing X-risk. IE, more the attitude you express here:
> We are currently grading this approach by the most rigorous of metrics—I think this is good, as that’s how we will eventually be judged! However, we shouldn’t lose sight of the fact that most safety work won’t be immediately superintelligence-complete. Exploratory work is important.
So, I’m saying that exploratory work should not be justified as “confident that this contributes meaningfully to short-term safety”. Almost everything at this stage is more like “maybe useful for one day having better thoughts about reducing X-risk, maybe not”.
I don’t follow—the agent has a distribution for an object at time t, and another at t+1. It penalizes based on changes in its beliefs about how the world actually is at the time steps—not with respect to its expectation.
Ah, alright.
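To make the mechanism described in the exchange above concrete, here is a minimal sketch, assuming the agent keeps a categorical class distribution over what each tracked object is at every time step. The function name, the dictionary representation, and the use of total variation distance are illustrative assumptions rather than the post’s actual implementation; the full whitelisting scheme would additionally restrict the penalty to probability mass shifting across non-whitelisted class transitions.

```python
import numpy as np

def belief_change_penalty(beliefs_t, beliefs_t1):
    """Illustrative sketch: sum of total variation distances between each
    object's class distribution at time t and at time t+1."""
    penalty = 0.0
    for obj_id, p_t in beliefs_t.items():
        p_t1 = beliefs_t1.get(obj_id)
        if p_t1 is None:
            continue  # object no longer tracked; handling that is out of scope here
        # Penalize how much the agent's belief about what this object *is*
        # has shifted between the two time steps.
        penalty += 0.5 * np.abs(np.asarray(p_t) - np.asarray(p_t1)).sum()
    return penalty

# Toy example: the agent becomes confident that object 0 went from "vase" to "shards".
beliefs_t  = {0: np.array([0.9, 0.1])}  # [P(vase), P(shards)] at time t
beliefs_t1 = {0: np.array([0.1, 0.9])}  # same object at time t+1
print(belief_change_penalty(beliefs_t, beliefs_t1))  # ~0.8
```

The point of the sketch is only that the penalty compares the agent’s beliefs at t and t+1 directly, rather than comparing what happened at t+1 against what the agent expected at t.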
> So it isn’t so much that I think people who would work on X-risk would be redirected, as that I think there will be a point where people adjacent to X-risk research will have difficulty telling which people are actually trying to work on X-risk, and also what the state of the X-risk concerns is (I mean to what extent it has been addressed by the research).
That makes more sense. I haven’t thought enough about this aspect to have a strong opinion yet. My initial thoughts are that
- this problem can be basically avoided if this kind of work clearly points out where the problems would be if scaled.
- I do think it’s plausible that some less-connected funding sources might get confused (e.g., the NSF), but I’d be surprised if later FLI funding got diverted because of this. I think this kind of work will be done anyways, and it’s better to have people who think carefully about scale issues doing it.
- your second bullet point reminds me of how some climate change skeptics will point to “evidence” from “scientists”, as if that’s what convinced them. In reality, however, they’re drawing the bottom line first, and then pointing to what they think is the most dignified support for their position. I don’t think that avoiding this kind of work would ameliorate that problem—they’d probably just find other reasons.
- most people on the outside don’t understand x-risk anyways, because it requires thinking rigorously in a lot of ways to not end up a billion miles off of any reasonable conclusion. I don’t think that this additional straw will marginally add significant confusion.
> IE, what’s important is not any contribution to near- or medium-term AI safety
I’m confused how “contributes meaningfully to short-term safety” and “maybe useful for having better thoughts” are mutually-exclusive outcomes, or why it’s wrong to say that I think my work contributes to short-term efforts. Sure, that may not be what you care about, but I think it’s still reasonable that I mention it.
> I’m saying that exploratory work should not be justified as “confident that this contributes meaningfully to short-term safety”. Almost everything at this stage is more like “maybe useful for one day having better thoughts about reducing X-risk, maybe not”.
I’m confused why that latter statement wasn’t what came across! Later in that sentence, I state that I don’t think it will scale. I also made sure to highlight how it breaks down in a serious way when scaled up, and I don’t think I otherwise implied that it’s presently safe for long-term efforts.
I totally agree that having better thoughts about x-risk is a worthy goal at this point.
> I’m confused how “contributes meaningfully to short-term safety” and “maybe useful for having better thoughts” are mutually-exclusive outcomes, or why it’s wrong to say that I think my work contributes to short-term efforts. Sure, that may not be what you care about, but I think it’s still reasonable that I mention it.
In hindsight I am regretting the way my response went. While it was my honest response, antagonizing newcomers to the field for paying any attention to whether their work might be useful for sub-AGI safety doesn’t seem like a good way to create the ideal research atmosphere. Sorry for being a jerk about it.
Although I did flinch a bit, my S2 reaction was “this is Abram, so if it’s criticism, it’s likely very high-quality. I’m glad I’m getting detailed feedback, even if it isn’t all positive”. Apology definitely accepted (although I didn’t view you as being a jerk), and really—thank you for taking the time to critique me a bit. :)