My sense is that the existing arguments are not very strong (e.g. I do not find them convincing), and their pretty wide acceptance in EA discussions mostly reflects self-selection (people who are convinced that AI risk is a big problem are more interested in discussing AI risk). So in that sense better intro documents would be nice. But maybe there simply aren’t stronger arguments available? (I personally would like to see more arguments from an “engineering” perspective, starting from current computer systems rather than from humans or thought experiments). I’d be curious what fraction of e.g. random ML people find current intro resources persuasive.
That said, just trying to expose more people to the arguments makes a lot of sense; however convincing the case is, the number of convinced people should scale linearly with the number of exposed people. And social proof dynamics probably make it scale super-linearly.
I’d be curious to hear whether you disagree with Gwern’s https://www.gwern.net/Tool-AI.
I agree with it but I don’t think it’s making very strong claims.
I mostly agree with part 1; just giving advice seems too restrictive. But there's a lot of ground between “only gives advice” and “fully autonomous”, and again between “fully autonomous” and “globally optimizing a utility function”, and I basically expect a smooth increase in AI autonomy over time as systems prove capable and safe. I work in HFT; I think that industry has some of the most autonomous AIs deployed today (although not that sophisticated), but they're very constrained in what actions they can take.
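To make “very constrained” concrete, here's a minimal, purely illustrative sketch; the symbols, limits, and function names are all made up rather than taken from any real system. The point is just that the model only proposes actions, and hand-audited checks outside the model decide what ever reaches the exchange.

```python
# Hypothetical sketch: hard, human-written constraints around a trading model's
# proposed actions, so it can only act inside a narrow, pre-approved envelope.
from dataclasses import dataclass

@dataclass
class Order:
    symbol: str
    quantity: int      # signed: positive = buy, negative = sell
    limit_price: float

# Hand-chosen, human-audited limits; the model cannot change these.
ALLOWED_SYMBOLS = {"AAA", "BBB"}
MAX_ORDER_QTY = 100
MAX_GROSS_POSITION = 1_000
PRICE_BAND = 0.02  # at most 2% away from the last trade price

def validate(order: Order, position: int, last_price: float) -> bool:
    """Reject any proposed order that steps outside the constrained action space."""
    if order.symbol not in ALLOWED_SYMBOLS:
        return False
    if abs(order.quantity) > MAX_ORDER_QTY:
        return False
    if abs(position + order.quantity) > MAX_GROSS_POSITION:
        return False
    if abs(order.limit_price - last_price) > PRICE_BAND * last_price:
        return False
    return True

# The model only proposes orders; only validated orders are ever sent.
proposed = Order("AAA", 50, 101.0)
if validate(proposed, position=200, last_price=100.0):
    pass  # send to the exchange (omitted)
```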
I don't really have the expertise to have an opinion on “agentiness helps with training”; it sounds plausible to me. But again, “you can pick training examples” is very far from “fully autonomous”. I think there's a lot of scope for introducing “taking actions” that doesn't really pose a safety risk (and ~all of Gwern's examples fall into that, I think; e.g. optimizing over NN hyperparameters doesn't seem scary).
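To illustrate why hyperparameter optimization seems benign as a form of “taking actions”, here's a minimal sketch with a made-up search space and a stand-in objective (nothing here is from a real training setup): the only “actions” available are picking points from a small, fixed menu.

```python
# Hypothetical sketch: random search over a fixed hyperparameter space.
import random

SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "batch_size": [32, 64, 128],
    "num_layers": [2, 4, 8],
}

def train_and_evaluate(config: dict) -> float:
    # Stand-in for an actual training run; returns a validation score.
    return -abs(config["learning_rate"] - 1e-3) - config["num_layers"] * 1e-5

def random_search(num_trials: int = 20, seed: int = 0) -> dict:
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(num_trials):
        # The entire "action space": one choice per hyperparameter, nothing else.
        config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        score = train_and_evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config

print(random_search())
```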
I guess overall I agree AIs will take a bunch of actions, but I'm optimistic about constraining the action space or the domain/world-model in ways that IMO get you a lot of safety (a kind of safety that isn't well captured by speculating about what the right utility function is).
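The general shape of what I mean, as a hypothetical sketch (all names invented): the constraint sits outside whatever objective the model is optimizing, so the safety argument is about the envelope around the agent rather than about getting its utility function right.

```python
# Hypothetical sketch: an action-space constraint enforced outside the policy.
from typing import Callable, Iterable

def constrained_step(
    propose_action: Callable[[dict], str],  # the learned policy, treated as a black box
    allowed_actions: Iterable[str],         # hand-audited action whitelist
    observation: dict,
    fallback: str = "no_op",
) -> str:
    """Run the policy, but only ever execute actions from the whitelist."""
    action = propose_action(observation)
    return action if action in set(allowed_actions) else fallback

def policy(obs: dict) -> str:
    # Stand-in for a learned policy; it can "want" anything it likes.
    return "adjust_quote"

# Only whitelisted actions ever get executed, regardless of what the policy prefers.
print(constrained_step(policy, {"adjust_quote", "cancel_order"}, {"spread": 0.01}))
```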
I suspect your industry is a special case, in that you can get away with automating everything with purely narrow AI. But in more complicated domains, I worry that the constraints couldn't be specified well, especially for things like an AI acting as a manager.