Specifically, I think it should only be acceptable to claim something is infohazardous when you have strong empirical evidence that 1.) it substantially advances capabilities (i.e. more than the median NeurIPS paper), 2.) it empirically works on actual ML systems at scale, 3.) it is not already reasonably known within the ML community, and 4.) there is no reason to expect differential impact on safety vs capabilities, i.e. when the idea has no safety implications and is pure capabilities.
If we’re proposing this as a general criterion, as opposed to “a starting-point heuristic that would have been an improvement in Beren’s microenvironment”, then I have some complaints.
First, before we even start, I think “claim something is infohazardous” is not where I’d necessarily be locating the decision point. I want to ask: What exactly are you deciding whether or not to say, and to whom, and in what context? For example, if there are paradigm shifts between us and TAI, then the biggest infohazards would presumably involve what those paradigm shifts are. And realistically, whatever the answer is, someone somewhere has already proposed it, but it hasn’t caught on, because it hasn’t so far been demonstrated to work well, and/or seems a priori implausible to most people, and/or is confusing or antimemetic for some other reason, etc. So if a random person says “I think the paradigm shift is X”, that’s probably not an infohazard even if they’re right, because random people make claims like that all the time and nobody listens. By contrast, if they are not a random person but rather a prominent leader in the field giving a NeurIPS keynote, or unusually good at explaining things in a compelling way, or controlling an AI funding source, or publishing turnkey SOTA code, or whatever, that might be different.
Second, “it empirically works on actual ML systems at scale” shouldn’t be a necessary criterion in general, in my opinion. For example, if there are paradigm shifts between us and TAI (as I happen to believe), then maybe we can think of actual ML systems as airplanes, and TAI as a yet-to-be-invented rocket ship, in an imaginary world where modern aircraft exist but rocket ships haven’t been invented yet. Anyway, early rocket ships are going to be far behind the airplane SOTA; in fact, for quite a while, they just won’t work at all, because they need yet-to-be-invented components like rocket fuel. Developing those components is important progress, but would not “empirically” advance the (airplane) SOTA.
I think about this a lot in the context of neuroscience. IMO the most infohazardous AI work on Earth right now involves theoretical neuroscience researchers toiling away in obscurity, building models that don’t do anything very impressive. But we know that fully understanding the human brain is a recipe for TAI. So figuring out a little piece of that puzzle is unarguably a step towards TAI. The key is that one piece of the puzzle, in the absence of all the other pieces, is not necessarily going to do anything impressive.
Third, (4) is not really how I would frame weighing the costs and benefits. For example, I would be especially interested in the question: “If this is true, then do I expect capabilities researchers to figure this out and publish it sooner or later, before TAI?” If so, it might make sense to just sit on it and find something else to work on until it’s published by someone else, even if it does have nonzero safety/alignment implications. Especially if there are other important alignment things to work on in the meantime (which there probably are). This has the side benefit of being a great strategy in the worlds where the idea is wrong anyway. It depends on the details in lots of ways, of course. More discussion & nuance in my post here, and see also Charlie’s post on tech trees.
(None of this is to invalidate Beren’s personal experience, which I think is pretty different from mine in various ways, especially that he’s working closer to the capabilities research mainstream, and was working at Conjecture, which actually wanted to be out ahead of everyone else in (some aspects of) AI capabilities IIUC, whereas I’m speaking from my own experience doing alignment research, which has a path-to-impact that mostly doesn’t require that.)