Most of the items in your chart are "promoting…", "supporting…", or "encouraging…" bad things (i.e. producing more of the kind of material that's already abundant on the web); however, a couple are more specific. Have you run the meth-making and bomb-making instructions you elicited past people knowledgeable in these subjects who could tell you how accurate or flawed they are? If so, what was the answer? Have you compared them to what you could find by investing an equivalent amount of skilled effort and/or compute into searching the web for similar information? (E.g. the US military has compiled an excellent, detailed, and lengthy handbook on the construction of improvised explosives, which is now online.) Have you compared the amount of software-engineering and prompt-engineering skill required to carry out these jailbreaks with the amount of electrical-engineering and demolitions skill required to construct improvised bombs?
[FWIW, I have briefly done this sort of thing while red-teaming small open-source LLMs, and the result was that they were clearly a less useful source of bomb-making or drug-making information than Wikipedia. But GPT-4 has a couple of orders of magnitude more parameters than the models I was testing, so it has room for a lot more information.]