The verbatim statement is:

We have people in crypto who are good at breaking things, and they’re the reason why anything is not on fire. And some of them might go into breaking AI systems instead, ’cause that’s where you learn anything.
You know, you know, any fool can build a crypto system that they think will work. Breaking existing crypto systems—cryptographical systems—is how we learn who the real experts are. So maybe the people finding weird stuff to do with AIs, maybe those people will come up with some truth about these systems that makes them easier to align than I suspect.
When he says “cryptographical systems”, he’s clarifying what he meant by “crypto” in the previous few clauses (this is a bit clearer from the video, where you can hear his tone). He often says stuff like this about cryptography and computer security; e.g., see the article Eliezer wrote on Arbital called Show me what you’ve broken:
See AI safety mindset. If you want to demonstrate competence at computer security, cryptography, or AI alignment theory, you should first think in terms of exposing technically demonstrable flaws in existing solutions, rather than solving entire problems yourself. Relevant Bruce Schneier quotes: “Good engineering involves thinking about how things can be made to work; the security mindset involves thinking about how things can be made to fail” and “Anyone can invent a security system that he himself cannot break. Show me what you’ve broken to demonstrate that your assertion of the system’s security means something.”
See also So Far: Unfriendly AI Edition:

And above all, aligning superhuman AI is hard for similar reasons to why cryptography is hard. If you do everything right, the AI won’t oppose you intelligently; but if something goes wrong at any level of abstraction, there may be powerful cognitive processes seeking out flaws and loopholes in your safety measures.
When you think a goal criterion implies something you want, you may have failed to see where the real maximum lies. When you try to block one behavior mode, the next result of the search may be another very similar behavior mode that you failed to block. This means that safe practice in this field needs to obey the same kind of mindset as appears in cryptography, of “Don’t roll your own crypto” and “Don’t tell me about the safe systems you’ve designed, tell me what you’ve broken if you want me to respect you” and “Literally anyone can design a code they can’t break themselves, see if other people can break it” and “Nearly all verbal arguments for why you’ll be fine are wrong, try to put it in a sufficiently crisp form that we can talk math about it” and so on. (AI safety mindset)

And Security Mindset and Ordinary Paranoia.
I did in fact go back and listen to that part, but I interpreted that clarifying expansion as referring only to the latter part of your quoted segment, and took the former part to be separate, using cryptocurrency as a bridging topic to get to cryptography afterwards. Anyway, your interpretation is entirely reasonable as well, and you probably have a much better Eliezer-predictor than I do; it just seemed oddly unconservative to interpolate that much into a transcript proper as part of what was otherwise described as an error-correction pass.