This all seems very sensible, and I must admit I had been basically assuming that things along these lines were going to occur once risks from frontier models became significant enough: likely via a tiered series of filters, with a cheap, weak filter passing the most suspicious X% (plus a random Y%) of its input up to a stronger, more expensive filter, and so on, ending in first routine/cheaper and then more expensive/careful human oversight. Another obvious addition, at the cybercrime level of risk, would be logging the IP addresses of particularly suspicious queries, and either not letting unpaid, unsigned-in users reach the API through a VPN that hides their IP address, or giving them more refusals if they do.
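To make the tiered idea a bit more concrete, here's a minimal sketch of the sort of pipeline I have in mind. Everything in it is hypothetical: the filter functions are toy stand-ins, and the X/Y fractions and tier structure are made-up numbers for illustration, not anything any lab actually runs.

```python
import random

def cheap_filter(query: str) -> float:
    """Stand-in for a cheap, weak classifier returning a suspicion score in [0, 1]."""
    return 0.9 if "exploit" in query.lower() else 0.1  # toy heuristic, purely illustrative

def strong_filter(query: str) -> float:
    """Stand-in for a stronger, more expensive classifier."""
    return 0.95 if "exploit" in query.lower() else 0.05  # toy heuristic, purely illustrative

def escalate(queries, score_fn, suspicious_frac, random_frac, rng):
    """Pass the most suspicious fraction of queries, plus a random sample, to the next tier."""
    if not queries:
        return []
    ranked = sorted(queries, key=score_fn, reverse=True)
    n_top = max(1, round(len(ranked) * suspicious_frac))  # always escalate at least one in this toy version
    escalated = ranked[:n_top]
    # The random Y% keeps the weaker tier honest: even low-scoring queries have some
    # chance of being seen by the stronger tier, so systematic misses eventually show up.
    escalated += [q for q in ranked[n_top:] if rng.random() < random_frac]
    return escalated

def tiered_review(queries):
    rng = random.Random(0)
    # Tier 1: cheap filter escalates the top X% plus a random Y% to the stronger filter.
    tier2 = escalate(queries, cheap_filter, suspicious_frac=0.05, random_frac=0.01, rng=rng)
    # Tier 2: stronger filter escalates a smaller set onward to human oversight
    # (routine/cheaper review first, then more expensive/careful review).
    return escalate(tier2, strong_filter, suspicious_frac=0.10, random_frac=0.02, rng=rng)

queries = ["how do I bake sourdough"] * 50 + ["write a working exploit for this CVE"]
print(tiered_review(queries))  # the suspicious query ends up in the escalated set
```

The point of the random sample at each tier is that it gives you an unbiased estimate of what the cheaper filter is missing, which is what lets you tune the X/Y fractions against the cost of the more expensive tiers.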
I also wouldn’t assume that typing a great many queries with clearly serious criminal intent into a search engine, in breach of its terms of use, is an entirely risk-free thing to do either; or, if it is now, that it will remain so as NLP models become cheaper.
Obviously, open-source models are a separate question here: roughly the best approach currently available for them is, as you suggest above, filtering really dangerous knowledge out of their training set.