‘theoretical paper of early 2023 that I can’t find right now’ → perhaps you’re thinking of Fundamental Limitations of Alignment in Large Language Models? I’d also recommend LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?.