Davidmanheim comments on Proveably Safe Self Driving Cars [Modulo Assumptions]

Davidmanheim 18 Sep 2024 10:46 UTC
2 points
0
I agree that in the most general possible framing, with no restrictions on output, you cannot guard against all possible side-channels. But that’s not true for proposals like safeguarded AI, where a proof must accompany the output, and it’s not obviously true if the LLM is gated by a system that rejects unintelligible or not-clearly-safe outputs.
- Tao Lin 18 Sep 2024 17:02 UTC
  1 point
  0
  There’s steganography; to remove it you’d need to limit the total number of bits not accounted for by the gating system, or something similar.
  - Davidmanheim 22 Sep 2024 13:52 UTC
    2 points
    0
    I partly disagree; steganography is only useful when it’s possible for the outside / receiving system to detect and interpret the hidden messages, so if the messages are of a type that outside systems would identify, they can and should be detectable by the gating system as well.
    
    That said, I’d be very interested in looking at formal guarantees that the outputs are minimally complex in some computationally tractable sense, or something similar. It definitely seems like something that @davidad would want to consider.
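
As a rough illustration of the "limit total bits" idea raised in the thread (a sketch under stated assumptions, not something either commenter proposed): if the gating system would accept any of N distinct outputs for a given request, the generator's choice among them can carry at most log2(N) hidden bits. The function names below are hypothetical, and the model assumes the gate is the only filter and that the choice among accepted outputs is otherwise unconstrained.

```python
import math


def hidden_channel_bits(num_acceptable_outputs: int) -> float:
    """Upper bound, in bits, on what one gated output could covertly encode.

    Assumption: the sender is free to pick any of the outputs the gate
    would accept, so the choice itself is the only covert channel.
    """
    if num_acceptable_outputs < 1:
        raise ValueError("the gate must accept at least one output")
    return math.log2(num_acceptable_outputs)


def max_acceptable_outputs(bit_budget: float) -> int:
    """Largest number of accepted outputs that stays within a bit budget."""
    if bit_budget < 0:
        raise ValueError("bit budget must be non-negative")
    return math.floor(2 ** bit_budget)


if __name__ == "__main__":
    # A gate that accepts exactly one canonical output leaves no spare bits.
    print(hidden_channel_bits(1))
    # A gate that accepts 8 equally valid phrasings leaves up to 3 bits.
    print(hidden_channel_bits(8))
    # To leak at most 3 bits per output, accept at most 8 variants.
    print(max_acceptable_outputs(3))
```

Under this toy model, a gate that forces a single canonical (for example, minimally complex) output per request leaves zero spare bits, which is one reading of why formal minimal-complexity guarantees would be relevant here.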