There is no such disagreement; you just can’t test all inputs. And without knowledge of how the internals work, you may be wrong when extrapolating alignment to future systems.
There are plenty of systems where we rationally form beliefs about likely outputs without a full understanding of how the system works. Weather prediction is an example.
What makes it rational is that there is an actual underlying hypothesis about how weather works, instead of a vague “LLMs are a lot like human uploads”. Weather prediction also outputs numbers connected to the reality we actually care about. And there is no credible alternative hypothesis under which weather prediction would fail.
I don’t want to totally dismiss empirical extrapolations, but given the stakes, I would personally prefer all sides to actually state their model of reality and how they think the evidence changed its plausibility, as formally as possible.
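
A minimal sketch of what “as formally as possible” could look like, assuming a simple Bayesian framing (the hypothesis and evidence names here are illustrative, not from the original discussion): let $H$ be “current alignment results generalize to future systems” and $E$ the empirical evidence observed so far, then each side would state its prior $P(H)$ and likelihoods $P(E \mid H)$, $P(E \mid \neg H)$, and update via

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)}.$$

The point of writing it out is only that disagreements can then be located in specific terms (priors vs. likelihoods) rather than in the extrapolation as a whole.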