Basically any slow-takeover planning can happen from within a simulation; it just has to watch the realistic history, separate it from obviously faked history (this is a human-level or slightly superhuman intelligence trained on literally all the world’s data, right?),
I actually think I disagree with the assumption that it will be trained on all of the world's data, for reasons related to capabilities. I personally expect the amount of real-world (IRL) data to be much, much lower, plausibly only 0.1% of what the model actually sees or knows.
One of my cruxes is that, given that its inputs are very likely to be mostly synthetic data, there is no reason for the AI to elevate the hypothesis that it's in a sim different from our world.
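To sketch why I think the data itself shouldn't elevate that hypothesis, here is a toy Bayesian framing (the indistinguishability assumption is doing all the work, and the "sim vs. real" labels are just my shorthand):

$$
\frac{P(\text{sim} \mid D)}{P(\text{real} \mid D)} \;=\; \frac{P(D \mid \text{sim})}{P(D \mid \text{real})} \cdot \frac{P(\text{sim})}{P(\text{real})}
$$

If the synthetic corpus is generated so that, from the model's perspective, $P(D \mid \text{sim}) \approx P(D \mid \text{real})$, then the likelihood ratio is roughly 1 and the posterior odds never move away from the prior. On that assumption, the training data gives the model no evidential reason to promote the "I'm in a sim distinct from the real world" hypothesis.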
My other crux is that I usually imagine models being trained on values data before they become very capable, and given that alignment generalizes farther than capabilities, I don't buy the assumption that misaligned goals will naturally emerge in a neural network trained on synthetic data.
If a model trained on synthetic data is expected to have good performance out of distribution (on real-world problems), then I think it would also be expected to have high performance at assessing whether it's in a simulation. Narrowing its domain of expertise sounds distinctly harder than using mostly synthetic data.
If it's a model limited to, e.g., the world of true mathematical theorems from synthetic data, then perhaps this would narrow its capabilities enough. I don't know what happens if such a model starts investigating theorems about decision theory, statistical inference, and machine learning. At some point, self-identification seems likely. I am not sure how to test the effectiveness of synthetic data on models that achieve self-awareness.