Are AGIs with bad epistemics more or less dangerous? (By “bad epistemics” I mean a tendency to believe things that aren’t true, and a tendency to fail to learn true things, due to faulty and/or limited reasoning processes… or to update too much / too little / incorrectly on evidence, or to fail in peculiar ways like having beliefs that shift incoherently according to the context in which an agent finds itself)
It could make AGIs more dangerous by causing them to act on beliefs that they never should have developed in the first place. But it could make AGIs less dangerous by causing them to make exploitable mistakes, or fail to learn facts or techniques that would make them too powerful.
Note: I feel we aspiring rationalists haven’t really solved epistemics yet (my go-to example: if Alice and Bob tell you X, is that two pieces of evidence for X or just one?), but I wonder how, if it were solved, it would impact AGI and alignment research.
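A toy calculation may help make the Alice-and-Bob example concrete. The sketch below (plain Python, with made-up likelihood numbers that are purely illustrative) compares updating on two conditionally independent reports against treating both reports as coming from one shared source:

```python
# Toy Bayesian update: do Alice's and Bob's reports of X count once or twice?
# All numbers below are illustrative assumptions.

prior = 0.5              # P(X) before hearing anyone
p_report_if_true = 0.8   # P(a source reports X | X is true)
p_report_if_false = 0.3  # P(a source reports X | X is false)

def update(p, likelihood_true, likelihood_false):
    """One Bayesian update on a single report of X."""
    numerator = likelihood_true * p
    return numerator / (numerator + likelihood_false * (1 - p))

# Case 1: Alice and Bob are conditionally independent sources -> two updates.
p_independent = update(update(prior, p_report_if_true, p_report_if_false),
                       p_report_if_true, p_report_if_false)

# Case 2: Bob is effectively repeating Alice (fully correlated) -> one update.
p_correlated = update(prior, p_report_if_true, p_report_if_false)

print(f"Two independent reports: P(X) = {p_independent:.3f}")  # ~0.877
print(f"One shared source:       P(X) = {p_correlated:.3f}")   # ~0.727
```

The hard part, and what the example is gesturing at, is estimating how correlated Alice and Bob actually are; real cases fall somewhere between these two extremes.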
There is in fact a class of AI safety proposals that try to make the AI unaware of certain things, such as the existence of its shutdown button.
One of the issues with these types of proposals is that as the AI gets smarter/more powerful, it has to come up with increasingly strange hypotheses about the world (such as a conspiracy, or an all-powerful being, or something similar being behind what it observes) in order to ignore the conclusion that all the evidence points towards. This could, in the long term, make it very dangerous in its unpredictability.
It depends. But yes, incorrect epistemics can make an AGI safer, if it is the right and carefully calibrated kind of incorrect. A goal-directed AGI that incorrectly believes that its off switch does not work will be less resistant to people using it. So the goal here is to design an AGI epistemics that is the right kind of incorrect.
Note: designing an AGI epistemics that is the right kind of incorrect seems to go against a lot of the principles that aspiring rationalists seem to hold dear, but I am not an aspiring rationalist. For more technical info on such designs, you can look up my sequence on counterfactual planning.
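To make "the right kind of incorrect" more concrete, here is a deliberately crude sketch. It is not the formalism from the counterfactual planning sequence, just an illustrative toy with invented action names and payoffs: the agent evaluates plans in a planning model where the off switch is assumed not to work, even though in reality it does.

```python
# Toy illustration of an agent planning under an engineered incorrect belief:
# its planning model assumes the off switch has no effect.
# All names, probabilities, and payoffs are invented for illustration.

ACTIONS = ["just_do_task", "disable_off_switch_then_do_task"]

def value(action, off_switch_works):
    """Expected reward of a plan under a given assumption about the off switch."""
    task_reward = 10.0
    disable_cost = 1.0
    shutdown_probability = 0.5  # chance humans press the switch mid-task

    if action == "just_do_task":
        if off_switch_works:
            return (1 - shutdown_probability) * task_reward  # may be stopped early
        return task_reward
    if action == "disable_off_switch_then_do_task":
        return task_reward - disable_cost  # shutdown can no longer interrupt
    raise ValueError(action)

# The agent plans with the incorrect belief off_switch_works=False, so
# disabling the switch looks like pure wasted effort.
print(max(ACTIONS, key=lambda a: value(a, off_switch_works=False)))
# -> "just_do_task"

# An agent planning with the correct model would instead prefer to resist:
print(max(ACTIONS, key=lambda a: value(a, off_switch_works=True)))
# -> "disable_off_switch_then_do_task"
```

The toy agent's planning model diverges from reality by construction, which is the kind of engineered incorrectness the answer above is describing.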