I endorse the “overly galaxy brained strategy.” If you actually understand why it’s not useful even as a step towards some other alignment scheme that works for superintelligence, you should just drop it and think about other things.
However, usually things aren’t so cut and dried. In the course of arriving at the epistemic state hypothesized above, it’s probably a good idea to talk to some other safety researchers.
Generally if you think of something that’s super useful for present-day systems, it’s related to ideas that are useful for future systems. In that case, I endorse attempting to study your idea for its safety properties for a while and then eventually publishing (preferably just in time to scoop people in industry who are thinking about similar things :P ).
My hypothetical self thanks you for your input and has punted the issues to the real me.
I feel like I need to dig a little bit into this:
“If you actually understand why it’s not useful”
Honestly, I don’t know for sure that I do. How can I, when everything is so ill-defined and we have so few scraps of solid fact to base things on?
That said, there are a couple of issues, and the major one is grounding, or rather the lack of it.
Grounding is IMO a core problem, although people rarely talk about it. I think that’s mainly because we (humans) seemingly have solved it.
I don’t think that’s actually the case, but because our cognition is pretty dang good at heuristics and error correction, the gaps rarely get noticed, and even the high-impact failures are, in the grand scheme of things, not particularly noteworthy.
The hypothetical architecture my hypothetical self has been working on cannot do proper grounding[1]. The short version is that it does something that looks a little like what humans do, so it is heuristics-based and error-prone.
That should scale somewhat (but how much?), yet the errors persist, and at superintelligence capability levels the potential consequences look scary.
(The theme here is uncertainty, and that actually crops up all over the place.)
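Since the architecture itself stays hypothetical, here is a deliberately toy sketch of what “heuristic, error-prone grounding” can look like in miniature. Nothing in it comes from the actual design: the symbol table, the overlap heuristic, the threshold, and the feedback step are all invented for illustration. A symbol gets tied to whichever observation scores best on a similarity heuristic, and a later feedback pass can retract the guess but never prove it right.

```python
# Toy illustration only: all names, data, and thresholds are invented.
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Grounding:
    symbol: str
    referent: str
    confidence: float  # heuristic score, not a guarantee of correctness

def ground(symbol: str, observations: dict[str, set[str]],
           features: set[str], threshold: float = 0.6) -> Grounding | None:
    """Tie a symbol to the observation whose features overlap its features most.

    This is the error-prone part: overlap is a heuristic proxy for "refers to
    the same thing", and nothing here proves the mapping is actually correct.
    """
    best, best_score = None, 0.0
    for name, obs_features in observations.items():
        overlap = len(features & obs_features) / max(len(features | obs_features), 1)
        if overlap > best_score:
            best, best_score = name, overlap
    if best is None or best_score < threshold:
        return None  # refuse to ground rather than guess wildly
    return Grounding(symbol, best, best_score)

def error_correct(g: Grounding | None, feedback_ok: bool) -> Grounding | None:
    """Crude error correction: retract a grounding that later feedback contradicts."""
    return g if (g is not None and feedback_ok) else None

# Tiny usage example with made-up observations.
obs = {"cup_on_table": {"container", "graspable", "on_table"},
       "lamp_on_desk": {"light_source", "on_desk"}}
g = ground("cup", obs, {"container", "graspable"})
print(g)                                      # a plausible but unproven grounding
print(error_correct(g, feedback_ok=False))    # None: the heuristic guess gets retracted
```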
Anyway, an accord has been reached: the hypothetical architecture will exit mind space and be used to build a PoC.
The primary goal is to see whether it works at all (likely not), the secondary goal is to get a feel for how much fidelity is lost, and the tertiary goal is to gauge how manageable the uncertainty and errors are.
No need to contemplate any further steps until I know whether it’s workable in the real world.
(Obviously, if it works in some capacity, tangible results will be produced, and those could hopefully be used to engage with others and do more work on assessing its potential usefulness for AI safety in general.)
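For concreteness, here is a minimal, assumption-laden sketch of what a measurement harness for those three goals could look like. It is not the actual plan: `system`, `similarity`, the task names, and the dummy data are all placeholders I made up; only the three reported quantities mirror the stated goals.

```python
# Hypothetical PoC harness: checks whether the system works at all,
# roughly how much fidelity is lost, and how often it errors out.
import statistics

def similarity(a: str, b: str) -> float:
    """Placeholder fidelity metric: fraction of shared tokens (stand-in only)."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta | tb), 1)

def run_poc(system, tasks, reference):
    """Run the PoC over some tasks and report the three things we care about."""
    successes, fidelities, errors = 0, [], 0
    for task in tasks:
        output = system(task)
        if output is None:                 # tertiary goal: how often does it just fail?
            errors += 1
            continue
        successes += 1                     # primary goal: does it work at all?
        fidelities.append(similarity(output, reference[task]))  # secondary: fidelity lost
    return {
        "works_at_all": successes > 0,
        "success_rate": successes / len(tasks),
        "mean_fidelity": statistics.mean(fidelities) if fidelities else 0.0,
        "error_rate": errors / len(tasks),
    }

# Dummy stand-ins so the sketch runs end to end.
dummy_reference = {"t1": "red cup on table", "t2": "blue lamp on desk"}
dummy_system = lambda task: {"t1": "red cup on table", "t2": None}[task]
print(run_poc(dummy_system, ["t1", "t2"], dummy_reference))
```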
[1] Having actually looked at the problem, I don’t think it is solvable, in the sense of being provably error-free.