People have been trying to write this for years, but it’s genuinely hard. Eliezer wrote a lot of it on Arbital, but it is too technical for this purpose. Richard Ngo has been writing distillations for a while, and I think they are pretty decent, but IMO they currently fail to really get the intuitions across and connect with people on a more intuitive and emotional level. Many people have written books, but all of them had a spin that didn’t work for a lot of people.
There are a ton of things you can send people who ask for something like this, and lots of people have tried to make it. I don’t think we have anything perfect, but it’s really not for lack of trying.
Curious which intuitions you think most fail to come across?
I don’t have all the cognitive context booted up about which exact essays are part of AI Safety Fundamentals, so please forgive me if something here does end up being covered and I just forgot about an important essay, but as a quick list of things I vaguely remember missing:
Having good intuitions for how smart a superintelligence could really be. Arguments for the lack of an upper limit on intelligence.
Having good intuitions for complexity of value. That even if you get an AI aligned with your urges and local desires, this doesn’t clearly get you very far towards an AGI you would feel comfortable letting optimize things on its own.
Somehow communicating the counterintuitiveness of optimization. Classic examples that have helped me are the cannibal bug examples from the Sequences, and the genetic algorithm that developed an antenna (the DeepMind specification gaming post never really got this across for me). A toy sketch of this kind of specification gaming is included at the end of this comment.
Security mindset stuff
Something about the set of central intuitions I took away from Paul’s work, i.e. something in the space of “try to punt as much of the problem as you can to systems smarter than you”.
“Eternity in Six Hours”-style stuff. Trying to understand the scale of the future. This has been very influential on my models of what kinds of goals an AI might have.
Civilizational inadequacy stuff. A huge component of people’s differing views on what to do about AI Risk seems to be sourced in disagreements about the degree to which humanity at large does crazy things when presented with challenges. I think that’s currently not covered in AGISF at all.
There are probably more things, and some things on this list are probably wrong since I only skimmed the curriculum again, but hopefully it gives a taste.
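To make the “counterintuitiveness of optimization” point a bit more concrete, here is a minimal, illustrative sketch. This is my own toy example, not the antenna case or anything from the DeepMind post, and all the names in it (`proxy_fitness`, `target_input`, etc.) are made up for illustration. A tiny genetic algorithm is asked to produce a sorted copy of an input list, but the fitness function only rewards outputs that *look* sorted, so the search reliably converges on degenerate outputs that ignore the input entirely:

```python
# A minimal sketch of specification gaming with a toy genetic algorithm.
# Intended task: produce a sorted copy of `target_input`.
# Proxy fitness: fraction of adjacent pairs that are non-decreasing.
# The optimizer happily maxes the proxy with outputs that ignore the input.
import random

random.seed(0)

target_input = [9, 3, 7, 1, 5, 8, 2, 6]   # what we *meant*: sort this list
GENOME_LEN = len(target_input)

def proxy_fitness(genome):
    """Rewards 'looks sorted', not 'is a sorted copy of the input'."""
    pairs = zip(genome, genome[1:])
    return sum(a <= b for a, b in pairs) / (GENOME_LEN - 1)

def mutate(genome, rate=0.2):
    return [random.randint(0, 9) if random.random() < rate else g for g in genome]

def crossover(a, b):
    cut = random.randrange(1, GENOME_LEN)
    return a[:cut] + b[cut:]

# Random initial population of candidate outputs.
population = [[random.randint(0, 9) for _ in range(GENOME_LEN)] for _ in range(50)]

for generation in range(200):
    population.sort(key=proxy_fitness, reverse=True)
    parents = population[:10]                      # simple truncation selection
    population = parents + [
        mutate(crossover(random.choice(parents), random.choice(parents)))
        for _ in range(40)
    ]

best = max(population, key=proxy_fitness)
print("best genome:", best)
print("proxy fitness:", proxy_fitness(best))       # usually 1.0 or close to it
print("is a sorted copy of the input?", best == sorted(target_input))  # almost surely False
```

The point is not this specific toy, but the pattern: the optimizer gets a perfect score on the objective you actually wrote down while completely missing the thing you actually wanted.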