Curious which intuitions you think most fail to come across?
I don’t have all the cognitive context booted up on which exact essays are part of AI Safety Fundamentals, so please forgive me if something here is in fact covered and I just forgot about an important essay, but here is a quick list of things I vaguely remember missing:
Having good intuitions for how smart a superintelligence could really be, and arguments for why there is no obvious upper limit to intelligence.
Having good intuitions for the complexity of value. Even if you get an AI aligned with your urges and local desires, that doesn’t clearly get you very far towards an AGI you would feel comfortable letting optimize things on its own.
Somehow communicating the counterintuitiveness of optimization. Classic examples that have helped me are the cannibal bug examples from the sequences, and the genetic algorithm that developed an antenna (the DeepMind specification gaming post never really got this across for me); the toy sketch at the end of this comment tries to gesture at the same thing.
Security mindset stuff.
Something about the set of central intuitions I took away from Paul’s work, i.e. something in the space of “try to punt as much of the problem to systems smarter than you”.
“Eternity in Six Hours”-style stuff. Trying to understand the scale of the future. This has been very influential on my models of what kinds of goals an AI might have.
Civilizational inadequacy stuff. A huge component of people’s differing views on what to do about AI risk seems to be sourced in disagreements about the degree to which humanity at large does crazy things when presented with challenges. I think that’s currently not covered in AGISF at all.
There are probably more things, and some things on this list are probably wrong since I only skimmed the curriculum again, but hopefully it gives a taste.
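To gesture a bit more concretely at the optimization point above: here is a deliberately silly toy sketch (my own construction, not taken from the sequences or the DeepMind post) of a genetic algorithm exploiting a mis-specified fitness function. The intent is to evolve a string with many distinct characters, but an uncapped bonus term in the coded fitness lets the optimizer “win” with a degenerate string of exclamation marks. The literal objective gets maximized, not the thing the designer had in mind.

```python
# Toy sketch (my own, for illustration): a tiny genetic algorithm "gaming"
# a mis-specified fitness function. Intended goal: a 20-character string
# with many distinct characters. Actual coded fitness: diversity plus an
# uncapped bonus for '!' characters, which the optimizer exploits.
import random
import string

ALPHABET = string.ascii_lowercase + "!"
LENGTH = 20

def fitness(s: str) -> int:
    diversity = len(set(s))        # what the designer meant to reward
    bonus = 5 * s.count("!")       # the "harmless" bonus that was never capped
    return diversity + bonus

def mutate(s: str) -> str:
    # Replace one random position with a random character.
    i = random.randrange(LENGTH)
    return s[:i] + random.choice(ALPHABET) + s[i + 1:]

def evolve(generations: int = 300, pop_size: int = 100) -> str:
    population = ["".join(random.choice(ALPHABET) for _ in range(LENGTH))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the fittest half, refill with mutated copies of survivors.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        population = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return max(population, key=fitness)

if __name__ == "__main__":
    best = evolve()
    print(best, fitness(best))  # converges to (nearly) all '!', not a diverse string
```

Obviously a cartoon, but the same dynamic shows up in the antenna and specification gaming examples: the optimizer finds whatever maximizes the stated criterion, however unintuitive that solution looks to the person who wrote the criterion down.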