Sean Hardy comments on All AGI Safety questions welcome (especially basic ones) [May 2023]

Sean Hardy 9 May 2023 9:54 UTC
1 point
What about simulating smaller aspects of cognition that can be chained together, like chain-of-thought (CoT) prompting with GPT? You could have the system use self-criticism to assess its actions and align them with a bunch of messy human abstractions. How does that scenario lead to doom? If it were misaligned, I think a well-instantiated predictive model could update its understanding of our values from feedback by predicting how a corrigible AI would act.