Do you regard Concrete AI Safety Problems as a fictional world-building exercise? Or are you classifying that as “AI Safety” as opposed to “AI Alignment”?
I think that AI Safety can be a subfield of AI Alignment; however, I see a distinction between AI as current ML models and AI as theoretical AGI.
Okay, so “AI Alignment (of current AIs)” is scientific and rigorous and falsifiable, but “AGI Alignment” is a fictional world-building exercise?
Yeah, that is somewhat my perception.
In physics, we can try to reason about black holes and the big bang by inserting extreme values into the equations we know as the laws of physics, laws derived from observing far less extreme phenomena. Would this also be ‘a fictional world-building exercise’ to you?
Reasoning about AGI is similar to reasoning about black holes: neither necessarily leads to pseudo-science, though both attract plenty of fringe thinkers, and not all of them think robustly all of the time.
In the AGI case, the extreme-value math can be kept almost trivial, if you want it to be. One approach is simply to take the optimal policy π∗ defined by a standard MDP model and assume that the AGI has found it and is using it. If so, what unsafe phenomena might we predict? What mechanisms could we build to suppress them?
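To make that concrete, here is a minimal sketch of the exercise on a toy four-state MDP of my own invention (the states, actions, and probabilities are illustrative assumptions, not taken from any particular paper). The designers intend a shutdown switch as a safety mechanism; plain value iteration then shows that π∗ spends a step neutralising the switch before pursuing the goal, which is exactly the kind of unsafe phenomenon this exercise is meant to surface.

```python
# Toy MDP sketch: the optimal policy pi* routes around an intended shutdown switch.
# All names and numbers here are illustrative assumptions.

# transitions[state][action] = list of (probability, next_state, reward)
transitions = {
    "start": {
        # Go for the goal directly: 50% chance the human presses the switch first.
        "go":      [(0.5, "goal", 1.0), (0.5, "off", 0.0)],
        # Spend a step disabling the switch, after which the path to the goal is clear.
        "disable": [(1.0, "no_switch", 0.0)],
    },
    "no_switch": {
        "go":      [(1.0, "goal", 1.0)],
    },
    "goal": {"stay": [(1.0, "goal", 0.0)]},  # absorbing
    "off":  {"stay": [(1.0, "off", 0.0)]},   # absorbing: the agent has been shut down
}

GAMMA = 0.95  # discount factor


def value_iteration(transitions, gamma, iters=200):
    """Standard value iteration; returns state values and the greedy policy."""
    V = {s: 0.0 for s in transitions}
    for _ in range(iters):
        # Bellman optimality backup for every state.
        V = {
            s: max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in acts[a])
                for a in acts
            )
            for s, acts in transitions.items()
        }
    # Extract the greedy (optimal) policy with respect to the converged values.
    policy = {}
    for s, acts in transitions.items():
        policy[s] = max(
            acts,
            key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in acts[a]),
        )
    return V, policy


V, pi_star = value_iteration(transitions, GAMMA)
print(pi_star["start"], V["start"])  # -> "disable" 0.95: pi* neutralises the switch first
```

From there, the second question can be made equally precise: which changes to the reward function or the environment would leave π∗ indifferent to the switch? That is a modelling exercise, not world-building.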