Hey, thanks for the comment. Part of what I like about this framework is that it provides an account of how we do that process of “somehow judging things as true”. Namely, that we develop personal concepts that correspond to universal concepts via the various forces that change our minds over time.
We can’t access universal ontology ourselves, but reasoning about it allows us to state things precisely—it provides a theoretical standard for whether a process aimed at determining truth succeeds or not.
Do you have an example of a domain where ground truth is unavailable, but humans can still make judgements about which processes are good to use? I’d claim that most such cases involve a thought experiment, i.e. a model of how the world works that implies a certain truth-finding method will be successful.
It’s not about comparing a process to the universal ontology; it’s about comparing it to one’s internal model of the universal ontology, which we then hope is good enough. In the ethics dataset, that could look like a reductio ad absurdum on certain model processes, e.g.: “You have a lot of fancy reasoning here for why you should kill an unspecified man on the street, but it must be wrong because it reaches the wrong conclusion.”
(Ethics is a bit of a weird example because the choices aren’t based on inferring missing information, as is paradigmatic of the personal/universal tension, but the dynamic is similar.)
Predicting the future 10,000 years hence has much less potential for this sort of reductio, of course. So I see your point. It seems like in such cases, humans can only provide feedback via comparison to our own learned forecasting strategies. But even this has a similar structure.
We can view the real environment that we learned our forecasting strategies from as the “toy model” that we hope will generalize well enough to the 10,000-year prediction problem. The judgement we pass on the AI’s processes then stands in for actually running those processes in the toy model: instead of seeing how well the AI’s methods do by simulating them in the toy model, we compare them to our own methods, which evolved through success in that model.
Seeing things this way lets us identify two distinct points of failure in the humans-judging-processes setup (see the toy sketch after the list):
The forecasting environment humans learned in may not bear enough similarity to the 10,000-year forecasting problem.
Human judgement is only a lossy signal for actual performance in the environment humans learned in; AI methods that would perform well in that environment may still be rated poorly by humans, and vice versa.
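To make the two failure points concrete, here’s a minimal toy sketch in Python. Everything in it is a hypothetical illustration I’m introducing, not anything from the post: each candidate method gets a made-up true skill in the environment humans learned forecasting in (“train”) and in the 10,000-year problem (“target”), and human judgement is modeled as a noisy read of the former.

```python
import random

random.seed(0)

# Hypothetical toy numbers: each method's true skill in the environment
# humans learned in ("train") and in the 10,000-year problem ("target").
methods = {
    "method_A": {"train": 0.9, "target": 0.3},  # failure 1: environments differ
    "method_B": {"train": 0.8, "target": 0.8},
    "method_C": {"train": 0.6, "target": 0.7},
}

def human_judgement(method, noise=0.3):
    """Humans never simulate the method in either environment; they compare it
    to their own learned strategies. Model that as the method's skill in the
    humans' environment plus noise: a lossy signal (failure 2)."""
    return method["train"] + random.gauss(0, noise)

for name, m in methods.items():
    score = human_judgement(m)
    print(f"{name}: judged {score:.2f}, target skill {m['target']:.2f}")
```

Failure 1 shows up when `train` and `target` diverge (method_A gets judged well but does badly where it counts); failure 2 is the noise term, which can rank methods differently from their `train` skill even when the environments do match.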
So it seems to me that the general model of the post handles these cases decently well, but the concepts are definitely a bit slippery, and this is the area I feel most uncertain about.