I think this quantum fields example is perhaps not all that forceful, because in your OP you state:
maybe a faithful and robust translation would be so long in the system’s “internal language” that the translation wouldn’t fit in the system
However, it sounds like you’re describing a system that represents humans using quantum fields as a routine matter, so fitting the translation into the system doesn’t sound like a huge problem? Like, if I want to know the answer to some moral dilemma, I can simulate my favorite philosopher at the level of quantum fields and hear what they would say about it. That sounds like it could be just as good as an em, as far as alignment is concerned.
It’s hard for me to imagine a world where developing representations that allow you to make good next-token predictions etc. doesn’t also produce representations that are somehow useful for alignment. I’d be interested to hear fleshed-out counterexamples.