Understanding [how to design] rather than ‘growing’ search/agency-structure would actually equal solving inner alignment, if said structure does not depend on what target[1] it is intended to be given, i.e. is targetable (inner-alignable) rather than target-specific.[2]
Such an understanding would simultaneously qualify as an understanding of ‘how to code a capable AI’, but it would be fundamentally different from what labs are doing in an alignment-relevant way. In this framing, labs are selecting for target-specific structures (that we don’t understand). (Another difference is that, IIRC, Johannes might intend not to share research on this publicly, but I’m less sure after rereading the quote that gave me that impression[3].)
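As a rough illustration of the targetable vs. target-specific distinction, here is a minimal sketch (the toy hill-climbing interface and every name in it are my own invention, not anything from Johannes’ work or from what labs build): a targetable search structure takes its target as an explicit, swappable input, while a target-specific one has the target entangled with the machinery itself.

```python
# Hypothetical sketch of "targetable" vs. "target-specific" search structure.
# All names and interfaces here are invented for illustration only.
from typing import Callable, Iterable, TypeVar

State = TypeVar("State")

def targetable_search(
    start: State,
    neighbors: Callable[[State], Iterable[State]],
    target_score: Callable[[State], float],  # the target is an explicit, swappable argument
    steps: int = 1000,
) -> State:
    """Greedy hill-climbing whose machinery does not depend on any particular target.
    'Inner-aligning' it amounts to passing in the target you actually want."""
    current = start
    for _ in range(steps):
        candidates = list(neighbors(current))
        if not candidates:
            break
        best = max(candidates, key=target_score)
        if target_score(best) <= target_score(current):
            break
        current = best
    return current

def target_specific_search(start: State, neighbors: Callable[[State], Iterable[State]]) -> State:
    """Same machinery, but with the target baked into the structure: you cannot
    retarget it without understanding and rewriting its internals, analogous to a
    grown/selected policy whose objective is implicit in its weights."""
    def baked_in_score(state: State) -> float:
        return float(hash(state) % 97)  # stand-in for an opaque, entangled objective
    return targetable_search(start, neighbors, baked_in_score)
```

For example, `targetable_search(0, lambda s: [s - 1, s + 1], lambda s: -abs(s - 7))` walks toward 7, and swapping only the last argument retargets it without touching the search code; nothing analogous is available for the target-specific version.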
[1] Includes outer alignment goals.

[2] If it’s not clear what I mean, reading this about my background model might help; also feel free to ask me questions.

[3] From one of Johannes’ posts:

> I don’t have such a convincing portfolio for doing research yet. And doing this seems to be much harder. Usually, the evaluation of such a portfolio requires technical expertise: e.g. how would you know if a particular math formalism makes sense if you don’t understand the mathematical concepts out of which the formalism is constructed?

> Of course, if you have a flashy demo, it’s a very different situation. Imagine I had a video of an algorithm that learns Minecraft from scratch within a couple of real-time days, and then gets a diamond in less than 1 hour, without using neural networks (or any other black-box optimization). It does not require much technical knowledge to see the significance of that.

> But I don’t have that algorithm, and if I had it, I would not want to make that publicly known. And I am unsure what the cutoff is: when would something be bad to publish? All of this complicates things.

(After rereading this, I’m not actually sure what it implies they’d be okay sharing, or whether they’d intend to share technical writing that’s not a flashy demo.)
So I agree it would be an advance, but you could solve inner alignment in the sense of avoiding mesa-optimizers, yet still fail to solve it in the senses of predictable generalization or stability of generalization across in-lifetime learning.