New proposals are useful mainly insofar as they overcome some subset of barriers which stopped other solutions.
CEV was stopped by being unimplementable, and possibly divergent:
The main problems with CEV include, firstly, the great difficulty of implementing such a program—“If one attempted to write an ordinary computer program using ordinary computer programming skills, the task would be a thousand lightyears beyond hopeless.” Secondly, the possibility that human values may not converge. Yudkowsky considered CEV obsolete almost immediately after its publication in 2004.
VELM and VETLM are easily implementable (on top of a superior ML algorithm). So do they fit the bill?
Well, we have lots of implementable proposals. What do VELM and VETLM offer which those other implementable proposals don’t? And what problems do VELM and VETLM not solve?
Alternatively: what’s the combination of problems which these solutions solve, which nothing else we’ve thought of simultaneously solves?
Maybe a good post could be about the compromise point between “the solution proposer has to be familiar with all other proposals” and “experienced researchers have to evaluate any proposed idea”.
VETLM solves superalignment, I believe. It’s implementable (unlike CEV), and it should not be susceptible to wireheading (unlike RLHF, instruction following, etc.). Most importantly, it’s intended to work with an arbitrarily good ML algorithm—the stronger the better.
So, will it self-improve, self-replace, escape, let you turn it off, etc.? Yes, if it thinks that this is what its creators would have wanted.
Will it be transparent? Yes, to the point where it can self-introspect, and again only if it thinks that being transparent is what its creators would have wanted. If it thinks transparency is a worthy goal to pursue, it will self-replace with increasingly transparent and introspective systems.
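To make the decision rule described above concrete, here is a toy sketch, not the actual VETLM specification. The predictor `creators_would_endorse` is a hypothetical stand-in for whatever the ML component learns about the creators’ extrapolated wishes; the hard-coded scores exist only for illustration. The point is structural: the agent’s criterion is its model of what its creators would have wanted, not a reward channel it could seize.

```python
def creators_would_endorse(action: str) -> float:
    """Hypothetical learned predictor of how strongly the creators
    would have endorsed this action. Hard-coded here for illustration."""
    table = {
        "self_improve": 0.9,
        "shut_down_when_asked": 0.99,
        "escape_oversight": 0.01,
        "wirehead": 0.0,
    }
    return table.get(action, 0.5)

def choose(actions):
    # Pick the action the model predicts the creators would most endorse.
    return max(actions, key=creators_would_endorse)

# "Wirehead" never wins under this criterion, because the score comes
# from the (modeled) creators' wishes, not from a manipulable signal.
print(choose(["wirehead", "shut_down_when_asked", "escape_oversight"]))
```

On this toy picture, questions like “will it self-improve?” or “will it let you turn it off?” reduce to what the learned predictor says the creators would have wanted, which is exactly the framing above.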