Also look at the inverse direction. Right now, we have finally found a trick that works well enough to form a useful AI. And it has serious issues (hallucinating, being threatening or creepy, making statements that contradict a web search, writing code that doesn't compile or run). Now that we know about those issues, there are straightforward things to try as fixes.*
...Would an alignment genius of any stripe, without knowing the results, have been able to predict the issues current models have and design algorithms for the countermeasures?
This is a testable query. Did anyone in the alignment field (1) predict the current problems with hallucination** and (2) propose a solution?
*Recursion. It appears that current models can examine text that happens to be their own output and can be prompted to inspect it for errors in logic and fact. This suggests a way to fix this particular issue with current-scale LLMs (see the sketch after these footnotes).
**How would they have done it? The field of mathematics doesn't cover complex networks of arbitrary learned functions hallucinating information, does it...
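A rough illustration of what that recursion could look like in practice. This is only a sketch: `generate` is a hypothetical placeholder for whatever completion call a given model exposes, not a real library function, and the prompts are illustrative rather than tested.

```python
def generate(prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM and return its completion.

    Hypothetical stand-in -- wire this up to the model of your choice.
    """
    raise NotImplementedError


def answer_with_self_check(question: str, max_rounds: int = 2) -> str:
    """Draft an answer, then ask the model to critique and revise its own output."""
    draft = generate(question)
    for _ in range(max_rounds):
        # Ask the model to inspect its own output for errors of logic and fact.
        critique = generate(
            "Inspect the following answer for errors of logic or fact. "
            "List each error, or reply 'NO ERRORS'.\n\n"
            f"Question: {question}\nAnswer: {draft}"
        )
        if "NO ERRORS" in critique.upper():
            break
        # Revise the draft using the model's own critique.
        draft = generate(
            "Rewrite the answer, fixing the listed errors.\n\n"
            f"Question: {question}\nAnswer: {draft}\nErrors: {critique}"
        )
    return draft
```

Whether a loop like this actually reduces hallucination at scale is an empirical question; the point is only that the countermeasure is simple to state once the failure mode is known.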
Another point: for fission, there are high-performance reactor designs that can never be safe. Possibly AGI is the same: only very restricted, specific designs are mostly safe, and anything else is not. There may not exist a general alignment solution. There is not for fission, and high-performance reactors (like nuclear salt-water rocket engines) are absurdly unsafe.
For fission it also took years and thousands of people working on it, armed with data from previous reactors and previous meltdowns, before they arrived at Mostly Safe designs.