Hey there!
The main problem is that you’re trying to answer the test question “Limited to 100 words, what might humans answer in 2100 when asked how people in 2018 could most effectively improve their wellbeing?”, while training on “Limited to 100 words, what might humans answer in 2018 when asked how people in 1950 could most effectively improve their wellbeing?”
This is a major problem: the training question can be answered by a “look at current medical/social science and distill answers” approach, while the test question needs extrapolation. Today, they are actually very different questions.
You might be interested in some of my ideas on Oracles, which should allow the safe answering of many different types of questions: https://arxiv.org/abs/1711.05541
ETA: Please don’t spend time on this comment. I no longer think this setup deserves attention. Thanks again!
Thanks so much for the comment!
I fully agree.
Suppose we first write the code for “agent X”, and then we wait for a year. We then invoke X with a dataset containing questions that humanity answered in the past year. Then we take “agent Y” (the output of X) and invoke it with a useful question that we don’t know the answer to. If it’s a question that we could find the answer to anyway in the very near future (or perhaps even WOULD have found the answer to in the past year, had we been luckier), then it’s plausible we’ll get a useful answer from Y. The larger the dataset is relative to the final n (the length of the code of the Y created at the last iteration), the less plausible it is that any code of that size could always correctly detect that a given useful question is not from our dataset (which is a necessary condition for withholding useful answers from us on every useful question outside the dataset).
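To make the setup concrete, here is a minimal Python sketch of the flow I have in mind; `agent_x`, `agent_y`, and `qa_dataset` are hypothetical placeholders for illustration (a real X would search for a short program of length at most n that fits the dataset, rather than doing a lookup), not an implementation.

```python
# Hypothetical sketch of the protocol described above; all names are illustrative.

def agent_x(qa_dataset):
    """Agent X: given (question, answer) pairs that humanity produced during
    the past year, return a new agent Y (represented here as a callable)."""
    def agent_y(question):
        # A real X would search for a short program (of length at most n) that
        # fits the dataset; this placeholder merely looks up known answers.
        for q, a in qa_dataset:
            if q == question:
                return a
        # For the scheme to be useful, Y must also generalize to questions
        # outside the dataset; the placeholder simply declines.
        return None
    return agent_y

# After waiting a year, collect questions humanity answered in that year.
qa_dataset = [
    ("Example question answered during the past year?", "Example answer."),
]

# Produce agent Y and query it with a question we can't yet answer ourselves.
agent_y = agent_x(qa_dataset)
print(agent_y("A useful question whose answer we don't know yet?"))
```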
P.S., I’ll fully digest your paper soon (I’m in an intense period of finishing my MSc...), but going over it a few months ago (along with other things you wrote on boxed AIs) had a huge and useful impact on me, so thanks for that too! :)