Does the following strategy for both AIs work:
The first AI compresses advice that would allow answering the questions well on average into a short string. Then, when “giving the human the information”, it gets the human to memorize this short string using, e.g., a memory palace technique, perhaps with error-correcting codes. Then the second AI extracts this string from the human and uses it to answer the questions.
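A minimal sketch of what that strategy might look like, assuming a yes/no answer key, a toy repetition code, and made-up palace objects (every name and parameter here is illustrative, not from the discussion):

```python
# Minimal sketch of the scheme described above; all names and parameters
# are illustrative. The first AI packs an "answer key" into a short bit
# string, protects it with a 3x repetition code, and maps each encoded bit
# to a memory-palace attribute for the human to memorize.

PALACE_OBJECTS = ["door", "lamp", "mirror", "clock", "vase", "rug"]

def encode_with_repetition(bits, k=3):
    """Trivial error-correcting code: repeat each bit k times."""
    return [b for bit in bits for b in [bit] * k]

def decode_with_repetition(bits, k=3):
    """Majority-vote decode; tolerates up to (k - 1) // 2 flips per bit."""
    return [int(sum(bits[i:i + k]) > k // 2) for i in range(0, len(bits), k)]

def to_palace(bits):
    """Turn each encoded bit into a memorable instruction:
    1 -> the object is warm-colored, 0 -> it is cool-colored."""
    return [f"locus {i}: the {PALACE_OBJECTS[i % len(PALACE_OBJECTS)]} is "
            f"{'warm' if b else 'cool'}-colored"
            for i, b in enumerate(bits)]

advice_bits = [1, 0, 1, 1]                  # the compressed "answer key"
encoded = encode_with_repetition(advice_bits)
print("\n".join(to_palace(encoded)))        # what the human memorizes
noisy = encoded[:]                          # one mis-remembered locus...
noisy[0] ^= 1
assert decode_with_repetition(noisy) == advice_bits  # ...still decodes
```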
The second AI doesn’t get to extract anything, nor does it answer anything—it gets literally no information from the human.
What it would have to do is instruct the human (sight unseen) on how to use the hidden string to answer the question.
I don’t think that scenario quite works out for the AIs in most situations, but I am wary that there is something like it: some trick the AIs can use to let the human answer correctly without understanding.
Why wouldn’t it work? The second AI takes the question “will ratios of FtsZ to Pikachurin plus C3-convertase to Von Willebrand Factor rise in proportion to the average electronegativity in the Earth’s orbit?” and translates it to something like “is the third object in the memory palace a warm color?”.
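For concreteness, a hedged sketch of that translation step, assuming both AIs share a deterministic hash from question text to palace locus (the hash choice and locus count are assumptions for illustration):

```python
# Sketch of the translation step; the hash and locus count are assumptions.
# Both AIs share a deterministic map from question text to a palace locus,
# so the second AI can phrase its instruction without seeing the string:
# the first AI stores each answer bit at the locus its question hashes to.
import hashlib

N_LOCI = 12  # number of memorized palace loci, matching the sketch above

def question_to_locus(question: str) -> int:
    """Hash the question text so both AIs agree, sight unseen, on a locus."""
    digest = hashlib.sha256(question.encode()).digest()
    return int.from_bytes(digest[:4], "big") % N_LOCI

q = ("will ratios of FtsZ to Pikachurin plus C3-convertase to Von "
     "Willebrand Factor rise in proportion to the average "
     "electronegativity in the Earth's orbit?")
locus = question_to_locus(q)
print(f"Is the object at locus {locus} in your memory palace a warm color?")
```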
Yes, if a human is able and willing to do better with a memory palace than with true understanding, this could work. Note that it’s rather brittle, though: it requires both AIs to agree on the exact sequence, so we might be able to break it that way, by somehow increasing the uncertainty between the two AIs.
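One way to picture that brittleness, under the same toy assumptions as above: if the shared convention is perturbed (modeled here as a secret permutation of locus indices between the two AIs), lookup accuracy collapses toward chance.

```python
# Toy illustration of the brittleness; all parameters are made up.
# If the two AIs disagree on the locus ordering (modeled as a secret
# permutation), the second AI's lookups land on the wrong bits and
# accuracy falls to roughly chance on a random answer key.
import random

random.seed(0)
N = 64
key = [random.randint(0, 1) for _ in range(N)]  # bits the human memorized

def lookup_accuracy(perm):
    """Second AI reads locus perm[i] when the answer actually sits at i."""
    return sum(key[perm[i]] == key[i] for i in range(N)) / N

aligned = list(range(N))
misaligned = aligned[:]
random.shuffle(misaligned)
print(f"AIs aligned:    {lookup_accuracy(aligned):.2f}")    # 1.00
print(f"AIs misaligned: {lookup_accuracy(misaligned):.2f}") # ~0.5
```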
The big unknown is whether, given a billion-plus questions to choose from, memorised data would perform better than genuine-ish understanding.
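A rough capacity check makes the tension concrete (both figures below are illustrative guesses, not measurements): a memory palace might hold a few hundred bits, while per-question lookup over a billion yes/no questions would need on the order of 10^9 bits.

```python
# Back-of-envelope numbers for the capacity question; both figures are
# illustrative guesses, not measurements.
memorized_bits = 300        # optimistic estimate for a trained mnemonist
n_questions = 10**9         # size of the yes/no question pool
coverage = memorized_bits / n_questions
print(f"fraction of questions answerable by direct lookup: {coverage:.1e}")
# ~3e-7: the string cannot hold per-question answers, so it must encode
# compressed heuristics, and the scheme only wins if those heuristics
# generalize better than whatever the human could genuinely understand.
```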