Question: are we assuming that mesa optimizer and distributional shift problems have been solved somehow? Or should we assume that some context shift might suddenly cause the Oracle to start giving answers that aren't optimized for the objective function that we have in mind, and plan our questions accordingly?
Where (under which assumption) would you suggest that people focus their efforts?
Also, what level of capability should we assume the Oracle to have, or which assumption about level of capability would you suggest that people focus their efforts on?
Your examples all seem to assume oracles that are superhumanly intelligent. If that’s the level of capability we should target with our questions, should we assume that we got this Oracle through a local or distributed takeoff? In other words, does the rest of the world look more or less like today’s or are there lots of other almost-as-capable AIs around?
ETA: The reason for asking these questions is that you're only giving one prize for each type of Oracle, and would probably not give the prize to a submission that assumes something you think is very unlikely. It seems good to communicate your background views so that people aren't surprised later when you don't pick them as winners for this kind of reason.
> Question: are we assuming that mesa optimizer and distributional shift problems have been solved somehow? Or should we assume that some context shift might suddenly cause the Oracle to start giving answers that aren't optimized for the objective function that we have in mind, and plan our questions accordingly?
Assume either way, depending on what your suggestion is for.
> Where (under which assumption) would you suggest that people focus their efforts?
>
> Also, what level of capability should we assume the Oracle to have, or which assumption about level of capability would you suggest that people focus their efforts on?
>
> Your examples all seem to assume oracles that are superhumanly intelligent. If that's the level of capability we should target with our questions, should we assume that we got this Oracle through a local or distributed takeoff? In other words, does the rest of the world look more or less like today's, or are there lots of other almost-as-capable AIs around?
>
> ETA: The reason for asking these questions is that you're only giving one prize for each type of Oracle, and would probably not give the prize to a submission that assumes something you think is very unlikely. It seems good to communicate your background views so that people aren't surprised later when you don't pick them as winners for this kind of reason.
The ideal solution would have huge positive impacts and complete safety, under minimal assumptions. More realistically, there will be a tradeoff between assumptions and impact.
I'm not suggesting any particular area for people to focus their efforts on, because the winner might be a very effective approach under minimal assumptions, or a fantastically effective approach under stronger assumptions. It's hard to tell in advance which will be the most useful.