If we model reachability of an objective as simply its length in bits, then distinguishing O-base from every single more reachable O-mesa gets exponentially harder as O-base gets more complex. Thus, for a very complicated O-base, sufficiently incentivizing the base optimizer to find a mesa-optimizer with that O-base is likely to be very difficult, though not impossible
What is the intuition that makes you think that despite being expoentially harder this would not be impossible?
What is the intuition that makes you think that despite being expoentially harder this would not be impossible?