Inspiring! Decided to try brainstorming about proxies.
What causes proxy drift?
a proxy is optimized until it leaves its domain of validity, with no legible alarm.
the landscape changes along dimensions that were not known to affect the proxy, or that affect the underlying thing we want to measure without affecting the proxy.
both are consequences of proxies sharing some but not all dimensions with the underlying quantity.
few shared dimensions: fragile proxies
more shared dimensions: more expensive for adversarial munchkining to disentangle (i.e. to find a lever that moves the proxy only along non-shared dimensions)
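A toy count can make the shared-dimensions point concrete. This is a minimal sketch under loudly hypothetical assumptions: a "lever" is any non-empty subset of the proxy's dimensions, and a lever "disentangles" when it touches none of the dimensions the proxy shares with the target. The names `levers`, `disentangling_fraction`, and the dimension labels are illustrative only. Under these assumptions, with n proxy dimensions of which k are not shared, the clean fraction is (2^k - 1)/(2^n - 1), so with n fixed, each extra shared dimension roughly halves the odds that a randomly found lever is a clean munchkin.

```python
from itertools import combinations


def levers(dims: set[str]) -> list[set[str]]:
    """Every non-empty subset of `dims`; each stands in for a hypothetical
    lever that moves exactly those dimensions."""
    ordered = sorted(dims)
    return [set(c) for r in range(1, len(ordered) + 1) for c in combinations(ordered, r)]


def disentangling_fraction(proxy_dims: set[str], target_dims: set[str]) -> float:
    """Share of the proxy's levers that touch no dimension shared with the target,
    i.e. levers that move the proxy while leaving the underlying quantity alone."""
    all_levers = levers(proxy_dims)
    clean = [lever for lever in all_levers if not lever & target_dims]
    return len(clean) / len(all_levers)


# Hypothetical dimension labels: the target cares about a, b, c only.
TARGET = {"a", "b", "c"}
fragile = {"a", "x", "y", "z"}  # shares one dimension with the target
tight = {"a", "b", "c", "z"}    # shares three

print(round(disentangling_fraction(fragile, TARGET), 3))  # 0.467
print(round(disentangling_fraction(tight, TARGET), 3))    # 0.067
```

Real proxies are neither linear nor cleanly discrete like this; the sketch only shows why more shared dimensions leave fewer cheap levers to find.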
key dimensions?: legibility, cost, tightness (overlapping dimensions/relevant dimensions?)
legibility allows easier ‘out of domain’ alarms but also easier munchkining, requiring greater tightness.
expense and legibility → security via obscurity
How could parameterizing proxy space fail?
choosing an efficient representation of proxies via an orthogonal carving up of proxy space is potentially bad because it robs you of consilience. Overlapping proxies give you more chances to construct and notice alarms.
quantity>quality of proxies?
orthogonality is often considered desirable for modularity, i.e. legibility of side effects.
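The consilience point can be sketched the same way. Assuming, for illustration only, linear proxies over hypothetical dimensions and a hypothetical `divergent_pairs` check that flags proxies whose changes disagree by more than a tolerance: overlapping proxies move together under a genuine improvement, so divergence between them is informative. Orthogonal proxies diverge even under an honest improvement, so the same check cannot tell an honest gain from munchkining.

```python
def score(weights: dict[str, float], state: dict[str, float]) -> float:
    """Linear proxy reading: sum of weight * value over the proxy's dimensions."""
    return sum(w * state.get(dim, 0.0) for dim, w in weights.items())


def divergent_pairs(
    proxies: dict[str, dict[str, float]],
    before: dict[str, float],
    after: dict[str, float],
    tolerance: float,
) -> list[tuple[str, str]]:
    """Pairs of proxies whose changes from `before` to `after` differ by more than `tolerance`."""
    deltas = {name: score(w, after) - score(w, before) for name, w in proxies.items()}
    names = sorted(deltas)
    return [
        (p, q)
        for i, p in enumerate(names)
        for q in names[i + 1:]
        if abs(deltas[p] - deltas[q]) > tolerance
    ]


# Hypothetical: the target cares about "a" and "b"; "x" and "y" are dimensions it ignores.
OVERLAPPING = {"p1": {"a": 1.0, "b": 1.0, "x": 1.0}, "p2": {"a": 1.0, "b": 1.0, "y": 1.0}}
ORTHOGONAL = {"p1": {"a": 1.0, "x": 1.0}, "p2": {"b": 1.0, "y": 1.0}}

before: dict[str, float] = {}
honest = {"a": 1.0}      # genuine improvement on a target dimension
munchkined = {"x": 5.0}  # pushes a lever only p1 reads

for label, proxies in [("overlapping", OVERLAPPING), ("orthogonal", ORTHOGONAL)]:
    print(label,
          divergent_pairs(proxies, before, honest, tolerance=0.5),
          divergent_pairs(proxies, before, munchkined, tolerance=0.5))
# prints:
# overlapping [] [('p1', 'p2')]
# orthogonal [('p1', 'p2')] [('p1', 'p2')]
```

With overlap, the alarm stays quiet for the honest change and fires for the munchkined one; with the orthogonal carving it fires for both, so it carries no signal.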
long feedback cycles give proxies more time to drift.
underrated?: construction of new proxies and adversarial attacking of them for practice.