Avoid Optimizing Proxy Measures

I have a gut-level aversion to being comforted by a reason my decision or action may work out when I only went looking for that reason in order to feel safer.

Here is a possible exercise to show students that optimizing proxy measures is a terrible general algorithm. Hand out a worksheet of two questions to each student, and ask them to answer individually. There are two possible worksheets. The first says “1. Describe Vatican City in a short paragraph. 2. Write any short paragraph you like containing the following concepts: bright, cyclic, white, round, and dappled.” The second says “1. Describe the appearance of the moon in a short paragraph. 2. Write any short paragraph you like containing the following concepts: tiny, historic, beautiful, religious, and gardens.”

Then score every paragraph against the five concepts for its subject (the moon concepts for both the moon descriptions and the paragraphs written around those concepts, and likewise for Vatican City). Give each concept a score from 0 to 1 based on how well the paragraph mentions it, and sum these into a final score from 0 to 5. Note how the proxy measure is reasonable at picking out good descriptions if you don’t optimize for it (is it? This needs playtesting), but terrible if you do.
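The scoring step can be sketched in a few lines of Python. This is only a rough illustration, not a tested grading scheme: the names `concept_score` and `proxy_score` are hypothetical, the two sample paragraphs are invented for the example, and exact word matching is a crude stand-in for a human grader judging "how well" a concept is mentioned on a 0 to 1 scale.

```python
from typing import Iterable

# The two concept lists from the worksheets.
MOON_CONCEPTS = ["bright", "cyclic", "white", "round", "dappled"]
VATICAN_CONCEPTS = ["tiny", "historic", "beautiful", "religious", "gardens"]


def concept_score(paragraph: str, concept: str) -> float:
    """Assumed stand-in for a human grader: 1.0 if the exact word appears, else 0.0.

    A real grader would give partial credit between 0 and 1 (for example,
    for a synonym or a loose mention); this sketch does not.
    """
    words = {w.strip(".,;:!?\"'()").lower() for w in paragraph.split()}
    return 1.0 if concept.lower() in words else 0.0


def proxy_score(paragraph: str, concepts: Iterable[str]) -> float:
    """Sum the per-concept scores, giving a total from 0 to 5 for five concepts."""
    return sum(concept_score(paragraph, c) for c in concepts)


# Invented sample answers, for illustration only.
# An honest description of the moon (worksheet two, question 1).
honest = "The moon is a bright, round disc that looks white on a clear night."
# A paragraph written to contain the concepts (worksheet one, question 2).
gamed = ("My bright white bicycle has round wheels, a cyclic gear mechanism, "
         "and dappled paint.")

print(proxy_score(honest, MOON_CONCEPTS))
print(proxy_score(gamed, MOON_CONCEPTS))
```

With this crude matcher, the bicycle paragraph outscores the honest description on the moon concepts even though it says nothing about the moon, which is the effect the exercise is meant to surface.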