I’d be fine with it throwing a brick at me. It beats it having the patience to take over the entire world. The point is, if it throws a brick at me, I have data on what went wrong with its utility function and I have a lead on how to fix it.
EulersApprentice
Here’s my attempt at solving the puzzle you posed: I believe the following procedure will yield a list of approximate values for the E. coli bacterium. (It’d take a research team and several years, but in principle it is possible.)
Isolate each distinct protein present in E. coli individually. (The research I found (https://www.pnas.org/content/100/16/9232, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4332353/) puts the number of different proteins in E. coli at 1-4 thousand, which makes this difficult but not completely infeasible.)
For each protein, create a general list of its effects on the biochemical environment within the cell.
Collect each effect that is redundantly produced by many distinct proteins at once (say, 10+). This gives us a rough estimate of the bacterium’s values, though it is not yet determined which are more instrumental and which are more terminal in nature.
Organize the list into a graph that links properties by cause and effect. (Example: Higher NaCl concentration makes the cell more hypertonic relative to the environment.)
For each biochemical condition on the graph, evaluate its causes and its effects. Conditions with many identifiable causes but few identifiable effects are more likely to be terminal goals than conditions with few identifiable causes but many identifiable effects. In this way, a rough hierarchy can be determined between terminal-ish goals and instrumental-ish goals.
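To make the last two steps concrete, here is a minimal sketch of the causes-versus-effects ranking. Everything in it is an assumption for illustration: the edge list is made up (it is not real E. coli data), and the score, (causes + 1) / (effects + 1), is just one simple way to turn "many causes, few effects" into a number.

```python
from collections import defaultdict

# Hypothetical cause -> effect edges; illustrative only, not real E. coli data.
edges = [
    ("protein_A", "higher_NaCl"),
    ("protein_B", "higher_NaCl"),
    ("higher_NaCl", "hypertonic_cell"),
    ("protein_C", "hypertonic_cell"),
    ("protein_D", "hypertonic_cell"),
]

def terminal_scores(edges):
    """Score each node by (number of causes + 1) / (number of effects + 1).

    A higher score means more identifiable causes relative to effects,
    i.e. a more "terminal-ish" condition under the heuristic above.
    The +1 terms are an assumed smoothing to avoid dividing by zero.
    """
    causes = defaultdict(set)
    effects = defaultdict(set)
    nodes = set()
    for cause, effect in edges:
        causes[effect].add(cause)
        effects[cause].add(effect)
        nodes.update((cause, effect))
    return {n: (len(causes[n]) + 1) / (len(effects[n]) + 1) for n in nodes}

def rank_conditions(edges):
    """Return nodes ordered from most terminal-ish to most instrumental-ish."""
    scores = terminal_scores(edges)
    return sorted(scores, key=lambda n: (-scores[n], n))

if __name__ == "__main__":
    for node in rank_conditions(edges):
        print(node, terminal_scores(edges)[node])
```

In this toy graph, "hypertonic_cell" ranks as most terminal-ish and the individual proteins as most instrumental-ish. A real analysis would need a far larger graph and probably some weighting of edges by effect size.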
Potential issues with this plan:
This procedure is ill-equipped to keep up with mutations in the E. coli culture. It takes much longer to create a plan for a particular bacterial culture than it does for the culture to spontaneously change in significant ways.
Some E. coli values may not be detectable in a lab environment. For example, if E. coli cells evolved to excrete chemicals that promote growth of other, potentially symbiotic cells, values associated with that behavior will likely go undetected.
This plan assumes that the behavior of the E. coli cell reflects its values. That puts a theoretical limit on how precisely we can specify the cell’s values, as we are not equipped to detect where behavior and volition diverge.
The only example I can think of is with parents and their children. Evolutionarily, parents are optimized to maximize the odds that their children will survive to reproduce, up to and including self-sacrifice to that end. However, parents do not possess ideal information about the current state of their child, so they must undergo a process resembling value alignment to learn what their children need.
At that point I think we’re running the risk of passing the buck forever. (Unless we can prove that process terminates.)
I am inclined to believe that indeed the buck will get passed forever. This idea you raise is remarkably similar to the Procrastination Paradox (which you can read about at https://intelligence.org/files/ProcrastinationParadox.pdf).
I should clarify that the discounting is not a shackle, per se, but a specification of the utility function. It’s a normative specification that results now are better than results later according to a certain discount rate. An AI that cares about results now will not change itself to be more “patient” – because then it will not get results now, which is what it cares about.
The key is that the utility function’s weights over time should form a self-similar graph: the ratio between the values of two results should depend only on the time gap between them, not on how far in the future they both are. That is, if results in 10 seconds are twice as valuable as results in 20 seconds, then results in 10 minutes and 10 seconds need to be twice as valuable as results in 10 minutes and 20 seconds. (Exponential discounting has this property.) If this is not true, the AI will indeed alter itself so its future self is consistent with its present self.
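Here is a small sketch of that self-similarity check, under my own assumptions: the function names, the 10-second half-life, and the hyperbolic curve used as a counterexample are all illustrative choices, not anything specified above.

```python
import math

def exponential_weight(t, half_life=10.0):
    """Weight of a result t seconds away; halves every `half_life` seconds (assumed)."""
    return 0.5 ** (t / half_life)

def hyperbolic_weight(t, k=0.1):
    """A non-self-similar alternative, included only as a counterexample (assumed k)."""
    return 1.0 / (1.0 + k * t)

def preference_ratio(weight, t_early, t_late, delay=0.0):
    """How many times more valuable the earlier result is, with both pushed back by `delay`."""
    return weight(t_early + delay) / weight(t_late + delay)

if __name__ == "__main__":
    for name, w in [("exponential", exponential_weight), ("hyperbolic", hyperbolic_weight)]:
        now = preference_ratio(w, 10, 20)
        later = preference_ratio(w, 10, 20, delay=600)  # ten minutes later
        consistent = math.isclose(now, later)
        print(name, now, later, "self-similar" if consistent else "not self-similar")
```

With the exponential curve, 10 seconds versus 20 seconds is a factor of 2 whether or not ten minutes are added to both. With the hyperbolic curve the factor shrinks as both results recede, so the agent’s present and future selves would disagree about the same trade-off.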