Yes, it’s not a very satisfactory solution. Some alternative/complementary solutions:
- Somehow use non-transformative AI to do my mind uploading, and then have the TAI learn by inspecting the uploads. This would be great for single-user alignment as well.
- Somehow use non-transformative AI to create perfect lie detectors, and use these to enforce honesty in the mechanism. (But is it possible to detect self-deception?)
- Have the TAI learn from past data that wasn't affected by the incentives created by the TAI. (But is there enough information there?)
- Shape the TAI's prior about human values in order to rule out at least the most blatant lies.
- Some clever mechanism design I haven't thought of. The problem is that most mechanism designs rely on money, and money doesn't seem applicable here, whereas without money there are many impossibility theorems (e.g. Gibbard-Satterthwaite); see the sketch below for how money is what buys truthfulness.
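To make the money point concrete (my illustration, not part of the original comment): the Vickrey-Clarke-Groves (VCG) mechanism is the standard example of how monetary transfers make truthful reporting of values a dominant strategy. A minimal sketch with hypothetical toy valuations:

```python
def vcg(outcomes, reports):
    """VCG mechanism sketch.
    outcomes: list of outcome labels.
    reports: list of dicts, reports[i][o] = agent i's reported value for outcome o.
    Returns (chosen outcome, list of payments).
    """
    def welfare(outcome, agents):
        return sum(reports[i][outcome] for i in agents)

    everyone = range(len(reports))
    # Choose the outcome maximizing total reported value.
    chosen = max(outcomes, key=lambda o: welfare(o, everyone))

    payments = []
    for i in everyone:
        others = [j for j in everyone if j != i]
        # Welfare the others would have gotten had agent i been absent...
        best_without_i = max(welfare(o, others) for o in outcomes)
        # ...minus the welfare they actually get: agent i pays the externality
        # it imposes on everyone else. This payment is what makes
        # truth-telling a dominant strategy.
        payments.append(best_without_i - welfare(chosen, others))
    return chosen, payments

# Toy example: three agents, two candidate policies.
outcomes = ["A", "B"]
reports = [{"A": 10, "B": 0}, {"A": 0, "B": 6}, {"A": 0, "B": 7}]
choice, pay = vcg(outcomes, reports)
print(choice, pay)  # B is chosen; payments [0, 3, 4]: only pivotal agents pay
```

Drop the payment term and agents can gain by misreporting; that gap, in settings where transfers are unavailable, is exactly what the impossibility theorems formalize.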