Simulation or hidden Schelling fences seem to be the main mechanisms. I have seen zero ideas of WHAT to measure on a "true alignment" level; they all seem to be about noticing specific problems. I have seen none that try to quantify a "semi-aligned power" tradeoff between the good such a system does and the harm it does.
I think the position in Eliezer's early writing (and, AFAIK, his current thinking), that alignment must be perfect or all is lost, with nothing in between, probably makes the goal impossible.
You can measure AlphaGo's ability to play Go by having it play Go, a task you can specify almost entirely mechanically: just have it play a game against a pro. We have no comparable measurement for ethics.
Do you know of any other sketches of how to measure alignment that are reasonably close to mechanically specified?