I’ve recently become interested in holding some competent opinions on FAI (Friendly AI). Trying these on for size:
FAI is like a thermostat. A thermostat does not set individual particles in motion; it measures an aggregate property of many particles (temperature) and responds when that property drifts outside a target range. Similarly, FAI would not micromanage individual events; it would measure whether the world is a Nice Place to Live and make corrections as needed to keep it that way.
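A minimal sketch of the analogy in Python. The names `measure_niceness` and `apply_correction` are hypothetical stand-ins, not any real API; the point is only that the controller acts on an aggregate measurement, not on individual events:

```python
import random

TARGET = 0.9        # desired "Nice Place to Live" level, on a made-up 0..1 scale
TOLERANCE = 0.05    # how far the measurement may drift before the controller acts


def control_loop(measure_niceness, apply_correction, steps=100):
    """Measure an aggregate property and correct only when it drifts out of range."""
    for _ in range(steps):
        niceness = measure_niceness()      # aggregate measurement, like temperature
        error = TARGET - niceness
        if abs(error) > TOLERANCE:         # only respond to out-of-range averages
            apply_correction(error)        # nudge the world back toward the target


# Example usage with stub functions standing in for the world:
world = {"niceness": 0.7}
control_loop(
    measure_niceness=lambda: world["niceness"] + random.uniform(-0.02, 0.02),
    apply_correction=lambda err: world.update(niceness=world["niceness"] + 0.5 * err),
)
```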
Before we can have a mature FAI, there is the initial dynamic, an immature FAI: a program with a very well thought-out, tested, reliable architecture that not only contains a representation of Friendliness but is designed to keep that representation at the core of its search process. As it searches for self-modifications, it passes each candidate modification through a filter that rejects any change that cannot be proven to preserve the Friendliness goal.
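A sketch of that filter pattern, assuming a hypothetical prover `find_proof_of_friendliness_preservation` that returns a proof object or None. The important property is that it fails closed, rejecting anything it cannot prove:

```python
def filter_self_modifications(candidates, find_proof_of_friendliness_preservation):
    """Accept only modifications with a found proof that Friendliness is preserved.

    `candidates` is an iterable of proposed self-modifications.
    `find_proof_of_friendliness_preservation` is a hypothetical prover.
    Anything unproven is rejected, even if it would in fact preserve
    Friendliness (fail closed).
    """
    accepted = []
    for modification in candidates:
        proof = find_proof_of_friendliness_preservation(modification)
        if proof is not None:
            accepted.append((modification, proof))  # keep the proof as an audit trail
        # else: reject silently; no proof means no change
    return accepted
```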
Since provability is tricky, many optimizations that would in fact preserve Friendliness could be rejected simply because no strategy for proving them is found. This seems to imply that a reliable system with non-trivial properties to prove will be slower to self-improve than a kludgey system with simpler goals, like maximizing computronium.
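A toy illustration of that asymmetry, with entirely made-up numbers: suppose only some fraction of genuinely Friendliness-preserving improvements have proofs the system can actually find, while the kludgey system accepts every improvement it stumbles on.

```python
# Purely illustrative numbers, not estimates of anything real.
improvements_found_per_cycle = 10
fraction_actually_safe = 0.8                 # improvements that truly preserve the goal
fraction_of_safe_with_findable_proof = 0.5   # provability is tricky

careful_rate = (improvements_found_per_cycle
                * fraction_actually_safe
                * fraction_of_safe_with_findable_proof)
kludge_rate = improvements_found_per_cycle   # accepts everything, safe or not

print(careful_rate, kludge_rate)             # 4.0 vs 10 accepted changes per cycle
```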