Is this AI assessing things from a subjective point of view or an objective one? I.e., is it optimizing for higher values on a counter on board the ship from one turn to the next, or is it optimizing for higher values on a counter kept in a god's-eye-view location?
You spent a great deal of your post explaining the testing scenario, but I'm still entirely unclear about how your AI is supposed to learn from sample trajectories (training data?).
Assuming it's given a training set that includes many trajectories where human pilots have killed themselves by going to +3 or −3, you might also run into the problem that those trajectories are likely to be ones where the ship never reached the station with velocity zero, because there were no humans alive on board to bring it in.
To the AI they would look like trajectories whose score effectively goes to negative infinity, because the ship has accelerated into the void never to be seen again, receiving −1 point per turn without limit.
It might assume that +3/−3 accelerations cause automatic failure but not make any connection to PA.
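For concreteness, a minimal sketch of that arithmetic, assuming the score is just −1 per turn until docking, evaluated up to some finite horizon; the 1000-turn horizon and the `trip_score` helper are invented for illustration, not taken from the post.

```python
# Assumed scoring: -1 per turn until the ship docks, cut off at some horizon.
# A crew-killing runaway trajectory never docks, so it keeps collecting -1
# and its score is bounded only by the horizon (or by a discount, if any).

def trip_score(turns_to_dock, horizon=1000):
    """Score of a trajectory that docks after `turns_to_dock` turns,
    or never docks if `turns_to_dock` is None."""
    if turns_to_dock is None or turns_to_dock > horizon:
        return -horizon           # runaway: -1 on every turn until cutoff
    return -turns_to_dock         # completed trip: -1 per turn spent

print(trip_score(40))    # an ordinary +1/-1 trip: -40
print(trip_score(None))  # a +3 trip that killed the pilot: -1000
```

All such trajectories teach is "never docking is catastrophic", which is consistent with "avoid +3/−3" but says nothing about PA itself.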
To be clear, are you hoping that your AI will find statistical correlations between high scores and various behaviors without you needing to clearly outline them? How is this different from a traditional scoring function when given training data?
Also, I'm completely unclear about where “the discomfort of the passenger for accelerations” comes into the scoring. If the training data includes many human pilots taking +2/−2 trips and many pilots taking +1/−1 trips, and the otherwise identical +2/−2 trips get better scores, there's no reason (at least none that you've given) for your AI to derive that +1/−1 is better. Indeed, it might consider the training data to have been included to teach it that +2/−2 is always better.
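To make that worry concrete, here is a toy sketch, with all trajectory lengths and scores invented: a learner that simply regresses the score on features of the demonstrated behaviour recovers "about −1 per turn of either kind", under which the shorter +2/−2 trips look strictly better, and nothing in the data encodes passenger discomfort.

```python
import numpy as np

# Invented training data: features are [turns spent at |a|=1, turns at |a|=2],
# and the score is just -1 per turn, so the shorter +2/-2 trips score higher.
X = np.array([
    [40.0, 0.0],   # +1/-1 trip, 40 turns
    [42.0, 0.0],
    [0.0, 28.0],   # +2/-2 trip, 28 turns
    [0.0, 30.0],
])
scores = np.array([-40.0, -42.0, -28.0, -30.0])

# Naive least-squares fit of a per-turn "reward" for each action type.
weights, *_ = np.linalg.lstsq(X, scores, rcond=None)
print("per-turn weights:", weights)        # roughly [-1, -1]

# Under the fitted reward, the faster +2/-2 style dominates outright.
print("best +1/-1 trip:", X[0] @ weights)  # about -40
print("best +2/-2 trip:", X[2] @ weights)  # about -28
```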
To be clear, in this model the sample trajectories will all be humans performing perfectly, i.e. using only accelerations of +1 and −1 on every turn (except maybe two turns).
Informally, what I expect the AI to do is think: “Hmm, the goal I'm given should imply that accelerations of +3 and −3 give the optimal trajectories, yet the trajectories I see do not use them. Since I know the whole model, I know that accelerations of +3 and −3 move me to PA=0. It seems likely that there's a penalty for reaching that state, so I will avoid it.
Hmm, but I also see that no one is using accelerations of +2 and −2. Strange. Using them doesn't cause the system to enter any new state, so there must be some intrinsic penalty in the real reward function for using such accelerations. Let's try and estimate what it could be...”
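Here is a rough sketch of that estimate, under an assumed concrete version of the model: each turn the pilot picks an acceleration a, then v += a and x += v, and the trip ends at x = D with v = 0, with a base reward of −1 per turn. The distance D = 400 and the `fastest_trip` helper are purely illustrative assumptions. The idea: the AI computes how many turns a +2-using trip would save; since the demonstrators forgo that saving, each use of |a| = 2 must carry an implicit penalty of at least the saving divided by the number of uses.

```python
from collections import deque

def fastest_trip(D, amax):
    """Minimum turns from (x=0, v=0) to (x=D, v=0) with |a| <= amax,
    plus how many of those turns use |a| == amax.  Plain BFS over states."""
    start, goal = (0, 0), (D, 0)
    prev = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            break
        x, v = state
        for a in range(-amax, amax + 1):
            nv = v + a
            nx = x + nv
            nxt = (nx, nv)
            # Keep the search bounded (no point flying far past the station).
            if 0 <= nx <= 2 * D and abs(nv) <= D and nxt not in prev:
                prev[nxt] = (state, a)
                queue.append(nxt)
    turns, big_uses, cur = 0, 0, goal
    while prev[cur] is not None:           # walk the path back to the start
        cur, a = prev[cur]
        turns += 1
        big_uses += (abs(a) == amax)
    return turns, big_uses

D = 400                                # invented distance to the station
t1, _ = fastest_trip(D, 1)             # what the demonstrators actually fly
t2, uses2 = fastest_trip(D, 2)         # what a pure -1/turn optimizer would fly
print(f"|a|<=1: {t1} turns;  |a|<=2: {t2} turns, {uses2} of them at |a|=2")
# If the demonstrators are optimal under the true reward, each |a|=2 turn
# must cost at least roughly this much on top of the -1 per turn:
print("implied penalty per |a|=2 turn >=", (t1 - t2) / uses2)
```

This is only one of the constraints the AI could extract (every +2-using trajectory the demonstrators declined gives another one), but it shows the flavour of the inference.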
OK, so the AI simply apes humans and doesn’t attempt to optimize anything?
Let's imagine something we actually might want to optimize.
Let's imagine that the human pilots take 10 turns to run through a paper checklist at the start, before any acceleration is applied, and that the turnover in the middle takes 10 turns instead of 2 simply because the humans take time to manually work the controls and check everything.
Do we want our AI to decide that there simply must be some essential reason for this and sit still, aping humans without knowing why and without any real reason? It could do all the checks in a single turn, but doesn't actually start moving for 10 turns because of artificial superstition?
And now that we’re into superstitions, how do we deal with spurious correlations?
If you give it a dataset of humans doing the same task many times with little variation, but by pure chance whenever [some condition is met] the pilot takes one extra turn to start moving, because he was picking his nose or just happened to take slightly longer, does the AI make sure to always take 11 turns when those conditions are met?
Self-driving cars that simply ape the car in front and don't attempt to optimize anything already exist, and they are probably quite safe indeed, but they're not all that useful for coming up with better ways to do anything.
That's the next step, once we have the aping done correctly (i.e. distinguishing noise from performance).
That, actually, is a very very big problem.
Yes. But it might not be as bad as the general AI problem. If we end up with an AI that believes that some biases are values, it will waste a lot of effort, but might not be as terrible as it could be. But that requires a bit more analysis than my vague comment, I’ll admit.