I've only read a couple of pieces on corrigibility, but I'm curious why the following scheme wouldn't work, and why I haven't seen any work in this direction (from my admittedly very brief scan).
Suppose at every time t some decision maker chooses a probability p(t) with which the AI is shut down, and suppose the AI's utility in that period was originally U(t). Now rescale the AI's utility for that period to U(t)/(1-p(t)). I think this generalizes quite easily so that future periods of utility are likewise unaffected by an increase in the risk of shutdown: divide the utility of each future period by the product of (1-p(s)) over all intervening periods s, i.e. by the probability of surviving to that period.
In this world the AI should be indifferent over changes to p(t) (as long as p(t) never actually reaches 1, though it can get arbitrarily close), and so should take actions that maximize U while being indifferent to whether humans decide to shut it down.
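To make the indifference claim concrete, here's a minimal numerical sketch of the rescaling (my own illustration; the function and example numbers are made up, and I'm assuming shutdown at period t happens before that period's utility accrues):

```python
def expected_utility(raw_utilities, shutdown_probs):
    """Expected utility of the rescaled agent.

    raw_utilities[t]  -- original per-period utility U(t)
    shutdown_probs[t] -- probability p(t) that the decision maker shuts
                         the AI down at the start of period t

    The rescaled utility for period t is U(t) divided by the probability
    of surviving to period t, i.e. the product of (1 - p(s)) for s <= t,
    so each period's expected contribution is exactly U(t).
    """
    total = 0.0
    survival = 1.0                       # probability of still running at period t
    for U, p in zip(raw_utilities, shutdown_probs):
        survival *= (1.0 - p)            # survive this period's shutdown lottery
        rescaled = U / survival          # U(t) / prod_{s<=t} (1 - p(s))
        total += survival * rescaled     # expected contribution = U(t)
    return total

# Any shutdown schedule with p(t) < 1 gives the same expected utility,
# so the rescaled agent has no incentive to influence p(t).
U = [3.0, 1.0, 4.0, 1.5]
print(expected_utility(U, [0.0, 0.0, 0.0, 0.0]))   # 9.5
print(expected_utility(U, [0.5, 0.9, 0.2, 0.7]))   # 9.5 (up to float rounding)
```

The division by the survival probability is doing all the work here: in expectation every period still contributes U(t), so the AI gains nothing by pushing p(t) in either direction.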
Just adding some additional context that might be useful. PredictIt is a similar election betting platform, but it caps the maximum amount each trader can bet (I think at under $1k, so relatively low). This means that if Polymarket is a money-weighted information aggregation mechanism, PredictIt is closer to a person-weighted one. As noted in the post, from 6th October to just now Trump has gone from 50.8 to 60.1 cents on Polymarket, i.e. from a 1.6 cent spread over Kamala to a 20.2 cent spread (an 18.6 cent swing). On PredictIt he has gone from 51:53 to 55:48 over the same interval (a 9 cent swing in the spread). Of course it's possible that people on PredictIt are adjusting their bets in response to Polymarket prices (e.g. by arbitraging), but this is some evidence that around half of the change isn't due to these large bettors (whether those large bettors are trading on private information, for manipulation purposes, or other reasons).
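For anyone who wants to check the arithmetic, here's the spread calculation spelled out (prices copied from above; the Polymarket Kamala prices are backed out from the quoted spreads, so treat them as approximate):

```python
# (Trump, Kamala) prices in cents, from the comment above.
polymarket_before = (50.8, 49.2)   # 6th October (Kamala backed out from the 1.6c spread)
polymarket_after  = (60.1, 39.9)   # now (Kamala backed out from the 20.2c spread)
predictit_before  = (51, 53)
predictit_after   = (55, 48)

def spread_swing(before, after):
    """Change in the Trump-minus-Kamala spread between two snapshots, in cents."""
    return (after[0] - after[1]) - (before[0] - before[1])

print(spread_swing(polymarket_before, polymarket_after))  # ~18.6 cent swing
print(spread_swing(predictit_before, predictit_after))    # 9 cent swing
```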