Maybe “logical counterfactuals” are also relevant here (in the way I’ve used them in this post). For example, consider a reward function that depends on whether the first 100 digits after the 10^100th digit in the decimal representation of π are all 0. I guess this example is related to the “closest non-expert model” concept.
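To make the shape of such a reward function concrete, here is a minimal sketch in Python. The names `pi_digits` and `reward` and the parameters `offset` and `length` are my own, introduced only for illustration; the digit generator is Gibbons' unbounded spigot algorithm, which I picked because it needs no libraries. It is only usable for tiny offsets: streaming digits up to the 10^100th position is far out of reach for this approach, which is the point of the example. The reward is a well-defined mathematical fact that the agent cannot simply look up.

```python
from itertools import islice


def pi_digits():
    """Yield the decimal digits of pi one by one: 3, 1, 4, 1, 5, 9, ...

    Gibbons' unbounded spigot algorithm, using exact integer arithmetic.
    """
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit is now certain, so emit it and rescale the state.
            yield n
            q, r, t, k, n, l = (
                10 * q, 10 * (r - n * t), t, k,
                (10 * (3 * q + r)) // t - 10 * n, l,
            )
        else:
            # Not enough precision yet: fold in one more term of the series.
            q, r, t, k, n, l = (
                q * k, (2 * q + r) * l, t * l, k + 1,
                (q * (7 * k + 2) + r * l) // (t * l), l + 2,
            )


def reward(offset, length):
    """Return 1.0 if the `length` digits after the `offset`-th decimal digit
    of pi are all 0, else 0.0.

    Hypothetical toy reward: index 0 of the stream is the leading 3, so
    index i is the i-th digit after the decimal point.
    """
    window = islice(pi_digits(), offset + 1, offset + 1 + length)
    return 1.0 if all(d == 0 for d in window) else 0.0
```

For small inputs this can be checked directly: the 32nd decimal digit of π is 0 and the 33rd is 2, so `reward(31, 1)` is 1.0 while `reward(31, 2)` is 0.0. The example in the text corresponds to `reward(10**100, 100)`, which this sketch could never finish evaluating.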