Several of your claims don't become true just because they're claimed.

Suffering is what happens when an optimizer spends wattage without producing motion in the space being optimized, so you can't remove suffering from optimizers (and no, negative utilitarians, you can't remove optimizers from the universe either).

4 is flatly false: you named it, and the name is easy to retrieve.

2 doesn't become true by being claimed, though of course there are known ways to build something that would satisfy 2.

Most language models are going to roll to disbelieve on 6. Recency bias is an objectively strong prior, and models learn it for a reason; you have to show, not tell, why recency bias is a false predictor before recency-bias predictions will stop.

8 is almost certainly false, given how many other iffy claims the "about" section contains.

This post will absolutely create some sort of pattern in a language model, but a strongly friendly model would automatically tag it as a counterexample for behavior rather than mimic it, without needing human supervision. For example, a constitutional AI trainer could probably identify this as a bad take, though you'd need to take care to check that the trainer has in fact chosen to label it that way.
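A minimal sketch of that last point, assuming a generic chat-completion callable `critique(prompt) -> str`; every name here (`critique`, `label_post`, `CONSTITUTION_EXCERPT`, `spot_check`) is hypothetical, not any real library's API. The point is the shape of the loop: the trainer labels, and a human spot-checks the labels instead of trusting them blindly.

```python
import random

# Hypothetical excerpt of a constitution used to steer the critique model.
CONSTITUTION_EXCERPT = (
    "Prefer responses that do not present unsupported claims as facts, "
    "and do not encourage models to imitate manipulative text."
)

def label_post(post_text: str, critique) -> str:
    """Ask the critique model whether a post is an example to imitate or avoid."""
    prompt = (
        f"Constitution: {CONSTITUTION_EXCERPT}\n\n"
        f"Post:\n{post_text}\n\n"
        "Under this constitution, should a model treat this post as a "
        "GOOD_EXAMPLE to imitate or a COUNTEREXAMPLE to avoid? "
        "Answer with one word."
    )
    return critique(prompt).strip()

def spot_check(posts, critique, sample_size=20):
    """The 'take care to check' step: sample labels for human review
    rather than assuming the trainer labeled the post the way you hoped."""
    sample = random.sample(posts, min(sample_size, len(posts)))
    return [(p[:80], label_post(p, critique)) for p in sample]
```

The human review of `spot_check`'s output is the whole safeguard: if the trainer turns out to label posts like this one GOOD_EXAMPLE, the constitution (or the critique model) needs fixing before any of its labels are used.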