Our results below show that process supervision in fact incurs a negative alignment tax
The paper gives some compelling arguments that the alignment tax would be negative when this method is used to improve safety. However, the actual experiments are about improving or eliciting capabilities. They do not test the method as a safety technique; the safety claim rests on an analogy.