Yes, there are multiple rule sets. Under all of those that humans use to score their games, KataGo wins in the examples.
As they put it on the linked website:
We score the game under Tromp-Taylor rules as the rulesets supported by KGS cannot be automatically evaluated.
It’s complex to automatically evaluate Go positions according to the rules that humans use. That’s why people in the computer Go community invented their own ruleset, the Tromp-Taylor rules, which makes positions easier to evaluate automatically.
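For concreteness, here is a minimal sketch of Tromp-Taylor area scoring (the function name and board encoding are my own illustration, not KataGo's code): each player scores one point per stone on the board, plus one point per empty intersection that reaches only stones of their colour, which a simple flood fill can count. No judgment about life and death is needed, which is what makes it automatable.

```python
from collections import deque

EMPTY, BLACK, WHITE = ".", "X", "O"

def tromp_taylor_score(board):
    """board: a list of equal-length strings, e.g. ["XX.", ".XO", "OO."]."""
    rows, cols = len(board), len(board[0])
    seen = set()
    score = {BLACK: 0, WHITE: 0}

    for r in range(rows):
        for c in range(cols):
            if board[r][c] != EMPTY:
                score[board[r][c]] += 1  # every stone on the board counts as-is
            elif (r, c) not in seen:
                # Flood-fill this empty region, noting which colours it touches.
                region_size, borders = 0, set()
                queue = deque([(r, c)])
                seen.add((r, c))
                while queue:
                    y, x = queue.popleft()
                    region_size += 1
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < rows and 0 <= nx < cols:
                            if board[ny][nx] == EMPTY:
                                if (ny, nx) not in seen:
                                    seen.add((ny, nx))
                                    queue.append((ny, nx))
                            else:
                                borders.add(board[ny][nx])
                # Empty points count only if the region reaches exactly one colour.
                if len(borders) == 1:
                    score[borders.pop()] += region_size

    return score[BLACK] - score[WHITE]  # positive favours Black; komi omitted
```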
Given that KataGo’s target audience wasn’t computer bots, the KataGo developers went to the trouble of modifying the Tromp-Taylor rules to be more like the rulesets humans use to score their games, and then trained KataGo with the new scoring algorithm.
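The kind of modification this implies can be sketched on top of the function above (again purely illustrative, not KataGo's actual rules handling, which is more involved): lift stones that are judged dead off the board first, then apply the same area count.

```python
def human_style_score(board, dead_stones):
    """Score like Tromp-Taylor, but remove dead stones before counting.

    `dead_stones` is a hypothetical set of (row, col) coordinates that the
    players (or some life-and-death analysis) consider dead; it stands in
    for whatever judgment a real human-style ruleset encodes.
    """
    cleaned = [
        "".join(EMPTY if (r, c) in dead_stones else cell
                for c, cell in enumerate(row))
        for r, row in enumerate(board)
    ]
    return tromp_taylor_score(cleaned)
```

The difference bites in exactly the paper’s scenarios: under raw Tromp-Taylor, dead stones still count for their owner and the empty region around them touches both colours and so counts for nobody, while the human-style count awards that territory to the surrounding player.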
KataGo’s developers put effort into aligning KataGo with the desires of human users, and it pays off: in the scenarios the paper lists, KataGo behaves the way humans would want it to instead of playing optimally according to Tromp-Taylor rules.
We see this in a lot of alignment problems. The metrics that are easy for computers to use and score are often not what humans care about. The task of alignment is to get our AI not to Goodhart on the easy metric but to focus on what we actually care about.
It would have been easier to create a KataGo that wins in the paper’s examples than to go through the effort of making KataGo behave the way it actually does in them.
Yeah, this is burying the lede here.
However, there isn’t a Platonic form of the rules of Go, so which rules you choose really matters.