I tried code interpreter on some of the D&D.Sci challenges here. As expected, it failed miserably at generating any useful insights. It also made some egregious logic errors. I didn’t expect that, but I should have.
For example, on https://www.lesswrong.com/posts/2uNeYiXMs4aQ2hfx9/d-and-d-sci-5e-return-of-the-league-of-defenders the dataset has three columns for the green team’s comp, three for the blue team’s, and a win/loss result. To get an idea of which picks win against the known opponent team, it grabbed all games with that team participating, meaning to find the games the other team won and do some stats on the other team’s comp. Except it forgot that its filter had matched both games where green was that comp and games where blue was that comp: it just checked for when blue won and ran stats on all of those blue comps, so half of the “winning opponent teams” were the original comp itself. Its analysis included “maybe just mirror them, seems to work quite well”.
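A minimal sketch of the logic error in pandas, on toy data — the column names (`G1`–`G3`, `B1`–`B3`, `winner`) and the win/loss encoding are my assumptions, not the actual dataset schema:

```python
import pandas as pd

# Toy stand-in for the dataset: three green-comp columns, three
# blue-comp columns, and which side won. (Schema is assumed.)
df = pd.DataFrame({
    "G1": ["A", "C", "A", "D"],
    "G2": ["B", "C", "B", "D"],
    "G3": ["B", "C", "B", "D"],
    "B1": ["C", "A", "D", "A"],
    "B2": ["C", "B", "D", "B"],
    "B3": ["C", "B", "D", "B"],
    "winner": ["blue", "blue", "green", "green"],
})

target = ["A", "B", "B"]  # the known opponent comp

green_is_target = (df[["G1", "G2", "G3"]] == target).all(axis=1)
blue_is_target = (df[["B1", "B2", "B3"]] == target).all(axis=1)

# Buggy version: grab every game the target played in, then treat
# "blue won" as "the opponent won" -- wrong whenever blue WAS the target.
games = df[green_is_target | blue_is_target]
buggy_winners = games.loc[games["winner"] == "blue", ["B1", "B2", "B3"]]

# Fixed version: which side held the winning *opponent* comp depends on
# which side the target team was on.
beat_as_green = df.loc[green_is_target & (df["winner"] == "blue"),
                       ["B1", "B2", "B3"]]
beat_as_blue = df.loc[blue_is_target & (df["winner"] == "green"),
                      ["G1", "G2", "G3"]]
winning_opponents = pd.concat([
    beat_as_green.set_axis(["P1", "P2", "P3"], axis=1),
    beat_as_blue.set_axis(["P1", "P2", "P3"], axis=1),
])
```

On this toy data the buggy filter counts the target comp itself as one of its own “winning opponents” (it picked up the game the target won while playing blue), while the fixed version returns only the comps that actually beat it.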