Yeah I’d expect some degree of interference leading to >50% success on XORs even in small models.
You can get ~75% just by computing the OR. But we found that Pythia-70m only does better than 75% at the last layer and at step 16000 of training, see this video
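For anyone wondering where the ~75% comes from, here is a minimal sketch, assuming the two boolean features are independent and each true half the time (an assumption on my part, real features may be less balanced). It just enumerates the four input pairs and counts how often a predictor agrees with XOR. The helper name `agreement_with_xor` is made up for illustration.

```python
from itertools import product

def agreement_with_xor(predictor):
    """Fraction of the four (a, b) input pairs on which predictor(a, b) equals a XOR b.

    Assumes all four pairs are equally likely (hypothetical balanced features).
    """
    inputs = list(product([False, True], repeat=2))
    hits = sum(predictor(a, b) == (a != b) for a, b in inputs)
    return hits / len(inputs)

# OR disagrees with XOR only on the (True, True) case.
or_accuracy = agreement_with_xor(lambda a, b: a or b)

# A constant guess is the chance baseline.
const_accuracy = agreement_with_xor(lambda a, b: False)

print(or_accuracy, const_accuracy)
```

So a probe that only recovers OR sits at 3 out of 4 under this assumption, which is why clearing 75% is the more interesting bar than clearing 50%.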