Finding patterns in Knowledge Attribution for Transformers
We analyze the Knowledge Neurons framework for attributing factual and relational knowledge to particular neurons in a transformer network, using a 12-layer multilingual BERT model for our experiments. Our study reveals several interesting phenomena. We observe that factual knowledge is mostly attributed to the middle and higher layers of the network (≥ 6). Further analysis reveals that the middle layers (6-9) are largely responsible for relational information, which is refined into the actual factual knowledge, or the "correct answer", in the last few layers (10-12). Our experiments also show that the model handles prompts in different languages that express the same fact similarly, providing further evidence for the effectiveness of multilingual pre-training. Applying the attribution scheme to grammatical knowledge, we find that grammatical knowledge is far more dispersed among neurons than factual knowledge.
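The Knowledge Neurons framework attributes a fact to individual FFN intermediate neurons via integrated gradients on the probability of the correct answer at the masked position. The sketch below illustrates one way such per-neuron scores could be computed for a single layer of multilingual BERT; the prompt, target token, layer index, and number of integration steps are illustrative assumptions rather than the paper's exact setup, and module names follow the Hugging Face `BertForMaskedLM` implementation.

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertForMaskedLM.from_pretrained("bert-base-multilingual-cased")
model.eval()

prompt = "The capital of France is [MASK]."            # hypothetical fact prompt
target = tokenizer.convert_tokens_to_ids("Paris")      # gold answer token id
inputs = tokenizer(prompt, return_tensors="pt")
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

layer_idx, steps = 8, 20                               # example middle layer, Riemann steps
ffn = model.bert.encoder.layer[layer_idx].intermediate
captured = {}

def scale_hook(module, inp, out):
    # Scale the FFN intermediate activation by the current alpha and keep a
    # handle so its gradient can be read after backward().
    scaled = out * scale_hook.alpha
    scaled.retain_grad()
    captured["act"] = scaled
    return scaled

handle = ffn.register_forward_hook(scale_hook)

# Riemann-sum approximation of the integrated gradient for every neuron
# in the chosen layer, taken at the [MASK] position.
grad_sum = torch.zeros(ffn.dense.out_features)
for k in range(1, steps + 1):
    scale_hook.alpha = k / steps
    model.zero_grad()
    logits = model(**inputs).logits
    prob = torch.softmax(logits[0, mask_pos], dim=-1)[target]
    prob.backward()
    grad_sum += captured["act"].grad[0, mask_pos].detach()

handle.remove()

# At the last step alpha == 1, so captured["act"] holds the unscaled activation.
attribution = captured["act"][0, mask_pos].detach() * grad_sum / steps
print("Top attributed neurons in layer", layer_idx, ":",
      attribution.topk(5).indices.tolist())
```

Repeating this over all layers for prompts expressing the same fact, in one or several languages, is the kind of procedure that would let one compare where attribution mass concentrates across the network.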