Neuron-level Interpretation of Deep NLP Models: A Survey
The proliferation of deep neural networks in various domains has seen an increased need for interpretability of these methods. A plethora of research has been carried out to analyze and understand components of the deep neural network models. Preliminary work done along these lines and papers that surveyed such, were focused on a more high-level representation analysis. However, a recent branch of work has concentrated on interpretability at a more granular level, analyzing neurons and groups of neurons in these large models. In this paper, we survey work done on fine-grained neuron analysis including: i) methods developed to discover and understand neurons in a network, ii) their limitations and evaluation, iii) major findings including cross architectural comparison that such analyses unravel and iv) direct applications of neuron analysis such as model behavior control and domain adaptation along with potential directions for future work.
READ FULL TEXT