Are Your Explanations Reliable? Investigating the Stability of LIME in Explaining Textual Classification Models via Adversarial Perturbation
Local surrogate models have become increasingly popular for explaining complex black-box models across diverse data types, including text, tabular data, and images. One such algorithm, LIME, remains widely used in machine learning because its explanations are inherently interpretable and it is model-agnostic. Despite this continued use, questions about LIME's stability persist. Stability, the property that similar instances yield similar explanations, has been shown to be lacking in explanations generated for tabular and image data, both of which are continuous domains. Here we explore the stability of LIME's explanations for textual data, a discrete domain, and confirm the trend of instability that previous research has shown for other data types.
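To make the notion of stability concrete, the sketch below compares two explanations using one simple score: the Jaccard overlap of their top-k features ranked by absolute weight. This is an illustrative assumption, not necessarily the metric used in this work. The helper names (`top_k_features`, `topk_jaccard`) are hypothetical, and the word weights are invented stand-ins for LIME output on a sentence and a one-word perturbation of it ("movie" replaced by "film"), not real results.

```python
from typing import Dict, Set


def top_k_features(explanation: Dict[str, float], k: int) -> Set[str]:
    """Return the k features with the largest absolute weight."""
    ranked = sorted(explanation, key=lambda f: abs(explanation[f]), reverse=True)
    return set(ranked[:k])


def topk_jaccard(exp_a: Dict[str, float], exp_b: Dict[str, float], k: int = 3) -> float:
    """Jaccard overlap of two explanations' top-k features (1.0 = same set)."""
    a, b = top_k_features(exp_a, k), top_k_features(exp_b, k)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical word weights (invented, not LIME output) for
# "the movie had a terrible plot and weak acting" ...
original = {"terrible": -0.62, "plot": -0.21, "weak": -0.18, "acting": -0.10, "movie": 0.02}
# ... and for the same sentence with "movie" swapped for "film".
perturbed = {"terrible": -0.55, "acting": -0.30, "film": 0.25, "plot": -0.08, "weak": -0.07}

print(topk_jaccard(original, perturbed, k=3))  # prints 0.2
```

A stable explainer would keep this score near 1.0 under such a small, meaning-preserving edit; here only "terrible" survives in both top-3 sets, giving 1/5 = 0.2.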