Learning Disentangled Representations of Texts with Application to Biomedical Abstracts
We propose a method for learning disentangled sets of vector representations of texts that capture distinct aspects. We argue that such representations afford model transfer and interpretability. To induce disentangled embeddings, we propose an adversarial objective based on the (dis)similarity between triplets of documents w.r.t. specific aspects. Our motivating application concerns embedding abstracts describing clinical trials in a manner that disentangles the populations, interventions, and outcomes in a given trial. We show that the induced representations indeed encode these targeted clinically salient aspects and that they can be effectively used to perform aspect-specific retrieval. We demonstrate that the approach generalizes beyond this motivating example via experiments on two multi-aspect review corpora.
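To make the triplet idea concrete, the following is a minimal sketch of a margin loss over one document triplet for a single aspect. It assumes dot-product similarity and a hinge margin, which are illustrative choices rather than the objective confirmed by the abstract, and it leaves out the adversarial component. The names `dot`, `triplet_hinge_loss`, and `margin` are hypothetical.

```python
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equal-length vectors (assumed similarity measure)."""
    return sum(a * b for a, b in zip(u, v))


def triplet_hinge_loss(
    anchor: Sequence[float],
    similar: Sequence[float],
    dissimilar: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Hinge loss for one triplet of aspect-specific embeddings.

    The loss is zero once the anchor is more similar to `similar` than to
    `dissimilar` by at least `margin`; otherwise it grows linearly with
    the size of the violation.
    """
    return max(0.0, margin - dot(anchor, similar) + dot(anchor, dissimilar))


# Hypothetical 2-d embeddings of three abstracts for one aspect
# (for example, the population aspect).
anchor = [1.0, 0.0]
similar = [1.0, 0.0]      # assumed to match the anchor on this aspect
dissimilar = [0.0, 1.0]   # assumed to differ from the anchor on this aspect

satisfied = triplet_hinge_loss(anchor, similar, dissimilar)
violated = triplet_hinge_loss(anchor, dissimilar, similar)
```

In a setup like the one the abstract describes, one such loss would presumably be computed per aspect (population, intervention, outcome), each over its own embedding of the same documents.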